hunch_net hunch_net-2006 hunch_net-2006-211 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Netflix is running a contest to improve recommender prediction systems. A 10% improvement over their current system yields a $1M prize. Failing that, the best smaller improvement yields a smaller $50K prize. This contest looks quite real, and the $50K prize money is almost certainly achievable with a bit of thought. The contest also comes with a dataset which is apparently 2 orders of magnitude larger than any other public recommendation system datasets.
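The 10% threshold in the introduction is a relative reduction in prediction error. As a minimal sketch, assuming an error score where lower is better (this excerpt does not name the contest's metric), the check might look like the following. The function names `relative_improvement` and `prize_tier` and the tier labels are hypothetical, chosen only for illustration; they are not taken from the contest rules.

```python
def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Fractional reduction in error relative to the baseline (0.10 means 10%)."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


def prize_tier(baseline_error: float, new_error: float,
               grand_threshold: float = 0.10) -> str:
    """Classify a result; labels are illustrative, not official terms."""
    gain = relative_improvement(baseline_error, new_error)
    if gain >= grand_threshold:
        return "grand"               # meets the 10% bar for the $1M prize
    if gain > 0:
        return "progress-candidate"  # only the best such entry gets the $50K prize
    return "none"


if __name__ == "__main__":
    # Hypothetical error values, not real contest scores.
    print(prize_tier(baseline_error=1.0, new_error=0.85))
    print(prize_tier(baseline_error=1.0, new_error=0.95))
```

Results sitting exactly on the threshold are sensitive to floating-point rounding, so a real scorer would probably compare rounded or fixed-precision values.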
sentIndex sentText sentNum sentScore
1 Netflix is running a contest to improve recommender prediction systems. [sent-1, score-1.007]
2 A 10% improvement over their current system yields a $1M prize. [sent-2, score-0.832]
3 Failing that, the best smaller improvement yields a smaller $50K prize. [sent-3, score-1.19]
4 This contest looks quite real, and the $50K prize money is almost certainly achievable with a bit of thought. [sent-4, score-1.488]
5 The contest also comes with a dataset which is apparently 2 orders of magnitude larger than any other public recommendation system datasets. [sent-5, score-1.845]
wordName wordTfidf (topN-words)
[('contest', 0.496), ('yields', 0.303), ('smaller', 0.278), ('improvement', 0.263), ('failing', 0.209), ('orders', 0.199), ('achievable', 0.199), ('recommender', 0.192), ('apparently', 0.174), ('recommendation', 0.169), ('magnitude', 0.158), ('prize', 0.158), ('system', 0.155), ('netflix', 0.155), ('money', 0.146), ('looks', 0.144), ('improve', 0.126), ('datasets', 0.123), ('public', 0.122), ('dataset', 0.122), ('certainly', 0.12), ('running', 0.119), ('current', 0.111), ('comes', 0.104), ('larger', 0.102), ('almost', 0.087), ('prediction', 0.074), ('bit', 0.072), ('real', 0.07), ('best', 0.068), ('quite', 0.066), ('also', 0.044)]
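The page does not say how the word weights above were produced. As a rough sketch, assuming a standard smoothed TF-IDF with L2 normalisation (an assumption, not the confirmed pipeline), a top-N list of this shape could be computed as follows. The function `tfidf_top_words` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tfidf_top_words(doc_tokens, corpus_tokens, top_n=5):
    """Return the top_n (word, weight) pairs for one document.

    Weights are term count times a smoothed inverse document frequency,
    then L2-normalised so every weight lies in (0, 1].
    """
    n_docs = len(corpus_tokens)
    doc_freq = Counter()
    for tokens in corpus_tokens:
        doc_freq.update(set(tokens))  # count each word once per document

    weights = {}
    for word, count in Counter(doc_tokens).items():
        # Smoothed idf (assumed variant): rarer words get larger weights.
        idf = math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0
        weights[word] = count * idf

    norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
    ranked = sorted(((w, v / norm) for w, v in weights.items()),
                    key=lambda pair: (-pair[1], pair[0]))
    return ranked[:top_n]


if __name__ == "__main__":
    # Toy corpus of pre-tokenised, stopword-free documents (illustrative only).
    corpus = [
        ["contest", "prize", "netflix", "contest"],
        ["gradient", "descent", "prize"],
        ["prize", "review"],
    ]
    for word, weight in tfidf_top_words(corpus[0], corpus, top_n=3):
        print(f"{word}: {weight:.3f}")
```

In the toy corpus, "contest" ranks first because it is repeated in its document and appears in no other, while "prize" ranks low because every document contains it.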
simIndex simValue blogId blogTitle
same-blog 1 1.0 211 hunch net-2006-10-02-$1M Netflix prediction contest
2 0.30039462 371 hunch net-2009-09-21-Netflix finishes (and starts)
Introduction: I attended the Netflix prize ceremony this morning. The press conference part is covered fine elsewhere, with the basic outcome being that BellKor’s Pragmatic Chaos won over The Ensemble by 15-20 minutes, because they were tied in performance on the ultimate holdout set. I’m sure the individual participants will have many chances to speak about the solution. One of these is Bell at the NYAS ML symposium on Nov. 6. Several additional details may interest ML people. The degree of overfitting exhibited by the difference in performance on the leaderboard test set and the ultimate hold out set was small, but determining at .02 to .03%. A tie was possible, because the rules cut off measurements below the fourth digit based on significance concerns. In actuality, of course, the scores do differ before rounding, but everyone I spoke to claimed not to know how. The complete dataset has been released on UCI, so each team could compute their own score to whatever accu
3 0.20146739 430 hunch net-2011-04-11-The Heritage Health Prize
Introduction: The Heritage Health Prize is potentially the largest prediction prize yet at $3M, which is sure to get many people interested. Several elements of the competition may be worth discussing. The most straightforward way for HPN to deploy this predictor is in determining who to cover with insurance. This might easily cover the costs of running the contest itself, but the value to the health system as a whole is minimal, as people not covered still exist. While HPN itself is a provider network, they have active relationships with a number of insurance companies, and the right to resell any entrant. It’s worth keeping in mind that the research and development may nevertheless end up being useful in the longer term, especially as entrants also keep the right to their code. The judging metric is something I haven’t seen previously. If a patient has probability 0.5 of being in the hospital 0 days and probability 0.5 of being in the hospital ~53.6 days, the optimal prediction in e
4 0.18022154 129 hunch net-2005-11-07-Prediction Competitions
Introduction: There are two prediction competitions currently in the air. The Performance Prediction Challenge by Isabelle Guyon. Good entries minimize a weighted 0/1 loss + the difference between a prediction of this loss and the observed truth on 5 datasets. Isabelle tells me all of the problems are “real world” and the test datasets are large enough (17K minimum) that the winner should be well determined by ability rather than luck. This is due March 1. The Predictive Uncertainty Challenge by Gavin Cawley. Good entries minimize log loss on real valued output variables for one synthetic and 3 “real” datasets related to atmospheric prediction. The use of log loss (which can be infinite and hence is never convergent) and smaller test sets of size 1K to 7K examples make the winner of this contest more luck dependent. Nevertheless, the contest may be of some interest particularly to the branch of learning (typically Bayes learning) which prefers to optimize log loss. May the
5 0.17979431 427 hunch net-2011-03-20-KDD Cup 2011
Introduction: Yehuda points out KDD-Cup 2011, which Markus and Gideon helped set up. This is a prediction and recommendation contest for music. In addition to being a fun chance to show your expertise, there are cash prizes of $5K/$2K/$1K.
6 0.14593488 362 hunch net-2009-06-26-Netflix nearly done
7 0.12807204 369 hunch net-2009-08-27-New York Area Machine Learning Events
8 0.11533953 301 hunch net-2008-05-23-Three levels of addressing the Netflix Prize
9 0.10558224 19 hunch net-2005-02-14-Clever Methods of Overfitting
10 0.099434443 300 hunch net-2008-04-30-Concerns about the Large Scale Learning Challenge
11 0.092481837 92 hunch net-2005-07-11-AAAI blog
12 0.090444826 336 hunch net-2009-01-19-Netflix prize within epsilon
13 0.087080821 239 hunch net-2007-04-18-$50K Spock Challenge
14 0.080913201 275 hunch net-2007-11-29-The Netflix Crack
15 0.078074567 399 hunch net-2010-05-20-Google Predict
16 0.066174187 91 hunch net-2005-07-10-Thinking the Unthought
17 0.066091284 270 hunch net-2007-11-02-The Machine Learning Award goes to …
18 0.064061999 382 hunch net-2009-12-09-Future Publication Models @ NIPS
19 0.06120494 364 hunch net-2009-07-11-Interesting papers at KDD
20 0.060189217 451 hunch net-2011-12-13-Vowpal Wabbit version 6.1 & the NIPS tutorial
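The page does not state how `simValue` is computed. One common choice for comparing sparse word-weight vectors like the TF-IDF list above is cosine similarity, so the following is a minimal sketch under that assumption. The function `cosine_similarity` and the second example vector are hypothetical.

```python
import math


def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse vectors stored as {word: weight}."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # A few weights copied from the word list above, plus a made-up neighbour.
    this_post = {"contest": 0.496, "prize": 0.158, "netflix": 0.155}
    other_post = {"netflix": 0.4, "prize": 0.3, "ceremony": 0.2}  # hypothetical
    print(round(cosine_similarity(this_post, this_post), 6))
    print(round(cosine_similarity(this_post, other_post), 6))
```

Under this measure a document compared with itself scores 1, which would be consistent with the `same-blog` row in the list above, although that alone does not confirm the method used.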
topicId topicWeight
[(0, 0.097), (1, 0.004), (2, -0.044), (3, -0.013), (4, -0.084), (5, 0.046), (6, -0.173), (7, -0.011), (8, -0.047), (9, -0.062), (10, -0.139), (11, 0.329), (12, -0.005), (13, 0.023), (14, -0.039), (15, 0.011), (16, 0.028), (17, -0.021), (18, 0.04), (19, -0.03), (20, -0.134), (21, -0.094), (22, -0.01), (23, -0.032), (24, 0.006), (25, -0.082), (26, -0.084), (27, -0.108), (28, 0.026), (29, 0.001), (30, 0.104), (31, -0.033), (32, 0.101), (33, 0.051), (34, 0.064), (35, -0.022), (36, -0.054), (37, -0.036), (38, -0.105), (39, 0.07), (40, -0.097), (41, 0.031), (42, 0.012), (43, 0.07), (44, 0.105), (45, -0.065), (46, -0.08), (47, 0.006), (48, -0.053), (49, 0.035)]
simIndex simValue blogId blogTitle
same-blog 1 0.99615335 211 hunch net-2006-10-02-$1M Netflix prediction contest
2 0.79602057 427 hunch net-2011-03-20-KDD Cup 2011
Introduction: Yehuda points out KDD-Cup 2011, which Markus and Gideon helped set up. This is a prediction and recommendation contest for music. In addition to being a fun chance to show your expertise, there are cash prizes of $5K/$2K/$1K.
3 0.7308901 362 hunch net-2009-06-26-Netflix nearly done
Introduction: A $1M qualifying result was achieved on the public Netflix test set by a 3-way ensemble team. This is just in time for Yehuda’s presentation at KDD, which I’m sure will be one of the best attended ever. This isn’t quite over: there are a few days for another super-conglomerate team to come together and there is some small chance that the performance is nonrepresentative of the final test set, but I expect not. Regardless of the final outcome, the biggest lesson for ML from the Netflix contest has been the formidable performance edge of ensemble methods.
4 0.68579233 371 hunch net-2009-09-21-Netflix finishes (and starts)
Introduction: I attended the Netflix prize ceremony this morning. The press conference part is covered fine elsewhere, with the basic outcome being that BellKor’s Pragmatic Chaos won over The Ensemble by 15-20 minutes, because they were tied in performance on the ultimate holdout set. I’m sure the individual participants will have many chances to speak about the solution. One of these is Bell at the NYAS ML symposium on Nov. 6. Several additional details may interest ML people. The degree of overfitting exhibited by the difference in performance on the leaderboard test set and the ultimate hold out set was small, but determining at .02 to .03%. A tie was possible, because the rules cut off measurements below the fourth digit based on significance concerns. In actuality, of course, the scores do differ before rounding, but everyone I spoke to claimed not to know how. The complete dataset has been released on UCI, so each team could compute their own score to whatever accu
5 0.6588456 430 hunch net-2011-04-11-The Heritage Health Prize
Introduction: The Heritage Health Prize is potentially the largest prediction prize yet at $3M, which is sure to get many people interested. Several elements of the competition may be worth discussing. The most straightforward way for HPN to deploy this predictor is in determining who to cover with insurance. This might easily cover the costs of running the contest itself, but the value to the health system as a whole is minimal, as people not covered still exist. While HPN itself is a provider network, they have active relationships with a number of insurance companies, and the right to resell any entrant. It’s worth keeping in mind that the research and development may nevertheless end up being useful in the longer term, especially as entrants also keep the right to their code. The judging metric is something I haven’t seen previously. If a patient has probability 0.5 of being in the hospital 0 days and probability 0.5 of being in the hospital ~53.6 days, the optimal prediction in e
6 0.5383361 129 hunch net-2005-11-07-Prediction Competitions
7 0.53830737 275 hunch net-2007-11-29-The Netflix Crack
8 0.41648433 418 hunch net-2010-12-02-Traffic Prediction Problem
9 0.39887425 336 hunch net-2009-01-19-Netflix prize within epsilon
10 0.39063033 301 hunch net-2008-05-23-Three levels of addressing the Netflix Prize
11 0.38667291 284 hunch net-2008-01-18-Datasets
12 0.38176253 19 hunch net-2005-02-14-Clever Methods of Overfitting
13 0.35007116 119 hunch net-2005-10-08-We have a winner
14 0.32640696 300 hunch net-2008-04-30-Concerns about the Large Scale Learning Challenge
15 0.32356438 239 hunch net-2007-04-18-$50K Spock Challenge
16 0.31460905 399 hunch net-2010-05-20-Google Predict
17 0.28805712 364 hunch net-2009-07-11-Interesting papers at KDD
18 0.28101438 446 hunch net-2011-10-03-Monday announcements
19 0.2734206 259 hunch net-2007-08-19-Choice of Metrics
20 0.2625736 377 hunch net-2009-11-09-NYAS ML Symposium this year.
topicId topicWeight
[(27, 0.139), (38, 0.093), (55, 0.157), (98, 0.449)]
simIndex simValue blogId blogTitle
same-blog 1 0.87843966 211 hunch net-2006-10-02-$1M Netflix prediction contest
2 0.84241569 167 hunch net-2006-03-27-Gradients everywhere
Introduction: One of the basic observations from the atomic learning workshop is that gradient-based optimization is pervasive. For example, at least 7 (of 12) speakers used the word ‘gradient’ in their talk and several others may be approximating a gradient. The essential useful quality of a gradient is that it decouples local updates from global optimization. Restated: Given a gradient, we can determine how to change individual parameters of the system so as to improve overall performance. It’s easy to feel depressed about this and think “nothing has happened”, but that appears untrue. Many of the talks were about clever techniques for computing gradients where your calculus textbook breaks down. Sometimes there are clever approximations of the gradient. (Simon Osindero) Sometimes we can compute constrained gradients via iterated gradient/project steps. (Ben Taskar) Sometimes we can compute gradients anyways over mildly nondifferentiable functions. (Drew Bagnell) Even give
3 0.83967352 322 hunch net-2008-10-20-New York’s ML Day
Introduction: I’m not as naturally exuberant as Muthu 2 or David about CS/Econ day, but I believe it and ML day were certainly successful. At the CS/Econ day, I particularly enjoyed Tuomas Sandholm’s talk which showed a commanding depth of understanding and application in automated auctions. For the machine learning day, I enjoyed several talks and posters (I had better; I helped pick them). What stood out to me was the number of people attending: 158 registered, a level qualifying as “scramble to find seats”. My rule of thumb for workshops/conferences is that the number of attendees is often something like the number of submissions. That isn’t the case here, where there were just 4 invited speakers and 30-or-so posters. Presumably, the difference is due to a critical mass of Machine Learning interested people in the area and the ease of their attendance. Are there other areas where a local Machine Learning day would fly? It’s easy to imagine something working out in the San Franci
4 0.7623688 231 hunch net-2007-02-10-Best Practices for Collaboration
Introduction: Many people, especially students, haven’t had an opportunity to collaborate with other researchers. Collaboration, especially with remote people, can be tricky. Here are some observations of what has worked for me on collaborations involving a few people. Travel and Discuss. Almost all collaborations start with in-person discussion. This implies that travel is often necessary. We can hope that in the future we’ll have better systems for starting collaborations remotely (such as blogs), but we aren’t quite there yet. Enable your collaborator. A collaboration can fall apart because one collaborator disables another. This sounds stupid (and it is), but it’s far easier than you might think. Avoid Duplication. Discovering that you and a collaborator have been editing the same thing and now need to waste time reconciling changes is annoying. The best way to avoid this is to be explicit about who has write permission to what. Most of the time, a write lock is held for the e
5 0.67188245 111 hunch net-2005-09-12-Fast Gradient Descent
Introduction: Nic Schraudolph has been developing a fast gradient descent algorithm called Stochastic Meta-Descent (SMD). Gradient descent is currently untrendy in the machine learning community, but there remains a large number of people using gradient descent on neural networks or other architectures from when it was trendy in the early 1990s. There are three problems with gradient descent. Gradient descent does not necessarily produce easily reproduced results. Typical algorithms start with “set the initial parameters to small random values”. The design of the representation that gradient descent is applied to is often nontrivial. In particular, knowing exactly how to build a large neural network so that it will perform well requires knowledge which has not been made easily applicable. Gradient descent can be slow. Obviously, taking infinitesimal steps in the direction of the gradient would take forever, so some finite step size must be used. What exactly this step size should be
6 0.49292332 379 hunch net-2009-11-23-ICML 2009 Workshops (and Tutorials)
7 0.41839796 264 hunch net-2007-09-30-NIPS workshops are out.
8 0.41345882 395 hunch net-2010-04-26-Compassionate Reviewing
9 0.41243234 270 hunch net-2007-11-02-The Machine Learning Award goes to …
10 0.40621686 452 hunch net-2012-01-04-Why ICML? and the summer conferences
11 0.40331233 453 hunch net-2012-01-28-Why COLT?
12 0.40229467 90 hunch net-2005-07-07-The Limits of Learning Theory
13 0.39262637 65 hunch net-2005-05-02-Reviewing techniques for conferences
14 0.39247149 375 hunch net-2009-10-26-NIPS workshops
15 0.39192697 454 hunch net-2012-01-30-ICML Posters and Scope
16 0.38991654 89 hunch net-2005-07-04-The Health of COLT
17 0.38885778 484 hunch net-2013-06-16-Representative Reviewing
18 0.38825697 40 hunch net-2005-03-13-Avoiding Bad Reviewing
19 0.38815224 437 hunch net-2011-07-10-ICML 2011 and the future
20 0.38687459 448 hunch net-2011-10-24-2011 ML symposium and the bears
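The "Fast Gradient Descent" excerpt in the list above notes that gradient descent needs a finite step size and that choosing it is part of what makes the method slow. The sketch below illustrates that point with plain fixed-step gradient descent on f(x) = x**2. It is not Stochastic Meta-Descent, and the function names and step sizes are hypothetical choices for illustration.

```python
def gradient_descent(grad, x0: float, step_size: float, n_steps: int) -> float:
    """Plain gradient descent with a fixed step size (not SMD)."""
    x = x0
    for _ in range(n_steps):
        x = x - step_size * grad(x)  # move against the gradient
    return x


def grad_quadratic(x: float) -> float:
    """Derivative of f(x) = x**2, whose minimum is at x = 0."""
    return 2.0 * x


if __name__ == "__main__":
    for step in (0.001, 0.4, 1.1):  # illustrative step sizes
        final_x = gradient_descent(grad_quadratic, x0=1.0, step_size=step, n_steps=50)
        print(f"step_size={step}: x after 50 steps = {final_x:.3e}")
```

For this particular function each update multiplies x by (1 - 2 * step_size), so a very small step barely moves from the start, a moderate step converges quickly, and a step above 1.0 makes the iterates grow instead of shrink.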