$50K Spock Challenge: hunch net post 239 (2007-04-18), knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Apparently, the company Spock is setting up a $50k entity resolution challenge. $50k is much less than the Netflix prize, but it’s effectively the same as what Netflix pays out until someone reaches 10% (the $1M grand prize requires a 10% improvement; until then, Netflix awards $50k annual progress prizes). It’s also nice that the Spock challenge has a short duration. The (visible) test set has size 25k and the training set has size 75k.
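Entity resolution is the problem of deciding when two records refer to the same real-world entity. Below is a minimal sketch of the task shape; the schema, similarity measure, and threshold are hypothetical, not the actual Spock data format or evaluation metric.

```python
from difflib import SequenceMatcher

def field_sim(a: str, b: str) -> float:
    """Normalized string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def same_entity(r1: dict, r2: dict, threshold: float = 0.6) -> bool:
    """Predict that two records refer to the same person when the
    average field-level similarity clears a threshold."""
    fields = ["name", "location", "occupation"]  # hypothetical schema
    score = sum(field_sim(r1[f], r2[f]) for f in fields) / len(fields)
    return score >= threshold

a = {"name": "John Smith", "location": "New York",     "occupation": "lawyer"}
b = {"name": "Jon Smith",  "location": "New York, NY", "occupation": "attorney"}
print(same_entity(a, b))  # True for these near-duplicate records
```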
Similar posts (rank, similarity value, blog id, title):
same-blog 1 1.0 239 hunch net-2007-04-18-$50K Spock Challenge
2 0.47053963 291 hunch net-2008-03-07-Spock Challenge Winners
Introduction: The Spock challenge for entity resolution was won by Benno Stein, Sven Eissen, Tino Rub, Hagen Tonnies, Christof Braeutigam, and Martin Potthast.
3 0.14873905 275 hunch net-2007-11-29-The Netflix Crack
Introduction: A couple of security researchers claim to have cracked the Netflix dataset. The claims of success appear somewhat overstated to me, but the method of attack is valid and could plausibly be substantially improved so as to reveal the movie preferences of a small fraction of Netflix users. The basic idea is to use a heuristic similarity function between ratings in a public database (from IMDB) and an anonymized database (Netflix) to link ratings in the private database to public identities (in IMDB); a sketch of this linkage idea appears just after this list. They claim to have linked two of a few dozen IMDB users to anonymized Netflix users. The claims seem a bit inflated to me, because (a) knowing the IMDB identity isn’t equivalent to knowing the person and (b) the claims of statistical significance are with respect to a model of the world they created (rather than the world itself). Overall, this is another example showing that complete privacy is hard. It may be worth remembering that there are some substantial benefits from the Netflix contest.
4 0.13622259 362 hunch net-2009-06-26-Netflix nearly done
Introduction: A $1M qualifying result was achieved on the public Netflix test set by a 3-way ensemble team. This is just in time for Yehuda’s presentation at KDD, which I’m sure will be one of the best attended ever. This isn’t quite over: there are a few days for another super-conglomerate team to come together, and there is some small chance that the performance is nonrepresentative of the final test set, but I expect not. Regardless of the final outcome, the biggest lesson for ML from the Netflix contest has been the formidable performance edge of ensemble methods.
5 0.1020698 129 hunch net-2005-11-07-Prediction Competitions
Introduction: There are two prediction competitions currently in the air. The Performance Prediction Challenge by Isabelle Guyon. Good entries minimize a weighted 0/1 loss plus the difference between a prediction of this loss and the observed truth on 5 datasets. Isabelle tells me all of the problems are “real world” and the test datasets are large enough (17K examples minimum) that the winner should be well determined by ability rather than luck. This is due March 1. The Predictive Uncertainty Challenge by Gavin Cawley. Good entries minimize log loss on real-valued output variables for one synthetic and 3 “real” datasets related to atmospheric prediction. The use of log loss (which is unbounded, so a single confident mistake can dominate) and smaller test sets of 1K to 7K examples make the winner of this contest more luck dependent; a small example contrasting these losses also appears after this list. Nevertheless, the contest may be of some interest, particularly to the branch of learning (typically Bayes learning) which prefers to optimize log loss. May the best predictor win.
6 0.10083219 418 hunch net-2010-12-02-Traffic Prediction Problem
7 0.097367957 26 hunch net-2005-02-21-Problem: Cross Validation
8 0.09385471 371 hunch net-2009-09-21-Netflix finishes (and starts)
9 0.087412022 301 hunch net-2008-05-23-Three levels of addressing the Netflix Prize
10 0.087080821 211 hunch net-2006-10-02-$1M Netflix prediction contest
11 0.08293999 377 hunch net-2009-11-09-NYAS ML Symposium this year.
12 0.08267232 389 hunch net-2010-02-26-Yahoo! ML events
13 0.078599021 94 hunch net-2005-07-13-Text Entailment at AAAI
14 0.070283964 47 hunch net-2005-03-28-Open Problems for Colt
15 0.064474121 259 hunch net-2007-08-19-Choice of Metrics
16 0.059946842 251 hunch net-2007-06-24-Interesting Papers at ICML 2007
17 0.057995018 131 hunch net-2005-11-16-The Everything Ensemble Edge
18 0.056026131 171 hunch net-2006-04-09-Progress in Machine Translation
19 0.053540837 111 hunch net-2005-09-12-Fast Gradient Descent
20 0.053397413 403 hunch net-2010-07-18-ICML & COLT 2010
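The linkage idea in entry 3 is concrete enough to sketch. This is a minimal, hypothetical version over toy rating dictionaries; the actual attack also exploited rating dates and gave more weight to rarely-rated movies.

```python
from math import sqrt

def match_score(public_ratings: dict, anon_ratings: dict) -> float:
    """Heuristic similarity between two users' rating dictionaries:
    fraction of co-rated movies with nearly the same score, scaled
    by the size of the overlap so larger overlaps count more."""
    shared = set(public_ratings) & set(anon_ratings)
    if not shared:
        return 0.0
    agree = sum(abs(public_ratings[m] - anon_ratings[m]) <= 1 for m in shared)
    return (agree / len(shared)) * sqrt(len(shared))

def link(public_db: dict, anon_db: dict, min_score: float = 1.0) -> dict:
    """Link each public identity to its best-matching anonymized user,
    keeping only links that clearly beat the runner-up."""
    links = {}
    for pid, pratings in public_db.items():
        scored = sorted(((match_score(pratings, ar), aid)
                         for aid, ar in anon_db.items()), reverse=True)
        best = scored[0]
        runner_up = scored[1][0] if len(scored) > 1 else 0.0
        if best[0] >= min_score and best[0] > 1.5 * runner_up:
            links[pid] = best[1]
    return links

# Toy databases: one public (IMDB-like) user, two anonymized users.
anon_db = {"u17": {"Alien": 5, "Heat": 3, "Fargo": 4},
           "u42": {"Alien": 2, "Heat": 5, "Brazil": 1}}
public_db = {"imdb_jane": {"Alien": 5, "Fargo": 4, "Brazil": 3}}
print(link(public_db, anon_db))  # {'imdb_jane': 'u17'}
```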
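Entry 5’s contrast between 0/1 loss and log loss is easy to see numerically. A minimal sketch with toy numbers (not the contests’ exact weighted losses): a single confident mistake can dominate an otherwise good log-loss score, which is why small test sets make such contests luck dependent.

```python
import math

def zero_one_loss(preds, labels):
    """Fraction of hard predictions that are wrong."""
    return sum(p != y for p, y in zip(preds, labels)) / len(labels)

def log_loss(probs, labels):
    """Average negative log-likelihood of the true binary labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p_true = p if y == 1 else 1.0 - p
        total += -math.log(p_true)
    return total / len(labels)

labels = [1, 0, 1, 1]
print(zero_one_loss([1, 0, 0, 1], labels))      # 0.25: one mistake in four
print(log_loss([0.9, 0.2, 0.6, 0.9], labels))   # about 0.24: all mild errors
print(log_loss([0.9, 0.2, 1e-6, 0.9], labels))  # about 3.6: one confident
                                                # mistake dominates the total
```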
Similar posts (rank, similarity value, blog id, title):
same-blog 1 0.99208289 239 hunch net-2007-04-18-$50K Spock Challenge
2 0.93399978 291 hunch net-2008-03-07-Spock Challenge Winners
3 0.57505256 275 hunch net-2007-11-29-The Netflix Crack
4 0.56287849 94 hunch net-2005-07-13-Text Entailment at AAAI
Introduction: Rajat Raina presented a paper on the technique they used for the PASCAL Recognizing Textual Entailment challenge. “Text entailment” is the problem of deciding if one sentence implies another. For example, the previous sentence entails: “Text entailment is a decision problem. One sentence can imply another.” The challenge was of the form: given an original sentence and another sentence, predict whether there was an entailment. All current techniques for predicting correctness of an entailment are at the “flail” stage: accuracies of around 58% where humans could achieve near 100% accuracy, so there is much room to improve. Apparently, there may be another PASCAL challenge on this problem in the near future. (A strawman baseline illustrating the task interface appears after this list.)
5 0.50973123 362 hunch net-2009-06-26-Netflix nearly done
6 0.35738111 418 hunch net-2010-12-02-Traffic Prediction Problem
7 0.34809861 171 hunch net-2006-04-09-Progress in Machine Translation
8 0.34626356 284 hunch net-2008-01-18-Datasets
9 0.31575811 26 hunch net-2005-02-21-Problem: Cross Validation
10 0.31456801 63 hunch net-2005-04-27-DARPA project: LAGR
11 0.30745065 129 hunch net-2005-11-07-Prediction Competitions
12 0.28602886 122 hunch net-2005-10-13-Site tweak
13 0.28425735 211 hunch net-2006-10-02-$1M Netflix prediction contest
14 0.27582628 301 hunch net-2008-05-23-Three levels of addressing the Netflix Prize
15 0.27024114 70 hunch net-2005-05-12-Math on the Web
16 0.26244998 190 hunch net-2006-07-06-Branch Prediction Competition
17 0.25853005 302 hunch net-2008-05-25-Inappropriate Mathematics for Machine Learning
18 0.25494263 47 hunch net-2005-03-28-Open Problems for Colt
19 0.24793248 371 hunch net-2009-09-21-Netflix finishes (and starts)
20 0.24145216 373 hunch net-2009-10-03-Static vs. Dynamic multiclass prediction
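The entailment task in entry 4 has a simple interface: two sentences in, one bit out. The sketch below is only a word-overlap strawman (nothing like what competitive entries do), but it shows the shape of the problem.

```python
def entails(text: str, hypothesis: str, threshold: float = 0.5) -> bool:
    """Word-overlap strawman: predict entailment when enough of the
    hypothesis's words also appear in the text."""
    t_words = set(text.lower().split())
    h_words = set(hypothesis.lower().split())
    if not h_words:
        return True  # an empty hypothesis is trivially entailed
    return len(h_words & t_words) / len(h_words) >= threshold

print(entails("text entailment is the problem of deciding if one sentence implies another",
              "one sentence can imply another"))  # True (overlap 3/5)
print(entails("the cat sat on the mat",
              "one sentence can imply another"))  # False (overlap 0/5)
```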
Similar posts (rank, similarity value, blog id, title):
1 0.94242382 284 hunch net-2008-01-18-Datasets
Introduction: David Pennock notes the impressive set of datasets at datawrangling.
2 0.89819664 323 hunch net-2008-11-04-Rise of the Machines
Introduction: On the enduring topic of how people deal with intelligent machines, we have this important election bulletin.
same-blog 3 0.8812393 239 hunch net-2007-04-18-$50K Spock Challenge
4 0.87242389 340 hunch net-2009-01-28-Nielsen’s talk
Introduction: I wanted to point to Michael Nielsen’s talk about blogging science, which I found interesting.
5 0.72182333 32 hunch net-2005-02-27-Antilearning: When proximity goes bad
Introduction: Joel Predd mentioned “Antilearning” by Adam Kowalczyk, which is interesting from a foundational-intuitions viewpoint. There is a pervasive intuition that “nearby things tend to have the same label”. This intuition is instantiated in SVMs, nearest neighbor classifiers, decision trees, and neural networks. It turns out there are natural problems where this intuition is the opposite of the truth. One natural situation where this occurs is competition. For example, when Intel fails to meet its earnings estimate, is this evidence that AMD is doing badly also? Or evidence that AMD is doing well? This violation of the proximity intuition means that when the number of examples is small, negating a classifier which attempts to exploit proximity can provide predictive power (thus, the term “antilearning”). (A toy demonstration appears after this list.)
6 0.61877453 139 hunch net-2005-12-11-More NIPS Papers
7 0.56400418 144 hunch net-2005-12-28-Yet more nips thoughts
8 0.38503265 333 hunch net-2008-12-27-Adversarial Academia
9 0.34715599 125 hunch net-2005-10-20-Machine Learning in the News
10 0.34157884 339 hunch net-2009-01-27-Key Scientific Challenges
11 0.33486497 181 hunch net-2006-05-23-What is the best regret transform reduction from multiclass to binary?
12 0.33295518 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions
13 0.32762837 488 hunch net-2013-08-31-Extreme Classification workshop at NIPS
14 0.32011122 170 hunch net-2006-04-06-Bounds greater than 1
15 0.28617704 353 hunch net-2009-05-08-Computability in Artificial Intelligence
16 0.28160071 233 hunch net-2007-02-16-The Forgetting
17 0.27493459 251 hunch net-2007-06-24-Interesting Papers at ICML 2007
18 0.27342027 236 hunch net-2007-03-15-Alternative Machine Learning Reductions Definitions
19 0.2432256 72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions
20 0.21409437 26 hunch net-2005-02-21-Problem: Cross Validation
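Entry 5’s antilearning effect can be demonstrated with a contrived example. In the sketch below, the data is constructed so that each test point’s nearest training example has the opposite label, as in the competition situation from the post; the 1-nearest-neighbor rule then does worse than chance, and negating it does better.

```python
def nn_predict(train, x):
    """1-nearest-neighbor prediction for a 1-D point."""
    return min(train, key=lambda pt: abs(pt[0] - x))[1]

# Training points alternate labels along a line, as when each company
# sits "next to" its rival with the opposite fortune.
train = [(0, +1), (1, -1), (2, +1), (3, -1), (4, +1), (5, -1)]

# Test points constructed so the nearest training example always has
# the opposite label.
test = [(0.4, -1), (1.6, -1), (3.4, +1), (4.6, +1)]

nn_acc   = sum(nn_predict(train, x) == y for x, y in test) / len(test)
anti_acc = sum(-nn_predict(train, x) == y for x, y in test) / len(test)
print(nn_acc, anti_acc)  # 0.0 1.0: negating the proximity rule wins
```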