hunch_net hunch_net-2010 hunch_net-2010-393 knowledge-graph by maker-knowledge-mining

393 hunch net-2010-04-14-MLcomp: a website for objectively comparing ML algorithms


meta information for this blog

Source: html

Introduction: Much of the success and popularity of machine learning has been driven by its practical impact. Of course, the evaluation of empirical work is an integral part of the field. But are the existing mechanisms for evaluating algorithms and comparing results good enough? We (Percy and Jake) believe there are currently a number of shortcomings: Incomplete Disclosure: You read a paper that proposes Algorithm A which is shown to outperform SVMs on two datasets. Great. But what about on other datasets? How sensitive is this result? What about compute time – does the algorithm take two seconds on a laptop or two weeks on a 100-node cluster? Lack of Standardization: Algorithm A beats Algorithm B on one version of a dataset. Algorithm B beats Algorithm A on another version yet uses slightly different preprocessing. Though doing a head-on comparison would be ideal, it would be tedious since the programs probably use different dataset formats and have a large array of options


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Much of the success and popularity of machine learning has been driven by its practical impact. [sent-1, score-0.154]

2 Of course, the evaluation of empirical work is an integral part of the field. [sent-2, score-0.073]

3 But are the existing mechanisms for evaluating algorithms and comparing results good enough? [sent-3, score-0.276]

4 We (Percy and Jake) believe there are currently a number of shortcomings: Incomplete Disclosure: You read a paper that proposes Algorithm A which is shown to outperform SVMs on two datasets. [sent-4, score-0.341]

5 What about compute time – does the algorithm take two seconds on a laptop or two weeks on a 100-node cluster? [sent-8, score-0.309]

6 Lack of Standardization: Algorithm A beats Algorithm B on one version of a dataset. [sent-9, score-0.283]

7 Algorithm B beats Algorithm A on another version yet uses slightly different preprocessing. [sent-10, score-0.283]

8 Though doing a head-on comparison would be ideal, it would be tedious since the programs probably use different dataset formats and have a large array of options. [sent-11, score-0.597]

9 And what if we wanted to compare on more than just one dataset and two algorithms? [sent-12, score-0.314]

10 Incomplete View of State-of-the-Art: Basic question: What’s the best algorithm for your favorite dataset? [sent-13, score-0.143]

11 To find out, you could simply plow through fifty papers, get code from any author willing to reply, and reimplement the rest. [sent-14, score-0.217]

12 In short, it’s a collaborative website for objectively comparing machine learning programs across various datasets. [sent-20, score-0.882]

13 On the website, a user can do any combination of the following: Upload a program to our online repository. [sent-21, score-0.398]

14 For any executed run, view the results (various error metrics and time/memory usage statistics). [sent-25, score-0.335]

15 Download any dataset, program, or run for further use. [sent-26, score-0.115]

16 An important aspect of the site is that it’s collaborative: by uploading just one program or dataset, a user taps into the entire network of existing programs and datasets for comparison. [sent-27, score-1.244]

17 MLcomp is unique in that data and code interact to produce analyzable results. [sent-31, score-0.201]

18 Currently, seven machine learning task types (classification, regression, collaborative filtering, sequence tagging, etc.) [sent-33, score-0.219]

19 are supported, with hundreds of standard programs and datasets already online. [sent-34, score-0.428]

20 We encourage you to browse the site and hopefully contribute more! [sent-35, score-0.176]
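The sentence scores above come from a tf-idf weighting of each sentence. As a rough illustration only, here is a minimal sketch of that kind of extractive scoring; it assumes scikit-learn is available, and the `sentences` list and `top_sentences` helper are hypothetical rather than the actual pipeline behind this page.

```python
# Minimal sketch of tf-idf sentence scoring for an extractive summary.
# Assumes scikit-learn; `sentences` and `top_sentences` are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer

def top_sentences(sentences, k=3):
    """Return the k sentences with the highest summed tf-idf weight."""
    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform(sentences)   # shape: (n_sentences, n_terms)
    scores = tfidf.sum(axis=1).A1                 # one aggregate score per sentence
    ranked = sorted(zip(scores, sentences), reverse=True)
    return [sentence for _, sentence in ranked[:k]]

sentences = [
    "Much of the success and popularity of machine learning has been driven by its practical impact.",
    "Of course, the evaluation of empirical work is an integral part of the field.",
    "But are the existing mechanisms for evaluating algorithms and comparing results good enough?",
]
print(top_sentences(sentences, k=2))
```

Summing tf-idf weights favors sentences built from rarer, more distinctive terms, which is roughly the ordering reflected in the sentScore column above.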


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('mlcomp', 0.364), ('user', 0.258), ('dataset', 0.231), ('collaborative', 0.219), ('programs', 0.212), ('beats', 0.205), ('upload', 0.169), ('website', 0.161), ('algorithm', 0.143), ('incomplete', 0.141), ('datasets', 0.14), ('program', 0.14), ('code', 0.133), ('comparing', 0.132), ('run', 0.115), ('site', 0.108), ('currently', 0.101), ('view', 0.097), ('shortcomings', 0.091), ('objectively', 0.091), ('disclosure', 0.091), ('percy', 0.091), ('amazon', 0.091), ('uploading', 0.091), ('usage', 0.084), ('tedious', 0.084), ('supported', 0.084), ('reimplement', 0.084), ('proposes', 0.084), ('executed', 0.084), ('popularity', 0.084), ('two', 0.083), ('reply', 0.08), ('version', 0.078), ('tagging', 0.076), ('hundreds', 0.076), ('download', 0.076), ('existing', 0.076), ('integral', 0.073), ('uci', 0.073), ('outperform', 0.073), ('metrics', 0.07), ('driven', 0.07), ('formats', 0.07), ('jake', 0.07), ('contribute', 0.068), ('interact', 0.068), ('evaluating', 0.068), ('various', 0.067), ('svms', 0.066)]
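The wordName/wordTfidf pairs above are per-word tf-idf weights for this post, and the simValue numbers in the lists below are similarity scores between posts. A minimal sketch of both computations, assuming scikit-learn and a hypothetical `posts` dictionary standing in for the real blog corpus:

```python
# Sketch: top-weighted words for one post and its similarity to other posts.
# Assumes scikit-learn; `posts` is a hypothetical stand-in for the blog corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

posts = {
    "393": "MLcomp is a collaborative website for objectively comparing ML programs ...",
    "399": "Google Predict offers private use of existing learning algorithms ...",
    "19":  "Overfitting is over-representing the performance of systems ...",
}
ids = list(posts)
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(posts[i] for i in ids)

# Top-weighted words for post 393 (analogous to the wordName/wordTfidf list).
terms = vectorizer.get_feature_names_out()
row = tfidf[ids.index("393")].toarray().ravel()
print(sorted(zip(terms, row), key=lambda pair: -pair[1])[:10])

# Similarity of post 393 to every post in the corpus (analogous to simValue).
sims = cosine_similarity(tfidf[ids.index("393")], tfidf).ravel()
print(sorted(zip(sims, ids), reverse=True))
```

The self-similarity of 1.0 for the same-blog entry falls out of the cosine computation directly.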

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 393 hunch net-2010-04-14-MLcomp: a website for objectively comparing ML algorithms


2 0.15650511 399 hunch net-2010-05-20-Google Predict

Introduction: Slashdot points out Google Predict . I’m not privy to the details, but this has the potential to be extremely useful, as in many applications simply having an easy mechanism to apply existing learning algorithms can be extremely helpful. This differs goalwise from MLcomp —instead of public comparisons for research purposes, it’s about private utilization of good existing algorithms. It also differs infrastructurally, since a system designed to do this is much less awkward than using Amazon’s cloud computing. The latter implies that datasets several orders of magnitude larger can be handled up to limits imposed by network and storage.

3 0.14593372 454 hunch net-2012-01-30-ICML Posters and Scope

Introduction: Normally, I don’t indulge in posters for ICML , but this year is naturally an exception for me. If you want one, there are a small number left here , if you sign up before February. It also seems worthwhile to give some sense of the scope and reviewing criteria for ICML for authors considering submitting papers. At ICML, the (very large) program committee does the reviewing which informs final decisions by area chairs on most papers. Program chairs setup the process, deal with exceptions or disagreements, and provide advice for the reviewing process. Providing advice is tricky (and easily misleading) because a conference is a community, and in the end the aggregate interests of the community determine the conference. Nevertheless, as a program chair this year it seems worthwhile to state the overall philosophy I have and what I plan to encourage (and occasionally discourage). At the highest level, I believe ICML exists to further research into machine learning, which I gene

4 0.13394485 19 hunch net-2005-02-14-Clever Methods of Overfitting

Introduction: “Overfitting” is traditionally defined as training some flexible representation so that it memorizes the data but fails to predict well in the future. For this post, I will define overfitting more generally as over-representing the performance of systems. There are two styles of general overfitting: overrepresenting performance on particular datasets and (implicitly) overrepresenting performance of a method on future datasets. We should all be aware of these methods, avoid them where possible, and take them into account otherwise. I have used “reproblem” and “old datasets”, and may have participated in “overfitting by review”—some of these are very difficult to avoid. Name Method Explanation Remedy Traditional overfitting Train a complex predictor on too-few examples. Hold out pristine examples for testing. Use a simpler predictor. Get more training examples. Integrate over many predictors. Reject papers which do this. Parameter twe

5 0.1302468 300 hunch net-2008-04-30-Concerns about the Large Scale Learning Challenge

Introduction: The large scale learning challenge for ICML interests me a great deal, although I have concerns about the way it is structured. From the instructions page , several issues come up: Large Definition My personal definition of dataset size is: small A dataset is small if a human could look at the dataset and plausibly find a good solution. medium A dataset is mediumsize if it fits in the RAM of a reasonably priced computer. large A large dataset does not fit in the RAM of a reasonably priced computer. By this definition, all of the datasets are medium sized. This might sound like a pissing match over dataset size, but I believe it is more than that. The fundamental reason for these definitions is that they correspond to transitions in the sorts of approaches which are feasible. From small to medium, the ability to use a human as the learning algorithm degrades. From medium to large, it becomes essential to have learning algorithms that don’t require ran

6 0.12219543 423 hunch net-2011-02-02-User preferences for search engines

7 0.12181456 365 hunch net-2009-07-31-Vowpal Wabbit Open Source Project

8 0.11166507 364 hunch net-2009-07-11-Interesting papers at KDD

9 0.11080915 286 hunch net-2008-01-25-Turing’s Club for Machine Learning

10 0.1106477 297 hunch net-2008-04-22-Taking the next step

11 0.10335461 225 hunch net-2007-01-02-Retrospective

12 0.10183156 446 hunch net-2011-10-03-Monday announcements

13 0.10043955 20 hunch net-2005-02-15-ESPgame and image labeling

14 0.08775489 419 hunch net-2010-12-04-Vowpal Wabbit, version 5.0, and the second heresy

15 0.085472807 262 hunch net-2007-09-16-Optimizing Machine Learning Programs

16 0.083954744 304 hunch net-2008-06-27-Reviewing Horror Stories

17 0.081818551 332 hunch net-2008-12-23-Use of Learning Theory

18 0.081586145 148 hunch net-2006-01-13-Benchmarks for RL

19 0.081125602 450 hunch net-2011-12-02-Hadoop AllReduce and Terascale Learning

20 0.080031909 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.208), (1, 0.007), (2, -0.024), (3, -0.001), (4, 0.046), (5, 0.037), (6, -0.067), (7, -0.048), (8, -0.056), (9, 0.035), (10, -0.113), (11, 0.051), (12, -0.033), (13, -0.008), (14, -0.035), (15, -0.072), (16, 0.003), (17, -0.057), (18, -0.009), (19, 0.026), (20, 0.071), (21, 0.001), (22, 0.024), (23, -0.079), (24, -0.017), (25, 0.019), (26, -0.018), (27, -0.06), (28, -0.047), (29, -0.097), (30, -0.064), (31, 0.072), (32, 0.07), (33, -0.105), (34, 0.061), (35, 0.025), (36, 0.008), (37, -0.055), (38, 0.143), (39, 0.027), (40, 0.012), (41, 0.121), (42, -0.022), (43, 0.024), (44, 0.034), (45, -0.077), (46, 0.142), (47, -0.043), (48, -0.117), (49, 0.022)]
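The (topicId, topicWeight) vector above is an LSI representation, typically obtained by a truncated SVD of the tf-idf matrix. A minimal sketch under that assumption, using scikit-learn's TruncatedSVD on a hypothetical corpus:

```python
# Sketch: LSI-style topic weights via truncated SVD of the tf-idf matrix.
# Assumes scikit-learn; `corpus` is a hypothetical stand-in for the blog corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

corpus = [
    "MLcomp is a collaborative website for comparing machine learning programs.",
    "Vowpal Wabbit is fast online learning software, now an open source project.",
    "Overfitting over-represents the performance of learning systems.",
]
tfidf = TfidfVectorizer(stop_words="english").fit_transform(corpus)
lsi = TruncatedSVD(n_components=2, random_state=0)  # ~50 components for a list like the one above
topic_weights = lsi.fit_transform(tfidf)            # one weight vector per post

print(topic_weights[0])  # analogous to the (topicId, topicWeight) pairs above
```

Cosine similarity between these low-dimensional vectors then yields simValue-style scores that are less sensitive to exact word overlap than raw tf-idf.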

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96217215 393 hunch net-2010-04-14-MLcomp: a website for objectively comparing ML algorithms


2 0.64236671 300 hunch net-2008-04-30-Concerns about the Large Scale Learning Challenge

Introduction: The large scale learning challenge for ICML interests me a great deal, although I have concerns about the way it is structured. From the instructions page , several issues come up: Large Definition My personal definition of dataset size is: small A dataset is small if a human could look at the dataset and plausibly find a good solution. medium A dataset is mediumsize if it fits in the RAM of a reasonably priced computer. large A large dataset does not fit in the RAM of a reasonably priced computer. By this definition, all of the datasets are medium sized. This might sound like a pissing match over dataset size, but I believe it is more than that. The fundamental reason for these definitions is that they correspond to transitions in the sorts of approaches which are feasible. From small to medium, the ability to use a human as the learning algorithm degrades. From medium to large, it becomes essential to have learning algorithms that don’t require ran

3 0.58937013 365 hunch net-2009-07-31-Vowpal Wabbit Open Source Project

Introduction: Today brings a new release of the Vowpal Wabbit fast online learning software. This time, unlike the previous release, the project itself is going open source, developing via github . For example, the latest and greatest can be downloaded via: git clone git://github.com/JohnLangford/vowpal_wabbit.git If you aren’t familiar with git, it’s a distributed version control system which supports quick and easy branching, as well as reconciliation. This version of the code is confirmed to compile without complaint on at least some flavors of OSX as well as Linux boxes. As much of the point of this project is pushing the limits of fast and effective machine learning, let me mention a few datapoints from my experience. The program can effectively scale up to batch-style training on sparse terafeature (i.e. 10^12 sparse feature) size datasets. The limiting factor is typically i/o. I started using the real datasets from the large-scale learning workshop as a conve

4 0.57336205 399 hunch net-2010-05-20-Google Predict

Introduction: Slashdot points out Google Predict . I’m not privy to the details, but this has the potential to be extremely useful, as in many applications simply having an easy mechanism to apply existing learning algorithms can be extremely helpful. This differs goalwise from MLcomp —instead of public comparisons for research purposes, it’s about private utilization of good existing algorithms. It also differs infrastructurally, since a system designed to do this is much less awkward than using Amazon’s cloud computing. The latter implies that datasets several orders of magnitude larger can be handled up to limits imposed by network and storage.

5 0.56910384 19 hunch net-2005-02-14-Clever Methods of Overfitting

Introduction: “Overfitting” is traditionally defined as training some flexible representation so that it memorizes the data but fails to predict well in the future. For this post, I will define overfitting more generally as over-representing the performance of systems. There are two styles of general overfitting: overrepresenting performance on particular datasets and (implicitly) overrepresenting performance of a method on future datasets. We should all be aware of these methods, avoid them where possible, and take them into account otherwise. I have used “reproblem” and “old datasets”, and may have participated in “overfitting by review”—some of these are very difficult to avoid. Name Method Explanation Remedy Traditional overfitting Train a complex predictor on too-few examples. Hold out pristine examples for testing. Use a simpler predictor. Get more training examples. Integrate over many predictors. Reject papers which do this. Parameter twe

6 0.53380495 286 hunch net-2008-01-25-Turing’s Club for Machine Learning

7 0.52664763 297 hunch net-2008-04-22-Taking the next step

8 0.50770241 454 hunch net-2012-01-30-ICML Posters and Scope

9 0.50015622 177 hunch net-2006-05-05-An ICML reject

10 0.49991655 325 hunch net-2008-11-10-ICML Reviewing Criteria

11 0.49480721 18 hunch net-2005-02-12-ROC vs. Accuracy vs. AROC

12 0.48143387 262 hunch net-2007-09-16-Optimizing Machine Learning Programs

13 0.47317198 306 hunch net-2008-07-02-Proprietary Data in Academic Research?

14 0.46593574 298 hunch net-2008-04-26-Eliminating the Birthday Paradox for Universal Features

15 0.4647136 476 hunch net-2012-12-29-Simons Institute Big Data Program

16 0.45839357 87 hunch net-2005-06-29-Not EM for clustering at COLT

17 0.45374891 446 hunch net-2011-10-03-Monday announcements

18 0.44753033 419 hunch net-2010-12-04-Vowpal Wabbit, version 5.0, and the second heresy

19 0.43516904 441 hunch net-2011-08-15-Vowpal Wabbit 6.0

20 0.42833927 492 hunch net-2013-12-01-NIPS tutorials and Vowpal Wabbit 7.4


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(9, 0.031), (27, 0.166), (38, 0.062), (51, 0.343), (53, 0.062), (55, 0.126), (71, 0.012), (94, 0.091), (95, 0.01)]
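The weights above form a per-post topic distribution from an LDA topic model. A rough sketch of how such a distribution might be produced, assuming scikit-learn's LatentDirichletAllocation over bag-of-words counts from a hypothetical corpus:

```python
# Sketch: per-post topic weights from an LDA topic model.
# Assumes scikit-learn; `corpus` is a hypothetical stand-in for the blog corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

corpus = [
    "MLcomp is a collaborative website for comparing machine learning programs.",
    "ICML reviewing criteria, posters, and program committee organization.",
    "Vowpal Wabbit release notes and large scale online learning.",
]
counts = CountVectorizer(stop_words="english").fit_transform(corpus)
lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topics = lda.fit_transform(counts)  # each row is a topic distribution summing to 1

print(doc_topics[0])  # analogous to the (topicId, topicWeight) list above
```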

similar blogs list:

simIndex simValue blogId blogTitle

1 0.95913088 489 hunch net-2013-09-20-No NY ML Symposium in 2013, and some good news

Introduction: There will be no New York ML Symposium this year. The core issue is that NYAS is disorganized by people leaving, pushing back the date, with the current candidate a spring symposium on March 28. Gunnar and I were outvoted here—we were gung ho on organizing a fall symposium, but the rest of the committee wants to wait. In some good news, most of the ICML 2012 videos have been restored from a deep backup.

2 0.91426605 324 hunch net-2008-11-09-A Healthy COLT

Introduction: A while ago , we discussed the health of COLT . COLT 2008 substantially addressed my concerns. The papers were diverse and several were interesting. Attendance was up, which is particularly notable in Europe. In my opinion, the colocation with UAI and ICML was the best colocation since 1998. And, perhaps best of all, registration ended up being free for all students due to various grants from the Academy of Finland , Google , IBM , and Yahoo . A basic question is: what went right? There seem to be several answers. Cost-wise, COLT had sufficient grants to alleviate the high cost of the Euro and location at a university substantially reduces the cost compared to a hotel. Organization-wise, the Finns were great with hordes of volunteers helping set everything up. Having too many volunteers is a good failure mode. Organization-wise, it was clear that all 3 program chairs were cooperating in designing the program. Facilities-wise, proximity in time and space made

same-blog 3 0.86804211 393 hunch net-2010-04-14-MLcomp: a website for objectively comparing ML algorithms


4 0.8370232 334 hunch net-2009-01-07-Interesting Papers at SODA 2009

Introduction: Several talks seem potentially interesting to ML folks at this year’s SODA. Maria-Florina Balcan , Avrim Blum , and Anupam Gupta , Approximate Clustering without the Approximation . This paper gives reasonable algorithms with provable approximation guarantees for k-median and other notions of clustering. It’s conceptually interesting, because it’s the second example I’ve seen where NP hardness is subverted by changing the problem definition subtle but reasonable way. Essentially, they show that if any near-approximation to an optimal solution is good, then it’s computationally easy to find a near-optimal solution. This subtle shift bears serious thought. A similar one occurred in our ranking paper with respect to minimum feedback arcset. With two known examples, it suggests that many more NP-complete problems might be finessed into irrelevance in this style. Yury Lifshits and Shengyu Zhang , Combinatorial Algorithms for Nearest Neighbors, Near-Duplicates, and Smal

5 0.79125184 179 hunch net-2006-05-16-The value of the orthodox view of Boosting

Introduction: The term “boosting” comes from the idea of using a meta-algorithm which takes “weak” learners (that may be able to only barely predict slightly better than random) and turn them into strongly capable learners (which predict very well). Adaboost in 1995 was the first widely used (and useful) boosting algorithm, although there were theoretical boosting algorithms floating around since 1990 (see the bottom of this page ). Since then, many different interpretations of why boosting works have arisen. There is significant discussion about these different views in the annals of statistics , including a response by Yoav Freund and Robert Schapire . I believe there is a great deal of value to be found in the original view of boosting (meta-algorithm for creating a strong learner from a weak learner). This is not a claim that one particular viewpoint obviates the value of all others, but rather that no other viewpoint seems to really capture important properties. Comparing wit

6 0.77174085 300 hunch net-2008-04-30-Concerns about the Large Scale Learning Challenge

7 0.63818026 235 hunch net-2007-03-03-All Models of Learning have Flaws

8 0.6146149 281 hunch net-2007-12-21-Vowpal Wabbit Code Release

9 0.59056401 204 hunch net-2006-08-28-Learning Theory standards for NIPS 2006

10 0.58788562 403 hunch net-2010-07-18-ICML & COLT 2010

11 0.58639699 426 hunch net-2011-03-19-The Ideal Large Scale Learning Class

12 0.58343309 416 hunch net-2010-10-29-To Vidoelecture or not

13 0.57600665 286 hunch net-2008-01-25-Turing’s Club for Machine Learning

14 0.56973982 256 hunch net-2007-07-20-Motivation should be the Responsibility of the Reviewer

15 0.56907839 439 hunch net-2011-08-01-Interesting papers at COLT 2011

16 0.56752181 463 hunch net-2012-05-02-ICML: Behind the Scenes

17 0.56714481 419 hunch net-2010-12-04-Vowpal Wabbit, version 5.0, and the second heresy

18 0.56588721 454 hunch net-2012-01-30-ICML Posters and Scope

19 0.56517762 309 hunch net-2008-07-10-Interesting papers, ICML 2008

20 0.56413978 132 hunch net-2005-11-26-The Design of an Optimal Research Environment