hunch_net hunch_net-2008 hunch_net-2008-300 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: The large scale learning challenge for ICML interests me a great deal, although I have concerns about the way it is structured. From the instructions page, several issues come up: Large Definition My personal definition of dataset size is: small A dataset is small if a human could look at the dataset and plausibly find a good solution. medium A dataset is medium sized if it fits in the RAM of a reasonably priced computer. large A large dataset does not fit in the RAM of a reasonably priced computer. By this definition, all of the datasets are medium sized. This might sound like a pissing match over dataset size, but I believe it is more than that. The fundamental reason for these definitions is that they correspond to transitions in the sorts of approaches which are feasible. From small to medium, the ability to use a human as the learning algorithm degrades. From medium to large, it becomes essential to have learning algorithms that don’t require random access to examples.
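To make the small/medium/large boundary concrete, here is a minimal back-of-the-envelope sketch (my own illustration, not part of the challenge rules) that estimates whether a dense dataset fits in the RAM of a hypothetical 8 GB machine; the example shapes are made up.

```python
# Back-of-the-envelope check for the "medium vs. large" boundary described above:
# does a dense dataset of m examples and d float64 features fit in RAM?
# Illustrative sketch only; the machine size and dataset shapes are hypothetical.

def dataset_bytes(num_examples: int, num_features: int, bytes_per_value: int = 8) -> int:
    """Approximate in-memory size of a dense feature matrix."""
    return num_examples * num_features * bytes_per_value

def classify_size(num_examples: int, num_features: int,
                  ram_bytes: int = 8 * 2**30) -> str:
    """Label a dataset relative to a hypothetical 8 GB machine."""
    if dataset_bytes(num_examples, num_features) <= ram_bytes:
        return "medium (fits in RAM)"
    return "large (does not fit in RAM)"

if __name__ == "__main__":
    for m, d in [(500_000, 500), (100_000_000, 1_000)]:
        gb = dataset_bytes(m, d) / 2**30
        print(f"m={m:,} d={d:,}: ~{gb:,.1f} GB -> {classify_size(m, d)}")
```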
sentIndex sentText sentNum sentScore
1 The large scale learning challenge for ICML interests me a great deal, although I have concerns about the way it is structured. [sent-1, score-0.483]
2 From the instructions page, several issues come up: Large Definition My personal definition of dataset size is: small A dataset is small if a human could look at the dataset and plausibly find a good solution. [sent-2, score-1.206]
3 medium A dataset is medium sized if it fits in the RAM of a reasonably priced computer. [sent-3, score-0.797]
4 large A large dataset does not fit in the RAM of a reasonably priced computer. [sent-4, score-0.886]
5 By this definition, all of the datasets are medium sized. [sent-5, score-0.512]
6 From medium to large, it becomes essential to have learning algorithms that don’t require random access to examples. [sent-9, score-0.449]
7 No Loading Time The medium scale nature of the datasets is tacitly acknowledged in the rules which exclude data loading time. [sent-10, score-1.37]
8 My experience is that parsing and loading large datasets is often the computational bottleneck. [sent-11, score-0.815]
9 (No ‘excluding loading time’ number can be found for VW, of course, because loading and learning are intertwined.) [sent-14, score-0.722]
10 With an appropriate choice of this initial parameter (which you can freely optimize on the data), training time is zero. [sent-19, score-0.287]
11 Parallelism One approach to dealing with large amounts of data is to add computers that operate in parallel. [sent-20, score-0.422]
12 This is very natural (the brain is vastly parallel at the neuron level), and there are substantial research questions in parallel machine learning. [sent-21, score-0.45]
13 There are good reasons for this: parallel architectures aren’t very standard yet, and buying multiple computers is still substantially more expensive than buying enough RAM to fit datasets of these sizes. [sent-23, score-0.812]
14 As a consequence of this design, the contest prefers algorithms that load all data into memory then operate on it. [sent-26, score-0.409]
15 These design decisions discourage large scale algorithms (where large is as defined above) in favor of medium scale learning algorithms. [sent-28, score-1.306]
16 The design also favors highly parameterized learning algorithms over less parameterized algorithms, which is the opposite of my personal preference for research direction. [sent-29, score-0.514]
17 It’s probably too late to get large datasets, but using wall-clock time would at least avoid bias against large scale algorithms. [sent-32, score-0.692]
18 Even without any rule changes, its outcome tells us something about which sorts of algorithms work at a medium scale. [sent-35, score-0.529]
19 The datasets are also large enough to break every Theta(m^2) algorithm. [sent-37, score-0.347]
20 update: Soeren has helped set up an SMP parallel track, which addresses some of the concerns above. [sent-39, score-0.318]
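Sentences 7 through 10 above make the central technical point: in a streaming learner like VW, loading and learning are intertwined, so measuring training time while excluding loading time mostly measures how the work was labeled. Below is a minimal sketch of that pattern (my own illustration, not VW's code), assuming a hypothetical whitespace-separated text format with the label first.

```python
# Minimal sketch of "learning while loading": a single streaming pass of
# stochastic gradient descent (squared loss, linear model) that updates the
# weights as each example is parsed. Only the current example is ever held
# in memory, so a "training time excluding loading" measurement would be
# nearly zero for this style of learner.
# Assumed (hypothetical) input format: one example per line, "label f1 f2 ... fd".

def streaming_sgd(path: str, num_features: int, learning_rate: float = 0.1):
    weights = [0.0] * num_features
    with open(path) as lines:
        for line in lines:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            label = float(fields[0])
            features = [float(v) for v in fields[1:1 + num_features]]
            prediction = sum(w * x for w, x in zip(weights, features))
            step = learning_rate * (label - prediction)
            for j, x in enumerate(features):
                weights[j] += step * x
    return weights
```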
wordName wordTfidf (topN-words)
[('loading', 0.361), ('medium', 0.347), ('dataset', 0.228), ('parallel', 0.193), ('large', 0.182), ('smp', 0.178), ('scale', 0.176), ('datasets', 0.165), ('ram', 0.149), ('priced', 0.144), ('exclude', 0.128), ('rules', 0.127), ('concerns', 0.125), ('parameter', 0.122), ('buying', 0.119), ('parsing', 0.107), ('parameterized', 0.103), ('algorithms', 0.102), ('sgd', 0.099), ('definition', 0.098), ('operate', 0.093), ('size', 0.092), ('time', 0.09), ('contest', 0.089), ('computers', 0.081), ('sorts', 0.08), ('reasonably', 0.078), ('design', 0.077), ('initial', 0.075), ('fit', 0.072), ('personal', 0.07), ('final', 0.07), ('issues', 0.068), ('small', 0.066), ('data', 0.066), ('declared', 0.064), ('neuron', 0.064), ('parsed', 0.064), ('detailing', 0.064), ('disappointing', 0.064), ('discourage', 0.064), ('soeren', 0.064), ('least', 0.062), ('human', 0.062), ('optimal', 0.062), ('appear', 0.06), ('opposite', 0.059), ('supported', 0.059), ('transitions', 0.059), ('prefers', 0.059)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 300 hunch net-2008-04-30-Concerns about the Large Scale Learning Challenge
2 0.25744739 281 hunch net-2007-12-21-Vowpal Wabbit Code Release
Introduction: We are releasing the Vowpal Wabbit (Fast Online Learning) code as open source under a BSD (revised) license. This is a project at Yahoo! Research to build a useful large scale learning algorithm which Lihong Li, Alex Strehl, and I have been working on. To appreciate the meaning of “large”, it’s useful to define “small” and “medium”. A “small” supervised learning problem is one where a human could use a labeled dataset and come up with a reasonable predictor. A “medium” supervised learning problem is one whose dataset fits into the RAM of a modern desktop computer. A “large” supervised learning problem is one which does not fit into the RAM of a normal machine. VW tackles large scale learning problems by this definition of large. I’m not aware of any other open source Machine Learning tools which can handle this scale (although they may exist). A few close ones are: IBM’s Parallel Machine Learning Toolbox isn’t quite open source. The approach used by this toolbox is essenti
3 0.17534176 286 hunch net-2008-01-25-Turing’s Club for Machine Learning
Introduction: Many people in Machine Learning don’t fully understand the impact of computation, as demonstrated by a lack of big-O analysis of new learning algorithms. This is important—some current active research programs are fundamentally flawed w.r.t. computation, and other research programs are directly motivated by it. When considering a learning algorithm, I think about the following questions: How does the learning algorithm scale with the number of examples m? Any algorithm using all of the data is at least O(m), but in many cases this is O(m^2) (naive nearest neighbor for self-prediction) or unknown (k-means or many other optimization algorithms). The unknown case is very common, and it can mean (for example) that the algorithm isn’t convergent or simply that the amount of computation isn’t controlled. The above question can also be asked for test cases. In some applications, test-time performance is of great importance. How does the algorithm scale with the number of
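The O(m^2) point here, and sentence 19 of the post above, is easy to make concrete: naive leave-one-out nearest neighbor touches every pair of examples, so time grows quadratically and a stored distance matrix does not come close to fitting in RAM at challenge-scale m. A small sketch follows; the large dataset size used for the arithmetic is hypothetical.

```python
# Why Theta(m^2) algorithms break at large m: naive leave-one-out
# nearest-neighbor prediction compares every example with every other one.
# Sketch only; the scaled-up dataset size below is hypothetical.

def nn_self_prediction(X, y):
    """O(m^2) leave-one-out 1-nearest-neighbor prediction (squared Euclidean)."""
    m = len(X)
    predictions = []
    for i in range(m):
        best_j, best_dist = None, float("inf")
        for j in range(m):
            if j == i:
                continue
            dist = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            if dist < best_dist:
                best_j, best_dist = j, dist
        predictions.append(y[best_j])
    return predictions

if __name__ == "__main__":
    # Tiny usage example.
    X = [[0.0, 1.0], [0.0, 0.9], [5.0, 5.0]]
    y = ["a", "a", "b"]
    print(nn_self_prediction(X, y))  # ['a', 'a', 'a']

    # The arithmetic at scale: at m = 10**6 examples there are ~10**12 pairs,
    # and a dense float64 distance matrix would need ~8 terabytes.
    m = 10**6
    print(f"pairwise comparisons: {m * m:.1e}")
    print(f"dense distance matrix: {m * m * 8 / 1e12:.0f} TB")
```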
4 0.16727184 346 hunch net-2009-03-18-Parallel ML primitives
Introduction: Previously, we discussed parallel machine learning a bit. As parallel ML is rather difficult, I’d like to describe my thinking at the moment, and ask for advice from the rest of the world. This is particularly relevant right now, as I’m attending a workshop tomorrow on parallel ML. Parallelizing slow algorithms seems uncompelling. Parallelizing many algorithms also seems uncompelling, because the effort required to parallelize is substantial. This leaves the question: Which one fast algorithm is the best to parallelize? What is a substantially different second? One compellingly fast simple algorithm is online gradient descent on a linear representation. This is the core of Leon’s sgd code and Vowpal Wabbit. Antoine Bordes showed a variant was competitive in the large scale learning challenge. It’s also a decades-old primitive which has been reused in many algorithms, and continues to be reused. It also applies to online learning rather than just online optimiz
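For readers who want the primitive spelled out, here is a sketch of online gradient descent on a linear representation over sparse features. The fixed-size hashed weight table is my own memory-bounding simplification, not a description of how Leon's sgd code or Vowpal Wabbit is actually implemented.

```python
# Sketch of the primitive mentioned above: online gradient descent on a
# linear model over sparse (name -> value) features, with a fixed-size hashed
# weight table as an assumed memory bound.

class SparseLinearLearner:
    def __init__(self, num_weights: int = 2**20, learning_rate: float = 0.5):
        self.weights = [0.0] * num_weights
        self.learning_rate = learning_rate

    def _index(self, feature: str) -> int:
        # Python's built-in hash is stable within a process, which is enough here.
        return hash(feature) % len(self.weights)

    def predict(self, example: dict) -> float:
        return sum(self.weights[self._index(f)] * v for f, v in example.items())

    def learn(self, example: dict, label: float) -> float:
        """One squared-loss gradient step, touching only the nonzero features."""
        prediction = self.predict(example)
        step = self.learning_rate * (label - prediction)
        for f, v in example.items():
            self.weights[self._index(f)] += step * v
        return prediction

if __name__ == "__main__":
    learner = SparseLinearLearner()
    for _ in range(20):
        learner.learn({"bias": 1.0, "country=us": 1.0}, label=1.0)
        learner.learn({"bias": 1.0, "country=fr": 1.0}, label=0.0)
    print(round(learner.predict({"bias": 1.0, "country=us": 1.0}), 2))
```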
5 0.14240184 454 hunch net-2012-01-30-ICML Posters and Scope
Introduction: Normally, I don’t indulge in posters for ICML, but this year is naturally an exception for me. If you want one, there are a small number left here, if you sign up before February. It also seems worthwhile to give some sense of the scope and reviewing criteria for ICML for authors considering submitting papers. At ICML, the (very large) program committee does the reviewing which informs final decisions by area chairs on most papers. Program chairs set up the process, deal with exceptions or disagreements, and provide advice for the reviewing process. Providing advice is tricky (and easily misleading) because a conference is a community, and in the end the aggregate interests of the community determine the conference. Nevertheless, as a program chair this year it seems worthwhile to state the overall philosophy I have and what I plan to encourage (and occasionally discourage). At the highest level, I believe ICML exists to further research into machine learning, which I gene
6 0.14188342 19 hunch net-2005-02-14-Clever Methods of Overfitting
7 0.1302468 393 hunch net-2010-04-14-MLcomp: a website for objectively comparing ML algorithms
8 0.12616239 450 hunch net-2011-12-02-Hadoop AllReduce and Terascale Learning
9 0.11376581 404 hunch net-2010-08-20-The Workshop on Cores, Clusters, and Clouds
10 0.11258009 349 hunch net-2009-04-21-Interesting Presentations at Snowbird
11 0.11185643 419 hunch net-2010-12-04-Vowpal Wabbit, version 5.0, and the second heresy
12 0.10589326 229 hunch net-2007-01-26-Parallel Machine Learning Problems
13 0.10539781 20 hunch net-2005-02-15-ESPgame and image labeling
14 0.1045235 143 hunch net-2005-12-27-Automated Labeling
15 0.10349647 120 hunch net-2005-10-10-Predictive Search is Coming
16 0.10248989 451 hunch net-2011-12-13-Vowpal Wabbit version 6.1 & the NIPS tutorial
17 0.10209089 129 hunch net-2005-11-07-Prediction Competitions
18 0.10190318 371 hunch net-2009-09-21-Netflix finishes (and starts)
19 0.099434443 211 hunch net-2006-10-02-$1M Netflix prediction contest
20 0.09420155 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms
topicId topicWeight
[(0, 0.241), (1, 0.05), (2, -0.089), (3, 0.022), (4, 0.055), (5, 0.057), (6, -0.145), (7, 0.004), (8, -0.084), (9, 0.087), (10, -0.17), (11, 0.064), (12, 0.038), (13, 0.008), (14, -0.027), (15, -0.084), (16, 0.041), (17, -0.017), (18, -0.042), (19, -0.01), (20, -0.006), (21, 0.027), (22, -0.029), (23, 0.054), (24, -0.01), (25, 0.011), (26, 0.013), (27, -0.082), (28, -0.0), (29, -0.074), (30, 0.068), (31, 0.032), (32, 0.021), (33, 0.013), (34, 0.059), (35, -0.057), (36, -0.063), (37, -0.035), (38, 0.056), (39, -0.027), (40, 0.054), (41, -0.001), (42, -0.006), (43, 0.087), (44, 0.048), (45, -0.008), (46, 0.029), (47, -0.106), (48, -0.071), (49, 0.065)]
simIndex simValue blogId blogTitle
same-blog 1 0.95775872 300 hunch net-2008-04-30-Concerns about the Large Scale Learning Challenge
2 0.70390898 450 hunch net-2011-12-02-Hadoop AllReduce and Terascale Learning
Introduction: Suppose you have a dataset with 2 terafeatures (we only count nonzero entries in a datamatrix), and want to learn a good linear predictor in a reasonable amount of time. How do you do it? As a learning theorist, the first thing you do is pray that this is too much data for the number of parameters—but that’s not the case: there are around 16 billion examples, 16 million parameters, and people really care about a high quality predictor, so subsampling is not a good strategy. Alekh visited us last summer, and we had a breakthrough (see here for details), coming up with the first learning algorithm I’ve seen that is provably faster than any future single machine learning algorithm. The proof of this is simple: We can output an optimal-up-to-precision linear predictor faster than the data can be streamed through the network interface of any single machine involved in the computation. It is necessary but not sufficient to have an effective communication infrastructure. It is ne
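The communication pattern referenced here is allreduce: every node contributes a local vector (for example, a gradient computed on its shard of the examples) and receives the elementwise sum. Below is a toy single-process simulation of that pattern; it is a sketch only, and a real deployment would use something like MPI_Allreduce or the Hadoop-compatible AllReduce this post describes.

```python
# Toy illustration of the allreduce pattern: every worker contributes a local
# vector and receives the elementwise sum. The "workers" here are just lists
# in one process; a real system runs this across machines.

def allreduce_sum(local_vectors):
    """Return, for every worker, the elementwise sum of all workers' vectors."""
    total = [sum(values) for values in zip(*local_vectors)]
    return [list(total) for _ in local_vectors]

if __name__ == "__main__":
    # Hypothetical per-worker gradients on three shards of data.
    local_gradients = [
        [0.1, -0.2, 0.0],
        [0.3, 0.1, -0.1],
        [-0.2, 0.0, 0.4],
    ]
    synced = allreduce_sum(local_gradients)
    print(synced[0])  # every worker now sees ~[0.2, -0.1, 0.3], up to float rounding
```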
3 0.67251545 393 hunch net-2010-04-14-MLcomp: a website for objectively comparing ML algorithms
Introduction: Much of the success and popularity of machine learning has been driven by its practical impact. Of course, the evaluation of empirical work is an integral part of the field. But are the existing mechanisms for evaluating algorithms and comparing results good enough? We (Percy and Jake) believe there are currently a number of shortcomings: Incomplete Disclosure: You read a paper that proposes Algorithm A which is shown to outperform SVMs on two datasets. Great. But what about on other datasets? How sensitive is this result? What about compute time – does the algorithm take two seconds on a laptop or two weeks on a 100-node cluster? Lack of Standardization: Algorithm A beats Algorithm B on one version of a dataset. Algorithm B beats Algorithm A on another version yet uses slightly different preprocessing. Though doing a head-on comparison would be ideal, it would be tedious since the programs probably use different dataset formats and have a large array of options
4 0.66471082 365 hunch net-2009-07-31-Vowpal Wabbit Open Source Project
Introduction: Today brings a new release of the Vowpal Wabbit fast online learning software. This time, unlike the previous release, the project itself is going open source, developing via github. For example, the latest and greatest can be downloaded via: git clone git://github.com/JohnLangford/vowpal_wabbit.git If you aren’t familiar with git, it’s a distributed version control system which supports quick and easy branching, as well as reconciliation. This version of the code is confirmed to compile without complaint on at least some flavors of OSX as well as Linux boxes. As much of the point of this project is pushing the limits of fast and effective machine learning, let me mention a few datapoints from my experience. The program can effectively scale up to batch-style training on sparse terafeature (i.e. 10^12 sparse feature) size datasets. The limiting factor is typically i/o. I started using the real datasets from the large-scale learning workshop as a conve
5 0.65864009 286 hunch net-2008-01-25-Turing’s Club for Machine Learning
6 0.65357393 346 hunch net-2009-03-18-Parallel ML primitives
7 0.64623541 128 hunch net-2005-11-05-The design of a computing cluster
8 0.62852061 229 hunch net-2007-01-26-Parallel Machine Learning Problems
9 0.62780285 441 hunch net-2011-08-15-Vowpal Wabbit 6.0
10 0.62404746 281 hunch net-2007-12-21-Vowpal Wabbit Code Release
11 0.60409385 419 hunch net-2010-12-04-Vowpal Wabbit, version 5.0, and the second heresy
12 0.60322368 136 hunch net-2005-12-07-Is the Google way the way for machine learning?
13 0.57905972 262 hunch net-2007-09-16-Optimizing Machine Learning Programs
14 0.57643837 451 hunch net-2011-12-13-Vowpal Wabbit version 6.1 & the NIPS tutorial
15 0.56180179 349 hunch net-2009-04-21-Interesting Presentations at Snowbird
16 0.55815482 19 hunch net-2005-02-14-Clever Methods of Overfitting
17 0.54613113 366 hunch net-2009-08-03-Carbon in Computer Science Research
18 0.51556659 399 hunch net-2010-05-20-Google Predict
19 0.5119195 381 hunch net-2009-12-07-Vowpal Wabbit version 4.0, and a NIPS heresy
20 0.50662726 471 hunch net-2012-08-24-Patterns for research in machine learning
topicId topicWeight
[(10, 0.018), (22, 0.011), (27, 0.188), (38, 0.073), (51, 0.288), (53, 0.028), (55, 0.088), (62, 0.012), (68, 0.02), (94, 0.128), (95, 0.061)]
simIndex simValue blogId blogTitle
1 0.90859127 324 hunch net-2008-11-09-A Healthy COLT
Introduction: A while ago, we discussed the health of COLT. COLT 2008 substantially addressed my concerns. The papers were diverse and several were interesting. Attendance was up, which is particularly notable in Europe. In my opinion, the colocation with UAI and ICML was the best colocation since 1998. And, perhaps best of all, registration ended up being free for all students due to various grants from the Academy of Finland, Google, IBM, and Yahoo. A basic question is: what went right? There seem to be several answers. Cost-wise, COLT had sufficient grants to alleviate the high cost of the Euro and location at a university substantially reduces the cost compared to a hotel. Organization-wise, the Finns were great with hordes of volunteers helping set everything up. Having too many volunteers is a good failure mode. Organization-wise, it was clear that all 3 program chairs were cooperating in designing the program. Facilities-wise, proximity in time and space made
2 0.90270883 393 hunch net-2010-04-14-MLcomp: a website for objectively comparing ML algorithms
3 0.89262837 489 hunch net-2013-09-20-No NY ML Symposium in 2013, and some good news
Introduction: There will be no New York ML Symposium this year. The core issue is that NYAS is disorganized by people leaving, pushing back the date, with the current candidate a spring symposium on March 28. Gunnar and I were outvoted here—we were gung ho on organizing a fall symposium, but the rest of the committee wants to wait. In some good news, most of the ICML 2012 videos have been restored from a deep backup.
4 0.87498051 334 hunch net-2009-01-07-Interesting Papers at SODA 2009
Introduction: Several talks seem potentially interesting to ML folks at this year’s SODA. Maria-Florina Balcan, Avrim Blum, and Anupam Gupta, Approximate Clustering without the Approximation. This paper gives reasonable algorithms with provable approximation guarantees for k-median and other notions of clustering. It’s conceptually interesting, because it’s the second example I’ve seen where NP hardness is subverted by changing the problem definition in a subtle but reasonable way. Essentially, they show that if any near-approximation to an optimal solution is good, then it’s computationally easy to find a near-optimal solution. This subtle shift bears serious thought. A similar one occurred in our ranking paper with respect to minimum feedback arcset. With two known examples, it suggests that many more NP-complete problems might be finessed into irrelevance in this style. Yury Lifshits and Shengyu Zhang, Combinatorial Algorithms for Nearest Neighbors, Near-Duplicates, and Smal
same-blog 5 0.86703062 300 hunch net-2008-04-30-Concerns about the Large Scale Learning Challenge
6 0.8434146 179 hunch net-2006-05-16-The value of the orthodox view of Boosting
7 0.73437303 235 hunch net-2007-03-03-All Models of Learning have Flaws
8 0.70713675 281 hunch net-2007-12-21-Vowpal Wabbit Code Release
9 0.69112468 136 hunch net-2005-12-07-Is the Google way the way for machine learning?
10 0.69017959 426 hunch net-2011-03-19-The Ideal Large Scale Learning Class
11 0.67573094 286 hunch net-2008-01-25-Turing’s Club for Machine Learning
12 0.66906518 132 hunch net-2005-11-26-The Design of an Optimal Research Environment
13 0.66566116 450 hunch net-2011-12-02-Hadoop AllReduce and Terascale Learning
14 0.6630711 345 hunch net-2009-03-08-Prediction Science
15 0.66042566 204 hunch net-2006-08-28-Learning Theory standards for NIPS 2006
16 0.65123874 156 hunch net-2006-02-11-Yahoo’s Learning Problems.
17 0.65077001 419 hunch net-2010-12-04-Vowpal Wabbit, version 5.0, and the second heresy
18 0.65025491 449 hunch net-2011-11-26-Giving Thanks
19 0.64909506 221 hunch net-2006-12-04-Structural Problems in NIPS Decision Making
20 0.64704537 406 hunch net-2010-08-22-KDD 2010