hunch_net hunch_net-2011 hunch_net-2011-441 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: I just released Vowpal Wabbit 6.0. Since the last version: VW is now 2-3 orders of magnitude faster at linear learning, primarily thanks to Alekh. Given the baseline, this is loads of fun, allowing us to easily deal with terafeature datasets and dwarfing the scale of any other open source project. The core improvement here comes from effective parallelization over kilonode clusters (either Hadoop or not). This code is highly scalable, so it even helps with clusters of size 2 (and doesn't hurt for clusters of size 1). The core allreduce technique appears widely and easily reusable; we've already used it to parallelize Conjugate Gradient, LBFGS, and two variants of online learning. We'll be documenting how to do this more thoroughly, but for now "README_cluster" and the associated scripts should provide a good starting point. The new LBFGS code from Miro seems to commonly dominate the existing conjugate gradient code in time/quality tradeoffs. The new matrix factorization code from Jake adds a core algorithm. We finally have basic persistent daemon support, again with Jake's help. Adaptive gradient calculations can now be made dimensionally correct, following up on Paul's post, yielding a better algorithm, and Nikos sped it up further with an SSE native inverse square root. The LDA core is perhaps twice as fast after Paul educated us about SSE and representational gymnastics. All of the above was done without adding significant new dependencies, so the code should compile easily. The VW mailing list has been slowly growing and is a good place to ask questions.
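For concreteness, the allreduce operation behind this parallelization has a very simple meaning: every node contributes a local vector (for example, a gradient computed on its shard of the data), the vectors are summed coordinate-wise, and every node receives the sum. Below is a minimal in-memory sketch of that pattern, assuming a simulated binary tree of nodes; it illustrates the idea only and is not VW's socket-based cluster code.

import numpy as np

def allreduce_sum(local_vectors):
    # local_vectors: one 1-D numpy array per simulated node.
    # Returns the list of arrays each node holds afterwards (all equal to the sum).
    n = len(local_vectors)
    buf = [v.copy() for v in local_vectors]

    # Reduce phase: partial sums flow up a binary tree toward node 0.
    stride = 1
    while stride < n:
        for parent in range(0, n, 2 * stride):
            child = parent + stride
            if child < n:
                buf[parent] += buf[child]
        stride *= 2

    # Broadcast phase: node 0's total is sent back to every node.
    total = buf[0]
    return [total.copy() for _ in range(n)]

# Example: 4 nodes, each holding a local gradient over 3 weights.
grads = [np.array([1.0, 0.0, 2.0]),
         np.array([0.5, 1.0, 0.0]),
         np.array([0.0, 0.5, 0.5]),
         np.array([1.0, 1.0, 1.0])]
print(allreduce_sum(grads)[0])   # every node ends with [2.5 2.5 3.5]

Because the result of allreduce is identical on every node, each node can apply the same optimization step locally and all nodes stay synchronized, which is why the same primitive serves Conjugate Gradient, LBFGS, and online learning alike.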
sentIndex sentText sentNum sentScore
1 Since the last version: VW is now 2-3 orders of magnitude faster at linear learning, primarily thanks to Alekh . [sent-3, score-0.272]
2 Given the baseline, this is loads of fun, allowing us to easily deal with terafeature datasets, and dwarfing the scale of any other open source projects. [sent-4, score-0.303]
3 The core improvement here comes from effective parallelization over kilonode clusters (either Hadoop or not). [sent-5, score-0.696]
4 This code is highly scalable, so it even helps with clusters of size 2 (and doesn’t hurt for clusters of size 1). [sent-6, score-1.043]
5 The core allreduce technique appears widely and easily reused—we’ve already used it to parallelize Conjugate Gradient, LBFGS, and two variants of online learning. [sent-7, score-0.5]
6 The new LBFGS code from Miro seems to commonly dominate the existing conjugate gradient code in time/quality tradeoffs. [sent-9, score-1.008]
7 The new matrix factorization code from Jake adds a core algorithm. [sent-10, score-0.67]
8 We finally have basic persistent daemon support, again with Jake’s help. [sent-11, score-0.216]
9 Adaptive gradient calculations can now be made dimensionally correct, following up on Paul’s post , yielding a better algorithm. [sent-12, score-0.348]
10 And Nikos sped it up further with SSE native inverse square root. [sent-13, score-0.397]
11 The LDA core is perhaps twice as fast after Paul educated us about SSE and representational gymnastics . [sent-14, score-0.586]
12 All of the above was done without adding significant new dependencies, so the code should compile easily. [sent-15, score-0.364]
13 The VW mailing list has been slowly growing, and is a good place to ask questions. [sent-16, score-0.167]
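Sentences 9 and 10 above refer to per-feature adaptive learning rates. A common form of the idea (AdaGrad-style) keeps a running sum of squared gradients for each weight and scales every update by the inverse square root of that sum; that 1/sqrt per feature is exactly the hot spot an SSE reciprocal-square-root instruction accelerates. The sketch below shows only the generic per-feature scaling, assuming squared loss; the dimensional-correctness refinement from Paul's post is a further adjustment not reproduced here, and this is not VW's exact update rule.

import math

def adaptive_update(w, g2, example, label, eta=0.5):
    # One online update on a sparse example {feature_index: value}, squared loss.
    pred = sum(w.get(i, 0.0) * x for i, x in example.items())
    dloss = pred - label                      # d(0.5*(pred-label)^2)/dpred
    for i, x in example.items():
        g = dloss * x                         # gradient for weight i
        g2[i] = g2.get(i, 0.0) + g * g        # running sum of squared gradients
        # Per-feature step size: eta / sqrt(sum of squared gradients).
        # This per-feature 1/sqrt is what an SSE rsqrt instruction speeds up.
        w[i] = w.get(i, 0.0) - eta * g / math.sqrt(g2[i])
    return pred

w, g2 = {}, {}
examples = [({0: 1.0, 3: 2.0}, 1.0), ({1: 1.0, 3: 1.0}, 0.0)]
for x, y in examples:
    adaptive_update(w, g2, x, y)
print(w)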
wordName wordTfidf (topN-words)
[('clusters', 0.279), ('code', 0.271), ('sse', 0.252), ('core', 0.212), ('lbfgs', 0.207), ('conjugate', 0.195), ('jake', 0.173), ('paul', 0.162), ('gradient', 0.159), ('vw', 0.151), ('calculations', 0.112), ('sped', 0.112), ('terafeature', 0.112), ('dominate', 0.112), ('allreduce', 0.112), ('daemon', 0.112), ('kilonode', 0.112), ('size', 0.107), ('persistent', 0.104), ('enjoy', 0.104), ('loads', 0.104), ('gymnastics', 0.104), ('miro', 0.104), ('inverse', 0.098), ('native', 0.098), ('dependencies', 0.098), ('factorization', 0.098), ('primarily', 0.098), ('orders', 0.093), ('compile', 0.093), ('representational', 0.093), ('parallelization', 0.093), ('educated', 0.093), ('alekh', 0.093), ('scalable', 0.089), ('thoroughly', 0.089), ('parallelize', 0.089), ('square', 0.089), ('adds', 0.089), ('reused', 0.089), ('lda', 0.089), ('nikos', 0.089), ('easily', 0.087), ('mailing', 0.086), ('twice', 0.084), ('hadoop', 0.084), ('thanks', 0.081), ('slowly', 0.081), ('baseline', 0.081), ('yielding', 0.077)]
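Sentence 6 above compares the new LBFGS code with conjugate gradient. The heart of any L-BFGS implementation is the standard two-loop recursion, which turns the current gradient plus a short history of (parameter step, gradient change) pairs into a quasi-Newton search direction. The sketch below is the textbook recursion, not Miro's VW code.

import numpy as np

def lbfgs_direction(g, history):
    # history: list of (s, y) pairs, oldest first, where s = x_{k+1} - x_k
    # and y = grad_{k+1} - grad_k.  Returns a search direction.
    q = g.astype(float).copy()
    rhos = [1.0 / np.dot(y, s) for s, y in history]
    alphas = []

    # First loop: newest pair to oldest.
    for (s, y), rho in zip(reversed(history), reversed(rhos)):
        alpha = rho * np.dot(s, q)
        q -= alpha * y
        alphas.append(alpha)

    # Scale by an initial Hessian guess built from the most recent pair.
    if history:
        s, y = history[-1]
        q *= np.dot(s, y) / np.dot(y, y)

    # Second loop: oldest pair to newest.
    for (s, y), rho, alpha in zip(history, rhos, reversed(alphas)):
        beta = rho * np.dot(y, q)
        q += (alpha - beta) * s

    return -q   # combine with a line search in practice

# With no history yet the direction is simply the negative gradient:
print(lbfgs_direction(np.array([1.0, -2.0]), []))   # -> [-1.  2.]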
simIndex simValue blogId blogTitle
same-blog 1 1.0000002 441 hunch net-2011-08-15-Vowpal Wabbit 6.0
2 0.20417486 365 hunch net-2009-07-31-Vowpal Wabbit Open Source Project
Introduction: Today brings a new release of the Vowpal Wabbit fast online learning software. This time, unlike the previous release, the project itself is going open source, developing via github. For example, the latest and greatest can be downloaded via: git clone git://github.com/JohnLangford/vowpal_wabbit.git If you aren't familiar with git, it's a distributed version control system which supports quick and easy branching, as well as reconciliation. This version of the code is confirmed to compile without complaint on at least some flavors of OSX as well as Linux boxes. As much of the point of this project is pushing the limits of fast and effective machine learning, let me mention a few datapoints from my experience. The program can effectively scale up to batch-style training on sparse terafeature (i.e. 10^12 sparse feature) size datasets. The limiting factor is typically i/o. I started using the real datasets from the large-scale learning workshop as a conve
3 0.18088026 419 hunch net-2010-12-04-Vowpal Wabbit, version 5.0, and the second heresy
Introduction: I've released version 5.0 of the Vowpal Wabbit online learning software. The major number has changed since the last release because I regard all earlier versions as obsolete: there are several new algorithms & features, including substantial changes and upgrades to the default learning algorithm. The biggest changes are new algorithms: Nikos and I improved the default algorithm. The basic update rule still uses gradient descent, but the size of the update is carefully controlled so that it's impossible to overrun the label. In addition, the normalization has changed. Computationally, these changes are virtually free and yield better results, sometimes much better. Less careful updates can be reenabled with --loss_function classic, although results are still not identical to previous versions due to normalization changes. Nikos also implemented the per-feature learning rates as per these two papers. Often, this works better than the default algorithm. It isn't the defa
4 0.18062027 450 hunch net-2011-12-02-Hadoop AllReduce and Terascale Learning
Introduction: Suppose you have a dataset with 2 terafeatures (we only count nonzero entries in a datamatrix), and want to learn a good linear predictor in a reasonable amount of time. How do you do it? As a learning theorist, the first thing you do is pray that this is too much data for the number of parameters, but that's not the case: there are around 16 billion examples, 16 million parameters, and people really care about a high quality predictor, so subsampling is not a good strategy. Alekh visited us last summer, and we had a breakthrough (see here for details), coming up with the first learning algorithm I've seen that is provably faster than any future single machine learning algorithm. The proof of this is simple: We can output an optimal-up-to-precision linear predictor faster than the data can be streamed through the network interface of any single machine involved in the computation. It is necessary but not sufficient to have an effective communication infrastructure. It is ne
5 0.17651503 281 hunch net-2007-12-21-Vowpal Wabbit Code Release
Introduction: We are releasing the Vowpal Wabbit (Fast Online Learning) code as open source under a BSD (revised) license. This is a project at Yahoo! Research to build a useful large scale learning algorithm which Lihong Li, Alex Strehl, and I have been working on. To appreciate the meaning of “large”, it's useful to define “small” and “medium”. A “small” supervised learning problem is one where a human could use a labeled dataset and come up with a reasonable predictor. A “medium” supervised learning problem dataset fits into the RAM of a modern desktop computer. A “large” supervised learning problem is one which does not fit into the RAM of a normal machine. VW tackles large scale learning problems by this definition of large. I'm not aware of any other open source Machine Learning tools which can handle this scale (although they may exist). A few close ones are: IBM's Parallel Machine Learning Toolbox isn't quite open source. The approach used by this toolbox is essenti
6 0.16459532 451 hunch net-2011-12-13-Vowpal Wabbit version 6.1 & the NIPS tutorial
7 0.15588062 436 hunch net-2011-06-22-Ultra LDA
8 0.13233405 346 hunch net-2009-03-18-Parallel ML primitives
9 0.1117909 111 hunch net-2005-09-12-Fast Gradient Descent
10 0.10577659 428 hunch net-2011-03-27-Vowpal Wabbit, v5.1
11 0.1007164 426 hunch net-2011-03-19-The Ideal Large Scale Learning Class
12 0.091214001 492 hunch net-2013-12-01-NIPS tutorials and Vowpal Wabbit 7.4
13 0.088539481 490 hunch net-2013-11-09-Graduates and Postdocs
14 0.087645724 473 hunch net-2012-09-29-Vowpal Wabbit, version 7.0
15 0.083045222 229 hunch net-2007-01-26-Parallel Machine Learning Problems
16 0.079106569 371 hunch net-2009-09-21-Netflix finishes (and starts)
17 0.075175516 393 hunch net-2010-04-14-MLcomp: a website for objectively comparing ML algorithms
18 0.068716399 262 hunch net-2007-09-16-Optimizing Machine Learning Programs
19 0.068711363 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms
20 0.066003658 412 hunch net-2010-09-28-Machined Learnings
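The matrix factorization code mentioned in the introduction (sentence 7 above) adds a core algorithm. The basic member of that family models each observed rating as the inner product of a learned user vector and item vector and fits both by stochastic gradient descent on squared loss. The sketch below is that generic rank-k algorithm on hypothetical toy data; it is not Jake's VW implementation.

import numpy as np

def factorize(ratings, n_users, n_items, k=2, eta=0.05, l2=0.01, epochs=50, seed=0):
    # ratings: list of (user, item, value) triples.
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, k))   # user factors
    Q = 0.1 * rng.standard_normal((n_items, k))   # item factors
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]                 # prediction error
            pu = P[u].copy()                      # freeze before the simultaneous update
            P[u] += eta * (err * Q[i] - l2 * P[u])
            Q[i] += eta * (err * pu - l2 * Q[i])
    return P, Q

# Hypothetical toy data: 2 users x 2 items with one unobserved cell.
ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0)]
P, Q = factorize(ratings, n_users=2, n_items=2)
print(P[1] @ Q[1])   # prediction for the unobserved (user 1, item 1) cell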
topicId topicWeight
[(0, 0.141), (1, 0.026), (2, -0.097), (3, 0.008), (4, 0.068), (5, 0.042), (6, -0.18), (7, -0.094), (8, -0.163), (9, 0.15), (10, -0.13), (11, -0.034), (12, 0.064), (13, -0.018), (14, 0.01), (15, -0.088), (16, -0.021), (17, 0.012), (18, 0.001), (19, -0.085), (20, -0.001), (21, 0.095), (22, -0.021), (23, 0.005), (24, -0.095), (25, -0.037), (26, -0.115), (27, 0.015), (28, -0.037), (29, 0.009), (30, 0.028), (31, 0.093), (32, 0.016), (33, 0.027), (34, -0.015), (35, 0.008), (36, -0.076), (37, 0.001), (38, 0.0), (39, -0.015), (40, -0.036), (41, 0.023), (42, -0.023), (43, 0.004), (44, 0.123), (45, 0.041), (46, 0.006), (47, 0.059), (48, -0.046), (49, -0.052)]
simIndex simValue blogId blogTitle
same-blog 1 0.98822343 441 hunch net-2011-08-15-Vowpal Wabbit 6.0
2 0.81150997 365 hunch net-2009-07-31-Vowpal Wabbit Open Source Project
3 0.74976003 436 hunch net-2011-06-22-Ultra LDA
Introduction: Shravan and Alex's LDA code is released. On a single machine, I'm not sure how it currently compares to the online LDA in VW, but the ability to effectively scale across very many machines is surely interesting.
4 0.72563726 419 hunch net-2010-12-04-Vowpal Wabbit, version 5.0, and the second heresy
5 0.70571715 281 hunch net-2007-12-21-Vowpal Wabbit Code Release
6 0.67632765 451 hunch net-2011-12-13-Vowpal Wabbit version 6.1 & the NIPS tutorial
7 0.63291055 381 hunch net-2009-12-07-Vowpal Wabbit version 4.0, and a NIPS heresy
8 0.6278289 428 hunch net-2011-03-27-Vowpal Wabbit, v5.1
9 0.57467961 346 hunch net-2009-03-18-Parallel ML primitives
10 0.56743264 450 hunch net-2011-12-02-Hadoop AllReduce and Terascale Learning
11 0.55522388 473 hunch net-2012-09-29-Vowpal Wabbit, version 7.0
12 0.54226243 300 hunch net-2008-04-30-Concerns about the Large Scale Learning Challenge
13 0.52721119 492 hunch net-2013-12-01-NIPS tutorials and Vowpal Wabbit 7.4
14 0.49077332 262 hunch net-2007-09-16-Optimizing Machine Learning Programs
15 0.47732484 471 hunch net-2012-08-24-Patterns for research in machine learning
16 0.4645609 426 hunch net-2011-03-19-The Ideal Large Scale Learning Class
17 0.41131237 393 hunch net-2010-04-14-MLcomp: a website for objectively comparing ML algorithms
18 0.410734 490 hunch net-2013-11-09-Graduates and Postdocs
19 0.40576783 229 hunch net-2007-01-26-Parallel Machine Learning Problems
20 0.40149075 111 hunch net-2005-09-12-Fast Gradient Descent
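The introduction also mentions basic persistent daemon support. The point of a daemon mode is to load a model once, keep it resident, and answer prediction requests over a socket rather than paying startup cost on every invocation. The sketch below illustrates that pattern with a toy linear model behind a line-oriented TCP protocol; the request format and port are invented for the example, and it does not reproduce VW's actual daemon protocol or options.

import socketserver

WEIGHTS = {"price": -0.3, "sqft": 0.002, "bias": 1.0}   # stand-in for a trained model

def predict(line):
    # Request format assumed for this sketch: "name:value name:value ..."
    score = WEIGHTS["bias"]
    for tok in line.split():
        name, _, val = tok.partition(":")
        score += WEIGHTS.get(name, 0.0) * float(val)
    return score

class Handler(socketserver.StreamRequestHandler):
    def handle(self):
        for raw in self.rfile:                    # one request per line
            line = raw.decode().strip()
            if not line:
                break
            self.wfile.write(("%.4f\n" % predict(line)).encode())

if __name__ == "__main__":
    # Try it with e.g.:  printf 'price:100 sqft:900\n' | nc localhost 5555
    with socketserver.TCPServer(("localhost", 5555), Handler) as server:
        server.serve_forever()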
topicId topicWeight
[(9, 0.026), (27, 0.11), (38, 0.023), (55, 0.069), (71, 0.033), (77, 0.048), (78, 0.404), (94, 0.195)]
simIndex simValue blogId blogTitle
same-blog 1 0.89861846 441 hunch net-2011-08-15-Vowpal Wabbit 6.0
2 0.7374956 316 hunch net-2008-09-04-Fall ML Conferences
Introduction: If you are in the New York area and interested in machine learning, consider submitting a 2 page abstract to the ML symposium by tomorrow (Sept 5th) midnight. It’s a fun one day affair on October 10 in an awesome location overlooking the world trade center site. A bit further off (but a real conference) is the AI and Stats deadline on November 5, to be held in Florida April 16-19.
3 0.57943261 164 hunch net-2006-03-17-Multitask learning is Black-Boxable
Introduction: Multitask learning is the problem of jointly predicting multiple labels simultaneously with one system. A basic question is whether or not multitask learning can be decomposed into one (or more) single prediction problems. It seems the answer to this is “yes”, in a fairly straightforward manner. The basic idea is that a controlled input feature is equivalent to an extra output. Suppose we have some process generating examples: (x, y_1, y_2) in S where y_1 and y_2 are labels for two different tasks. Then, we could reprocess the data to the form S_b(S) = {((x,i), y_i) : (x, y_1, y_2) in S, i in {1,2}} and then learn a classifier c: X x {1,2} -> Y. Note that (x,i) is the (composite) input. At testing time, given an input x, we can query c for the predicted values of y_1 and y_2 using (x,1) and (x,2). A strong form of equivalence can be stated between these tasks. In particular, suppose we have a multitask learning algorithm ML which learns a multitask
4 0.56016606 297 hunch net-2008-04-22-Taking the next step
Introduction: At the last ICML, Tom Dietterich asked me to look into systems for commenting on papers. I've been slow getting to this, but it's relevant now. The essential observation is that we now have many tools for online collaboration, but they are not yet much used in academic research. If we can find the right way to use them, then perhaps great things might happen, with extra kudos to the first conference that manages to really create an online community. Various conferences have been poking at this. For example, UAI has set up a wiki, COLT has started using Joomla, with some dynamic content, and AAAI has been setting up a “student blog”. Similarly, Dinoj Surendran set up a twiki for the Chicago Machine Learning Summer School, which was quite useful for coordinating events and other things. I believe the most important thing is a willingness to experiment. A good place to start seems to be enhancing existing conference websites. For example, the ICML 2007 papers pag
5 0.48943198 81 hunch net-2005-06-13-Wikis for Summer Schools and Workshops
Introduction: Chicago '05 ended a couple of weeks ago. This was the sixth Machine Learning Summer School, and the second one that used a wiki. (The first was Berder '04, thanks to Gunnar Raetsch.) Wikis are relatively easy to set up, greatly aid social interaction, and should be used a lot more at summer schools and workshops. They can even be used as the meeting's webpage, as a permanent record of its participants' collaborations — see for example the wiki/website for last year's NVO Summer School. A basic wiki is a collection of editable webpages, maintained by software called a wiki engine. The engine used at both Berder and Chicago was TikiWiki — it is well documented and gets you something running fast. It uses PHP and MySQL, but doesn't require you to know either. Tikiwiki has far more features than most wikis, as it is really a full Content Management System. (My thanks to Sebastian Stark for pointing this out.) Here are the features we found most useful: Bulletin boa
6 0.48725849 346 hunch net-2009-03-18-Parallel ML primitives
7 0.48571682 115 hunch net-2005-09-26-Prediction Bounds as the Mathematics of Science
8 0.4844681 120 hunch net-2005-10-10-Predictive Search is Coming
9 0.48104906 60 hunch net-2005-04-23-Advantages and Disadvantages of Bayesian Learning
10 0.47688591 42 hunch net-2005-03-17-Going all the Way, Sometimes
11 0.47662735 35 hunch net-2005-03-04-The Big O and Constants in Learning
12 0.47519696 276 hunch net-2007-12-10-Learning Track of International Planning Competition
13 0.4723025 221 hunch net-2006-12-04-Structural Problems in NIPS Decision Making
14 0.44740599 450 hunch net-2011-12-02-Hadoop AllReduce and Terascale Learning
15 0.44275364 136 hunch net-2005-12-07-Is the Google way the way for machine learning?
16 0.44180506 419 hunch net-2010-12-04-Vowpal Wabbit, version 5.0, and the second heresy
17 0.44114327 229 hunch net-2007-01-26-Parallel Machine Learning Problems
18 0.42416 147 hunch net-2006-01-08-Debugging Your Brain
19 0.42248696 286 hunch net-2008-01-25-Turing’s Club for Machine Learning
20 0.41019338 423 hunch net-2011-02-02-User preferences for search engines