hunch_net hunch_net-2007 hunch_net-2007-281 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: We are releasing the Vowpal Wabbit (Fast Online Learning) code as open source under a BSD (revised) license. This is a project at Yahoo! Research to build a useful large scale learning algorithm which Lihong Li, Alex Strehl, and I have been working on. To appreciate the meaning of “large”, it’s useful to define “small” and “medium”. A “small” supervised learning problem is one where a human could use a labeled dataset and come up with a reasonable predictor. A “medium” supervised learning problem is one whose dataset fits into the RAM of a modern desktop computer. A “large” supervised learning problem is one which does not fit into the RAM of a normal machine. VW tackles large scale learning problems by this definition of large. I’m not aware of any other open source Machine Learning tools which can handle this scale (although they may exist). A few close ones are: IBM’s Parallel Machine Learning Toolbox isn’t quite open source. The approach used by this toolbox is essentially map-reduce style computation, which doesn’t seem amenable to online learning approaches.
sentIndex sentText sentNum sentScore
1 We are releasing the Vowpal Wabbit (Fast Online Learning) code as open source under a BSD (revised) license. [sent-1, score-0.523]
2 This is a project at Yahoo! Research to build a useful large scale learning algorithm which Lihong Li, Alex Strehl, and I have been working on. [sent-3, score-0.398]
3 To appreciate the meaning of “large”, it’s useful to define “small” and “medium”. [sent-4, score-0.07]
4 A “small” supervised learning problem is one where a human could use a labeled dataset and come up with a reasonable predictor. [sent-5, score-0.35]
5 A “medium” supervised learning problem is one whose dataset fits into the RAM of a modern desktop computer. [sent-6, score-0.65]
6 A “large” supervised learning problem is one which does not fit into the RAM of a normal machine. [sent-7, score-0.33]
7 VW tackles large scale learning problems by this definition of large. [sent-8, score-0.328]
8 I’m not aware of any other open source Machine Learning tools which can handle this scale (although they may exist). [sent-9, score-0.426]
9 A few close ones are: IBM’s Parallel Machine Learning Toolbox isn’t quite open source. [sent-10, score-0.278]
10 The approach used by this toolbox is essentially map-reduce style computation, which doesn’t seem amenable to online learning approaches. [sent-11, score-0.538]
11 This is significant, because the fastest learning algorithms without parallelization tend to be online learning algorithms. [sent-12, score-0.481]
12 Leon Bottou’s sgd implementation first loads data into RAM, then learns. [sent-13, score-0.254]
13 Leon’s code is a great demonstrator of how fast and effective online learning approaches (specifically stochastic gradient descent) can be. [sent-14, score-0.626]
14 VW is about a factor of 3 faster on my desktop, and yields a lower error rate solution. [sent-15, score-0.069]
15 There are several other features such as feature pairing, sparse features, and namespacing that are often handy in practice. [sent-16, score-0.242]
16 At present, VW optimizes squared loss via gradient descent or exponentiated gradient descent over a linear representation. [sent-17, score-0.859]
17 This code is free to use, incorporate, and modify as per the BSD (revised) license. [sent-18, score-0.266]
18 We will gladly incorporate significant improvements from other people, and I believe any significant improvements are of substantial research interest. [sent-20, score-0.623]
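Sentences 15 and 16 above mention sparse features and that VW optimizes squared loss via gradient descent or exponentiated gradient descent over a linear representation. As a rough illustration of those two update rules on sparse examples (not VW's actual implementation; the function names, dictionary-based feature representation, and fixed learning rate eta are my own choices), here is a minimal Python sketch:

import math

def predict(weights, example):
    # Dot product between a weight dict and a sparse example {feature: value}.
    return sum(weights.get(f, 0.0) * v for f, v in example.items())

def sgd_update(weights, example, label, eta=0.1):
    # One gradient-descent step on squared loss 0.5 * (prediction - label)**2.
    error = predict(weights, example) - label
    for f, v in example.items():
        weights[f] = weights.get(f, 0.0) - eta * error * v

def eg_update(weights, example, label, eta=0.1):
    # One (unnormalized) exponentiated-gradient step: multiplicative update.
    error = predict(weights, example) - label
    for f, v in example.items():
        weights[f] = weights.get(f, 1.0) * math.exp(-eta * error * v)

# Toy online pass over a stream of (sparse example, label) pairs.
stream = [({"height": 1.2, "bias": 1.0}, 1.0),
          ({"weight": 0.7, "bias": 1.0}, 0.0)]
w = {}
for x, y in stream:
    sgd_update(w, x, y)

In a real online learner the learning rate would typically decay over time and the exponentiated-gradient weights would usually be kept normalized; the sketch omits both for brevity.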
wordName wordTfidf (topN-words)
[('ram', 0.251), ('vw', 0.219), ('desktop', 0.216), ('toolbox', 0.216), ('bsd', 0.216), ('revised', 0.216), ('descent', 0.178), ('medium', 0.167), ('incorporate', 0.167), ('supervised', 0.163), ('code', 0.158), ('gradient', 0.154), ('leon', 0.149), ('scale', 0.148), ('source', 0.141), ('open', 0.137), ('online', 0.137), ('project', 0.122), ('improvements', 0.121), ('dataset', 0.11), ('modify', 0.108), ('amenable', 0.108), ('significant', 0.107), ('large', 0.103), ('fastest', 0.1), ('ongoing', 0.1), ('exponentiated', 0.1), ('loads', 0.1), ('fast', 0.1), ('optimizes', 0.095), ('features', 0.092), ('normal', 0.09), ('parallelization', 0.09), ('releasing', 0.087), ('lihong', 0.087), ('inside', 0.084), ('fits', 0.084), ('sgd', 0.084), ('bottou', 0.081), ('specifically', 0.079), ('li', 0.079), ('learning', 0.077), ('handy', 0.077), ('alex', 0.077), ('small', 0.074), ('sparse', 0.073), ('ibm', 0.071), ('implementation', 0.07), ('useful', 0.07), ('yields', 0.069)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 281 hunch net-2007-12-21-Vowpal Wabbit Code Release
2 0.25744739 300 hunch net-2008-04-30-Concerns about the Large Scale Learning Challenge
Introduction: The large scale learning challenge for ICML interests me a great deal, although I have concerns about the way it is structured. From the instructions page, several issues come up: Large Definition: My personal definition of dataset size is: small: A dataset is small if a human could look at the dataset and plausibly find a good solution. medium: A dataset is medium-sized if it fits in the RAM of a reasonably priced computer. large: A large dataset does not fit in the RAM of a reasonably priced computer. By this definition, all of the datasets are medium-sized. This might sound like a pissing match over dataset size, but I believe it is more than that. The fundamental reason for these definitions is that they correspond to transitions in the sorts of approaches which are feasible. From small to medium, the ability to use a human as the learning algorithm degrades. From medium to large, it becomes essential to have learning algorithms that don’t require ran
3 0.22321345 365 hunch net-2009-07-31-Vowpal Wabbit Open Source Project
Introduction: Today brings a new release of the Vowpal Wabbit fast online learning software. This time, unlike the previous release, the project itself is going open source, developing via github. For example, the latest and greatest can be downloaded via: git clone git://github.com/JohnLangford/vowpal_wabbit.git If you aren’t familiar with git, it’s a distributed version control system which supports quick and easy branching, as well as reconciliation. This version of the code is confirmed to compile without complaint on at least some flavors of OSX as well as Linux boxes. As much of the point of this project is pushing the limits of fast and effective machine learning, let me mention a few datapoints from my experience. The program can effectively scale up to batch-style training on sparse terafeature (i.e. 10^12 sparse feature) size datasets. The limiting factor is typically i/o. I started using the real datasets from the large-scale learning workshop as a conve
4 0.19082719 111 hunch net-2005-09-12-Fast Gradient Descent
Introduction: Nic Schraudolph has been developing a fast gradient descent algorithm called Stochastic Meta-Descent (SMD). Gradient descent is currently untrendy in the machine learning community, but there remains a large number of people using gradient descent on neural networks or other architectures from when it was trendy in the early 1990s. There are three problems with gradient descent. Gradient descent does not necessarily produce easily reproduced results. Typical algorithms start with “set the initial parameters to small random values”. The design of the representation that gradient descent is applied to is often nontrivial. In particular, knowing exactly how to build a large neural network so that it will perform well requires knowledge which has not been made easily applicable. Gradient descent can be slow. Obviously, taking infinitesimal steps in the direction of the gradient would take forever, so some finite step size must be used. What exactly this step size should be
5 0.17651503 441 hunch net-2011-08-15-Vowpal Wabbit 6.0
Introduction: I just released Vowpal Wabbit 6.0 . Since the last version: VW is now 2-3 orders of magnitude faster at linear learning, primarily thanks to Alekh . Given the baseline, this is loads of fun, allowing us to easily deal with terafeature datasets, and dwarfing the scale of any other open source projects. The core improvement here comes from effective parallelization over kilonode clusters (either Hadoop or not). This code is highly scalable, so it even helps with clusters of size 2 (and doesn’t hurt for clusters of size 1). The core allreduce technique appears widely and easily reused—we’ve already used it to parallelize Conjugate Gradient, LBFGS, and two variants of online learning. We’ll be documenting how to do this more thoroughly, but for now “README_cluster” and associated scripts should provide a good starting point. The new LBFGS code from Miro seems to commonly dominate the existing conjugate gradient code in time/quality tradeoffs. The new matrix factoriz
6 0.16661792 426 hunch net-2011-03-19-The Ideal Large Scale Learning Class
7 0.16284244 346 hunch net-2009-03-18-Parallel ML primitives
8 0.15692109 419 hunch net-2010-12-04-Vowpal Wabbit, version 5.0, and the second heresy
9 0.15472381 258 hunch net-2007-08-12-Exponentiated Gradient
10 0.15254392 450 hunch net-2011-12-02-Hadoop AllReduce and Terascale Learning
11 0.1514947 286 hunch net-2008-01-25-Turing’s Club for Machine Learning
12 0.14488411 428 hunch net-2011-03-27-Vowpal Wabbit, v5.1
13 0.1391871 279 hunch net-2007-12-19-Cool and interesting things seen at NIPS
14 0.13637708 105 hunch net-2005-08-23-(Dis)similarities between academia and open source programmers
15 0.1330414 436 hunch net-2011-06-22-Ultra LDA
16 0.12516487 451 hunch net-2011-12-13-Vowpal Wabbit version 6.1 & the NIPS tutorial
17 0.11047209 332 hunch net-2008-12-23-Use of Learning Theory
18 0.10837215 347 hunch net-2009-03-26-Machine Learning is too easy
19 0.10742257 473 hunch net-2012-09-29-Vowpal Wabbit, version 7.0
20 0.10502458 267 hunch net-2007-10-17-Online as the new adjective
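Entries 5 and 10 above (Vowpal Wabbit 6.0 and Hadoop AllReduce and Terascale Learning) refer to the allreduce primitive, in which every node contributes a local vector (such as a local gradient) and every node receives the elementwise sum. The Python sketch below only simulates that semantics with in-memory lists; a real implementation communicates over a spanning tree of the cluster and handles failures, and all names here are mine:

def allreduce_sum(local_vectors):
    # Every "node" holds a local vector; return per-node copies of the
    # elementwise sum, which is what each node would see after allreduce.
    total = [0.0] * len(local_vectors[0])
    for vec in local_vectors:
        for i, v in enumerate(vec):
            total[i] += v
    return [list(total) for _ in local_vectors]

# Example: sum gradients computed on three nodes, then average them.
node_gradients = [[0.2, -0.1], [0.4, 0.0], [0.0, 0.3]]
summed = allreduce_sum(node_gradients)
averaged = [g / len(node_gradients) for g in summed[0]]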
topicId topicWeight
[(0, 0.226), (1, 0.102), (2, -0.124), (3, -0.014), (4, 0.064), (5, 0.063), (6, -0.221), (7, -0.069), (8, -0.19), (9, 0.228), (10, -0.1), (11, -0.065), (12, 0.039), (13, -0.04), (14, -0.007), (15, -0.017), (16, -0.001), (17, -0.02), (18, -0.048), (19, -0.067), (20, 0.038), (21, 0.142), (22, 0.022), (23, -0.014), (24, 0.016), (25, 0.05), (26, -0.077), (27, 0.057), (28, -0.06), (29, -0.025), (30, 0.067), (31, 0.058), (32, -0.084), (33, 0.01), (34, -0.04), (35, -0.041), (36, -0.067), (37, -0.072), (38, 0.04), (39, -0.038), (40, 0.003), (41, -0.052), (42, 0.015), (43, 0.026), (44, 0.048), (45, 0.077), (46, 0.011), (47, 0.036), (48, 0.006), (49, -0.02)]
simIndex simValue blogId blogTitle
same-blog 1 0.93757755 281 hunch net-2007-12-21-Vowpal Wabbit Code Release
2 0.84279907 441 hunch net-2011-08-15-Vowpal Wabbit 6.0
3 0.798603 365 hunch net-2009-07-31-Vowpal Wabbit Open Source Project
4 0.76573843 419 hunch net-2010-12-04-Vowpal Wabbit, version 5.0, and the second heresy
Introduction: I’ve released version 5.0 of the Vowpal Wabbit online learning software. The major number has changed since the last release because I regard all earlier versions as obsolete—there are several new algorithms & features including substantial changes and upgrades to the default learning algorithm. The biggest changes are new algorithms: Nikos and I improved the default algorithm. The basic update rule still uses gradient descent, but the size of the update is carefully controlled so that it’s impossible to overrun the label. In addition, the normalization has changed. Computationally, these changes are virtually free and yield better results, sometimes much better. Less careful updates can be reenabled with --loss_function classic, although results are still not identical to previous versions due to normalization changes. Nikos also implemented the per-feature learning rates as per these two papers. Often, this works better than the default algorithm. It isn’t the defa
5 0.71474838 346 hunch net-2009-03-18-Parallel ML primitives
Introduction: Previously, we discussed parallel machine learning a bit. As parallel ML is rather difficult, I’d like to describe my thinking at the moment, and ask for advice from the rest of the world. This is particularly relevant right now, as I’m attending a workshop tomorrow on parallel ML. Parallelizing slow algorithms seems uncompelling. Parallelizing many algorithms also seems uncompelling, because the effort required to parallelize is substantial. This leaves the question: Which one fast algorithm is the best to parallelize? What is a substantially different second? One compellingly fast simple algorithm is online gradient descent on a linear representation. This is the core of Leon’s sgd code and Vowpal Wabbit. Antoine Bordes showed a variant was competitive in the large scale learning challenge. It’s also a decades-old primitive which has been reused in many algorithms, and continues to be reused. It also applies to online learning rather than just online optimiz
6 0.70927584 451 hunch net-2011-12-13-Vowpal Wabbit version 6.1 & the NIPS tutorial
7 0.68746436 436 hunch net-2011-06-22-Ultra LDA
8 0.66859269 450 hunch net-2011-12-02-Hadoop AllReduce and Terascale Learning
9 0.65167373 426 hunch net-2011-03-19-The Ideal Large Scale Learning Class
10 0.63969457 300 hunch net-2008-04-30-Concerns about the Large Scale Learning Challenge
11 0.61867315 111 hunch net-2005-09-12-Fast Gradient Descent
12 0.61583549 428 hunch net-2011-03-27-Vowpal Wabbit, v5.1
13 0.57576799 381 hunch net-2009-12-07-Vowpal Wabbit version 4.0, and a NIPS heresy
14 0.55568326 258 hunch net-2007-08-12-Exponentiated Gradient
15 0.53480017 229 hunch net-2007-01-26-Parallel Machine Learning Problems
16 0.53010684 286 hunch net-2008-01-25-Turing’s Club for Machine Learning
17 0.52126491 490 hunch net-2013-11-09-Graduates and Postdocs
18 0.50802219 136 hunch net-2005-12-07-Is the Google way the way for machine learning?
19 0.49508682 471 hunch net-2012-08-24-Patterns for research in machine learning
20 0.4941425 473 hunch net-2012-09-29-Vowpal Wabbit, version 7.0
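Entry 4 above (Vowpal Wabbit, version 5.0) mentions per-feature learning rates. A common way to realize them is to scale each feature's step by the inverse square root of that feature's accumulated squared gradient (an AdaGrad-style rule); the excerpt does not specify VW's exact rule, so the Python sketch below is illustrative only, with hypothetical names and constants:

import math

def adaptive_update(weights, grad_sq, example, label, eta=0.5):
    # Squared-loss gradient step with a per-feature learning rate that shrinks
    # as that feature accumulates squared gradient (AdaGrad-style rule).
    error = sum(weights.get(f, 0.0) * v for f, v in example.items()) - label
    for f, v in example.items():
        g = error * v
        grad_sq[f] = grad_sq.get(f, 0.0) + g * g
        weights[f] = weights.get(f, 0.0) - eta * g / math.sqrt(grad_sq[f] + 1e-8)

w, gsq = {}, {}
adaptive_update(w, gsq, {"bias": 1.0, "age": 0.3}, label=1.0)

The effect is that frequently seen (or high-gradient) features take smaller steps while rare features keep larger ones, which is often helpful on sparse data.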
topicId topicWeight
[(27, 0.183), (38, 0.013), (51, 0.077), (53, 0.037), (55, 0.129), (84, 0.018), (86, 0.231), (94, 0.154), (95, 0.055)]
simIndex simValue blogId blogTitle
1 0.95995098 477 hunch net-2013-01-01-Deep Learning 2012
Introduction: 2012 was a tumultuous year for me, but it was undeniably a great year for deep learning efforts. Signs of this include: Winning a Kaggle competition. Wide adoption of deep learning for speech recognition. Significant industry support. Gains in image recognition. This is a rare event in research: a significant capability breakout. Congratulations are definitely in order for those who managed to achieve it. At this point, deep learning algorithms seem like a choice undeniably worth investigating for real applications with significant data.
same-blog 2 0.89773422 281 hunch net-2007-12-21-Vowpal Wabbit Code Release
3 0.82665396 273 hunch net-2007-11-16-MLSS 2008
Introduction: … is in Kioloa, Australia from March 3 to March 14. It’s a great chance to learn something about Machine Learning and I’ve enjoyed several previous Machine Learning Summer Schools. The website has many more details, but registration is open now for the first 80 to sign up.
4 0.79645896 54 hunch net-2005-04-08-Fast SVMs
Introduction: There was a presentation at snowbird about parallelized support vector machines. In many cases, people parallelize by ignoring serial operations, but that is not what happened here—they parallelize with optimizations. Consequently, this seems to be the fastest SVM in existence. There is a related paper here.
5 0.74426734 300 hunch net-2008-04-30-Concerns about the Large Scale Learning Challenge
6 0.74067378 393 hunch net-2010-04-14-MLcomp: a website for objectively comparing ML algorithms
7 0.73825198 221 hunch net-2006-12-04-Structural Problems in NIPS Decision Making
8 0.72402585 286 hunch net-2008-01-25-Turing’s Club for Machine Learning
9 0.72391438 229 hunch net-2007-01-26-Parallel Machine Learning Problems
10 0.72291529 136 hunch net-2005-12-07-Is the Google way the way for machine learning?
11 0.72062856 146 hunch net-2006-01-06-MLTV
12 0.71730888 423 hunch net-2011-02-02-User preferences for search engines
13 0.71521002 132 hunch net-2005-11-26-The Design of an Optimal Research Environment
14 0.71053171 419 hunch net-2010-12-04-Vowpal Wabbit, version 5.0, and the second heresy
15 0.70814705 379 hunch net-2009-11-23-ICML 2009 Workshops (and Tutorials)
16 0.70486093 96 hunch net-2005-07-21-Six Months
17 0.70428509 179 hunch net-2006-05-16-The value of the orthodox view of Boosting
18 0.7036671 450 hunch net-2011-12-02-Hadoop AllReduce and Terascale Learning
19 0.70361334 75 hunch net-2005-05-28-Running A Machine Learning Summer School
20 0.70218611 40 hunch net-2005-03-13-Avoiding Bad Reviewing