hunch_net hunch_net-2010 hunch_net-2010-419 knowledge-graph by maker-knowledge-mining

419 hunch net-2010-12-04-Vowpal Wabbit, version 5.0, and the second heresy


meta info for this blog

Source: html

Introduction: I’ve released version 5.0 of the Vowpal Wabbit online learning software. The major number has changed since the last release because I regard all earlier versions as obsolete—there are several new algorithms & features including substantial changes and upgrades to the default learning algorithm. The biggest changes are new algorithms: Nikos and I improved the default algorithm. The basic update rule still uses gradient descent, but the size of the update is carefully controlled so that it’s impossible to overrun the label. In addition, the normalization has changed. Computationally, these changes are virtually free and yield better results, sometimes much better. Less careful updates can be re-enabled with --loss_function classic, although results are still not identical to previous versions due to normalization changes. Nikos also implemented the per-feature learning rates as per these two papers. Often, this works better than the default algorithm. It isn’t the defa


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 The major number has changed since the last release because I regard all earlier versions as obsolete—there are several new algorithms & features including substantial changes and upgrades to the default learning algorithm. [sent-3, score-0.593]

2 The biggest changes are new algorithms: Nikos and I improved the default algorithm. [sent-4, score-0.366]

3 The basic update rule still uses gradient descent, but the size of the update is carefully controlled so that it’s impossible to overrun the label. [sent-5, score-0.499]

4 Less careful updates can be re-enabled with --loss_function classic, although results are still not identical to previous versions due to normalization changes. [sent-8, score-0.333]
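To make the idea of an update that cannot overrun the label concrete, here is a minimal sketch in Python, assuming squared loss and a linear predictor. The function name safe_sgd_update and the capping rule are illustrative only; VW's actual implemented update (and what --loss_function classic reverts to) differ in detail.

```python
import numpy as np

def safe_sgd_update(w, x, y, eta):
    """One squared-loss gradient step whose size is capped so the new
    prediction w.x never moves past the label y (illustrative sketch)."""
    p = w @ x                      # current prediction
    norm_sq = x @ x
    if norm_sq == 0.0:
        return w                   # nothing to update
    step = eta * (y - p)           # classic gradient-descent coefficient on x
    max_step = (y - p) / norm_sq   # coefficient that lands the prediction exactly on y
    if abs(step) > abs(max_step):  # never overshoot the label
        step = max_step
    return w + step * x
```

With the cap removed, this reduces to an ordinary squared-loss gradient step, which is roughly the behavior that the classic loss function re-enables.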

5 Often, this works better than the default algorithm. [sent-10, score-0.243]

6 It isn’t the default because it isn’t (yet) as adaptable in terms of learning rate decay. [sent-11, score-0.243]

7 This is enabled with --adaptive, and learned regressors are compatible with the default. [sent-12, score-0.441]
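As a rough illustration of per-feature learning rates, here is an AdaGrad-style sketch in Python; the helper name, the epsilon constant, and the squared-loss assumption are mine, not VW internals.

```python
import numpy as np

def adaptive_step(w, g_accum, x, y, eta=0.1, eps=1e-8):
    """AdaGrad-style update: each feature's step is scaled by the inverse
    square root of its accumulated squared gradient (illustrative sketch)."""
    grad = (w @ x - y) * x               # squared-loss gradient for one example
    g_accum += grad ** 2                 # per-feature running sum of squared gradients
    w -= eta * grad / (np.sqrt(g_accum) + eps)
    return w, g_accum
```

Features that fire often (or with large gradients) get their effective learning rate shrunk, while rare features keep taking large steps, which is the point of per-feature rates.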

8 Nikos noticed that the phenomenal Quake inverse square root hack applies, making this substantially faster than a naive implementation. [sent-14, score-0.284]
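For readers who haven't seen it, the Quake trick approximates 1/sqrt(x) with an integer bit manipulation plus one Newton step. A Python transliteration of the classic hack is below; VW's actual C++ code may differ in detail.

```python
import struct

def fast_inverse_sqrt(x: float) -> float:
    """Quake III-style 1/sqrt(x): reinterpret the float's bits as an integer,
    apply the magic constant, then refine with one Newton iteration."""
    i = struct.unpack('<I', struct.pack('<f', x))[0]   # float bits -> int
    i = 0x5f3759df - (i >> 1)                          # magic initial guess
    y = struct.unpack('<f', struct.pack('<I', i))[0]   # int bits -> float
    return y * (1.5 - 0.5 * x * y * y)                 # one Newton step

# Example: fast_inverse_sqrt(4.0) returns roughly 0.499 versus the exact 0.5.
```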

9 Nikos and Daniel also implemented active learning derived from this paper, usable via --active_simulation (to test parameters on an existing supervised dataset) or --active_learning (to do the real thing). [sent-15, score-0.3]
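The actual query rule is the one from the referenced paper; the snippet below is only a toy sketch of the general importance-weighted pattern (query the label with some probability and reweight by its inverse), with a made-up probability formula rather than the paper's bound.

```python
import random

def maybe_query(margin, t, c0=8.0):
    """Toy active-learning step: query the label with a probability that
    shrinks as the prediction margin and the example count t grow."""
    p = min(1.0, c0 / (c0 + margin * margin * t))   # made-up query probability
    if random.random() < p:
        return True, 1.0 / p    # label requested; importance weight 1/p
    return False, 0.0           # label skipped
```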

10 We see this approach dominating supervised learning on all classification datasets so far, often with far fewer labeled examples required, as the theory predicts. [sent-17, score-0.233]

11 The learned predictor is compatible with the default. [sent-18, score-0.36]

12 Olivier helped me implement preconditioned conjugate gradient based on Jonathan Shewchuk's tutorial. [sent-19, score-0.425]

13 This is a batch algorithm and hence requires multiple passes over any dataset to do something useful. [sent-20, score-0.248]

14 Each step of conjugate gradient requires 2 passes. [sent-21, score-0.321]

15 This implementation has two advantages over the basic approach: it implicitly computes a Hessian in O(n) time, where n is the number of features, and it operates out of core, hence making it applicable to datasets that don’t conveniently fit in RAM. [sent-26, score-0.483]
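To see why the Hessian is never materialized, note that for squared loss the Hessian is a sum of rank-one terms, so a Hessian-vector product (the quantity conjugate gradient needs) can be accumulated in a single streaming pass with O(n) memory in the number of features. The sketch below assumes squared loss and is illustrative only; VW's preconditioned CG follows Shewchuk's tutorial and adds preconditioning and other losses on top of this.

```python
import numpy as np

def hessian_vector_product(examples, d):
    """Streaming Hessian-vector product for squared loss: H = sum_i x_i x_i^T,
    so H d = sum_i (x_i . d) x_i can be computed out of core (illustrative)."""
    hd = np.zeros_like(d)
    for x, _y in examples:       # one pass over the data; examples may stream from disk
        hd += (x @ d) * x        # never forms the n x n Hessian
    return hd
```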

16 The learned predictor is compatible with the default, although you’ll notice that a factor of 8 more RAM is required when learning. [sent-27, score-0.628]

17 This code is still experimental and likely to change over the next week. [sent-29, score-0.2]

18 The code appears to be substantially faster than Matt's earlier Python implementation, making this probably the most efficient LDA anywhere. [sent-31, score-0.554]

19 LDA is still much slower than online linear learning as it is quite computationally heavy in comparison—perhaps a good candidate for GPU optimization. [sent-32, score-0.337]

20 In addition, Ariel added a test suite, Shravan helped with ngrams, and there are several other minor new features and bug fixes, including a very subtle one caught by Vaclav. [sent-36, score-0.295]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('nikos', 0.307), ('default', 0.243), ('compatible', 0.184), ('implemented', 0.152), ('matt', 0.142), ('lccc', 0.142), ('conjugate', 0.134), ('update', 0.132), ('features', 0.13), ('normalization', 0.128), ('still', 0.126), ('faster', 0.124), ('changes', 0.123), ('lda', 0.123), ('computationally', 0.114), ('gradient', 0.109), ('required', 0.105), ('learned', 0.103), ('implementation', 0.099), ('online', 0.097), ('earlier', 0.097), ('daniel', 0.095), ('helped', 0.094), ('hence', 0.092), ('tutorial', 0.088), ('factor', 0.084), ('making', 0.083), ('addition', 0.08), ('although', 0.079), ('datasets', 0.079), ('dataset', 0.078), ('requires', 0.078), ('supervised', 0.077), ('classic', 0.077), ('enabled', 0.077), ('hack', 0.077), ('allocation', 0.077), ('python', 0.077), ('regressors', 0.077), ('backprop', 0.077), ('dominating', 0.077), ('shravan', 0.077), ('vaclav', 0.077), ('code', 0.074), ('predictor', 0.073), ('suite', 0.071), ('usable', 0.071), ('fixes', 0.071), ('documentation', 0.071), ('slowdown', 0.071)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999958 419 hunch net-2010-12-04-Vowpal Wabbit, version 5.0, and the second heresy


2 0.18088026 441 hunch net-2011-08-15-Vowpal Wabbit 6.0

Introduction: I just released Vowpal Wabbit 6.0 . Since the last version: VW is now 2-3 orders of magnitude faster at linear learning, primarily thanks to Alekh . Given the baseline, this is loads of fun, allowing us to easily deal with terafeature datasets, and dwarfing the scale of any other open source projects. The core improvement here comes from effective parallelization over kilonode clusters (either Hadoop or not). This code is highly scalable, so it even helps with clusters of size 2 (and doesn’t hurt for clusters of size 1). The core allreduce technique appears widely and easily reused—we’ve already used it to parallelize Conjugate Gradient, LBFGS, and two variants of online learning. We’ll be documenting how to do this more thoroughly, but for now “README_cluster” and associated scripts should provide a good starting point. The new LBFGS code from Miro seems to commonly dominate the existing conjugate gradient code in time/quality tradeoffs. The new matrix factoriz

3 0.17459691 426 hunch net-2011-03-19-The Ideal Large Scale Learning Class

Introduction: At NIPS, Andrew Ng asked me what should be in a large scale learning class. After some discussion with him and Nando and mulling it over a bit, these are the topics that I think should be covered. There are many different kinds of scaling. Scaling in examples This is the most basic kind of scaling. Online Gradient Descent This is an old algorithm—I’m not sure if anyone can be credited with it in particular. Perhaps the Perceptron is a good precursor, but substantial improvements come from the notion of a loss function of which squared loss , logistic loss , Hinge Loss, and Quantile Loss are all worth covering. It’s important to cover the semantics of these loss functions as well. Vowpal Wabbit is a reasonably fast codebase implementing these. Second Order Gradient Descent methods For some problems, methods taking into account second derivative information can be more effective. I’ve seen preconditioned conjugate gradient work well, for which Jonath

4 0.15692109 281 hunch net-2007-12-21-Vowpal Wabbit Code Release

Introduction: We are releasing the Vowpal Wabbit (Fast Online Learning) code as open source under a BSD (revised) license. This is a project at Yahoo! Research to build a useful large scale learning algorithm which Lihong Li , Alex Strehl , and I have been working on. To appreciate the meaning of “large”, it’s useful to define “small” and “medium”. A “small” supervised learning problem is one where a human could use a labeled dataset and come up with a reasonable predictor. A “medium” supervised learning problem dataset fits into the RAM of a modern desktop computer. A “large” supervised learning problem is one which does not fit into the RAM of a normal machine. VW tackles large scale learning problems by this definition of large. I’m not aware of any other open source Machine Learning tools which can handle this scale (although they may exist). A few close ones are: IBM’s Parallel Machine Learning Toolbox isn’t quite open source . The approach used by this toolbox is essenti

5 0.15341425 381 hunch net-2009-12-07-Vowpal Wabbit version 4.0, and a NIPS heresy

Introduction: I’m releasing version 4.0 ( tarball ) of Vowpal Wabbit . The biggest change (by far) in this release is experimental support for cluster parallelism, with notable help from Daniel Hsu . I also took advantage of the major version number to introduce some incompatible changes, including switching to murmurhash 2 , and other alterations to cachefiles. You’ll need to delete and regenerate them. In addition, the precise specification for a “tag” (i.e. string that can be used to identify an example) changed—you can’t have a space between the tag and the ‘|’ at the beginning of the feature namespace. And, of course, we made it faster. For the future, I put up my todo list outlining the major future improvements I want to see in the code. I’m planning to discuss the current mechanism and results of the cluster parallel implementation at the large scale machine learning workshop at NIPS later this week. Several people have asked me to do a tutorial/walkthrough of VW, wh

6 0.15217137 451 hunch net-2011-12-13-Vowpal Wabbit version 6.1 & the NIPS tutorial

7 0.15029651 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

8 0.14936228 428 hunch net-2011-03-27-Vowpal Wabbit, v5.1

9 0.14822035 267 hunch net-2007-10-17-Online as the new adjective

10 0.13373739 432 hunch net-2011-04-20-The End of the Beginning of Active Learning

11 0.13055797 365 hunch net-2009-07-31-Vowpal Wabbit Open Source Project

12 0.1281524 492 hunch net-2013-12-01-NIPS tutorials and Vowpal Wabbit 7.4

13 0.12645389 436 hunch net-2011-06-22-Ultra LDA

14 0.1222821 473 hunch net-2012-09-29-Vowpal Wabbit, version 7.0

15 0.12135119 279 hunch net-2007-12-19-Cool and interesting things seen at NIPS

16 0.11823484 450 hunch net-2011-12-02-Hadoop AllReduce and Terascale Learning

17 0.11210461 332 hunch net-2008-12-23-Use of Learning Theory

18 0.11185643 300 hunch net-2008-04-30-Concerns about the Large Scale Learning Challenge

19 0.11076246 346 hunch net-2009-03-18-Parallel ML primitives

20 0.10636838 111 hunch net-2005-09-12-Fast Gradient Descent


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.242), (1, 0.062), (2, -0.097), (3, -0.03), (4, 0.134), (5, 0.054), (6, -0.181), (7, -0.104), (8, -0.127), (9, 0.177), (10, -0.069), (11, -0.041), (12, 0.067), (13, -0.006), (14, -0.019), (15, -0.078), (16, -0.05), (17, -0.009), (18, 0.002), (19, -0.059), (20, -0.034), (21, 0.104), (22, 0.025), (23, -0.014), (24, -0.05), (25, -0.018), (26, -0.016), (27, -0.013), (28, -0.011), (29, -0.038), (30, 0.041), (31, 0.003), (32, 0.007), (33, -0.064), (34, -0.052), (35, 0.061), (36, 0.015), (37, -0.037), (38, -0.085), (39, 0.048), (40, -0.044), (41, 0.03), (42, -0.043), (43, -0.017), (44, 0.06), (45, 0.056), (46, -0.044), (47, -0.01), (48, -0.05), (49, -0.055)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96305102 419 hunch net-2010-12-04-Vowpal Wabbit, version 5.0, and the second heresy


2 0.834548 441 hunch net-2011-08-15-Vowpal Wabbit 6.0

Introduction: I just released Vowpal Wabbit 6.0 . Since the last version: VW is now 2-3 orders of magnitude faster at linear learning, primarily thanks to Alekh . Given the baseline, this is loads of fun, allowing us to easily deal with terafeature datasets, and dwarfing the scale of any other open source projects. The core improvement here comes from effective parallelization over kilonode clusters (either Hadoop or not). This code is highly scalable, so it even helps with clusters of size 2 (and doesn’t hurt for clusters of size 1). The core allreduce technique appears widely and easily reused—we’ve already used it to parallelize Conjugate Gradient, LBFGS, and two variants of online learning. We’ll be documenting how to do this more thoroughly, but for now “README_cluster” and associated scripts should provide a good starting point. The new LBFGS code from Miro seems to commonly dominate the existing conjugate gradient code in time/quality tradeoffs. The new matrix factoriz

3 0.81647539 451 hunch net-2011-12-13-Vowpal Wabbit version 6.1 & the NIPS tutorial

Introduction: I just made version 6.1 of Vowpal Wabbit . Relative to 6.0 , there are few new features, but many refinements. The cluster parallel learning code better supports multiple simultaneous runs, and other forms of parallelism have been mostly removed. This incidentally significantly simplifies the learning core. The online learning algorithms are more general, with support for l 1 (via a truncated gradient variant) and l 2 regularization, and a generalized form of variable metric learning. There is a solid persistent server mode which can train online, as well as serve answers to many simultaneous queries, either in text or binary. This should be a very good release if you are just getting started, as we’ve made it compile more automatically out of the box, have several new examples and updated documentation. As per tradition , we’re planning to do a tutorial at NIPS during the break at the parallel learning workshop at 2pm Spanish time Friday. I’ll cover the

4 0.79004729 365 hunch net-2009-07-31-Vowpal Wabbit Open Source Project

Introduction: Today brings a new release of the Vowpal Wabbit fast online learning software. This time, unlike the previous release, the project itself is going open source, developing via github . For example, the lastest and greatest can be downloaded via: git clone git://github.com/JohnLangford/vowpal_wabbit.git If you aren’t familiar with git , it’s a distributed version control system which supports quick and easy branching, as well as reconciliation. This version of the code is confirmed to compile without complaint on at least some flavors of OSX as well as Linux boxes. As much of the point of this project is pushing the limits of fast and effective machine learning, let me mention a few datapoints from my experience. The program can effectively scale up to batch-style training on sparse terafeature (i.e. 10 12 sparse feature) size datasets. The limiting factor is typically i/o. I started using the the real datasets from the large-scale learning workshop as a conve

5 0.76661563 281 hunch net-2007-12-21-Vowpal Wabbit Code Release

Introduction: We are releasing the Vowpal Wabbit (Fast Online Learning) code as open source under a BSD (revised) license. This is a project at Yahoo! Research to build a useful large scale learning algorithm which Lihong Li , Alex Strehl , and I have been working on. To appreciate the meaning of “large”, it’s useful to define “small” and “medium”. A “small” supervised learning problem is one where a human could use a labeled dataset and come up with a reasonable predictor. A “medium” supervised learning problem dataset fits into the RAM of a modern desktop computer. A “large” supervised learning problem is one which does not fit into the RAM of a normal machine. VW tackles large scale learning problems by this definition of large. I’m not aware of any other open source Machine Learning tools which can handle this scale (although they may exist). A few close ones are: IBM’s Parallel Machine Learning Toolbox isn’t quite open source . The approach used by this toolbox is essenti

6 0.73716712 381 hunch net-2009-12-07-Vowpal Wabbit version 4.0, and a NIPS heresy

7 0.73062468 436 hunch net-2011-06-22-Ultra LDA

8 0.71649146 450 hunch net-2011-12-02-Hadoop AllReduce and Terascale Learning

9 0.65788597 492 hunch net-2013-12-01-NIPS tutorials and Vowpal Wabbit 7.4

10 0.65133572 346 hunch net-2009-03-18-Parallel ML primitives

11 0.64321899 426 hunch net-2011-03-19-The Ideal Large Scale Learning Class

12 0.63634998 428 hunch net-2011-03-27-Vowpal Wabbit, v5.1

13 0.61684251 298 hunch net-2008-04-26-Eliminating the Birthday Paradox for Universal Features

14 0.6137895 473 hunch net-2012-09-29-Vowpal Wabbit, version 7.0

15 0.60528839 300 hunch net-2008-04-30-Concerns about the Large Scale Learning Challenge

16 0.5593667 490 hunch net-2013-11-09-Graduates and Postdocs

17 0.53558367 337 hunch net-2009-01-21-Nearly all natural problems require nonlinearity

18 0.52555001 267 hunch net-2007-10-17-Online as the new adjective

19 0.51857495 279 hunch net-2007-12-19-Cool and interesting things seen at NIPS

20 0.51510048 286 hunch net-2008-01-25-Turing’s Club for Machine Learning


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(3, 0.024), (10, 0.036), (17, 0.025), (19, 0.271), (27, 0.158), (38, 0.032), (51, 0.016), (53, 0.053), (55, 0.105), (77, 0.021), (93, 0.018), (94, 0.134), (95, 0.022)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.93886608 491 hunch net-2013-11-21-Ben Taskar is gone

Introduction: I was not as personally close to Ben as Sam , but the level of tragedy is similar and I can’t help but be greatly saddened by the loss. Various news stories have coverage, but the synopsis is that he had a heart attack on Sunday and is survived by his wife Anat and daughter Aviv. There is discussion of creating a memorial fund for them, which I hope comes to fruition, and plan to contribute to. I will remember Ben as someone who thought carefully and comprehensively about new ways to do things, then fought hard and successfully for what he believed in. It is an ideal we strive for, that Ben accomplished. Edit: donations go here , and more information is here .

same-blog 2 0.84104002 419 hunch net-2010-12-04-Vowpal Wabbit, version 5.0, and the second heresy


3 0.8279174 306 hunch net-2008-07-02-Proprietary Data in Academic Research?

Introduction: Should results of experiments on proprietary datasets be in the academic research literature? The arguments I can imagine in the “against” column are: Experiments are not repeatable. Repeatability in experiments is essential to science because it allows others to compare new methods with old and discover which is better. It’s unfair. Academics who don’t have insider access to proprietary data are at a substantial disadvantage when competing with others who do. I’m unsympathetic to argument (2). To me, it looks like their are simply some resource constraints, and these should not prevent research progress. For example, we wouldn’t prevent publishing about particle accelerator experiments by physicists at CERN because physicists at CMU couldn’t run their own experiments. Argument (1) seems like a real issue. The argument for is: Yes, they are another form of evidence that an algorithm is good. The degree to which they are evidence is less than for public

4 0.65504885 221 hunch net-2006-12-04-Structural Problems in NIPS Decision Making

Introduction: This is a very difficult post to write, because it is about a perenially touchy subject. Nevertheless, it is an important one which needs to be thought about carefully. There are a few things which should be understood: The system is changing and responsive. We-the-authors are we-the-reviewers, we-the-PC, and even we-the-NIPS-board. NIPS has implemented ‘secondary program chairs’, ‘author response’, and ‘double blind reviewing’ in the last few years to help with the decision process, and more changes may happen in the future. Agreement creates a perception of correctness. When any PC meets and makes a group decision about a paper, there is a strong tendency for the reinforcement inherent in a group decision to create the perception of correctness. For the many people who have been on the NIPS PC it’s reasonable to entertain a healthy skepticism in the face of this reinforcing certainty. This post is about structural problems. What problems arise because of the structure

5 0.64352655 286 hunch net-2008-01-25-Turing’s Club for Machine Learning

Introduction: Many people in Machine Learning don’t fully understand the impact of computation, as demonstrated by a lack of big-O analysis of new learning algorithms. This is important—some current active research programs are fundamentally flawed w.r.t. computation, and other research programs are directly motivated by it. When considering a learning algorithm, I think about the following questions: How does the learning algorithm scale with the number of examples m ? Any algorithm using all of the data is at least O(m) , but in many cases this is O(m 2 ) (naive nearest neighbor for self-prediction) or unknown (k-means or many other optimization algorithms). The unknown case is very common, and it can mean (for example) that the algorithm isn’t convergent or simply that the amount of computation isn’t controlled. The above question can also be asked for test cases. In some applications, test-time performance is of great importance. How does the algorithm scale with the number of

6 0.63973677 423 hunch net-2011-02-02-User preferences for search engines

7 0.63507444 253 hunch net-2007-07-06-Idempotent-capable Predictors

8 0.62909842 95 hunch net-2005-07-14-What Learning Theory might do

9 0.62877196 366 hunch net-2009-08-03-Carbon in Computer Science Research

10 0.6284132 96 hunch net-2005-07-21-Six Months

11 0.62835079 40 hunch net-2005-03-13-Avoiding Bad Reviewing

12 0.6259535 437 hunch net-2011-07-10-ICML 2011 and the future

13 0.62566042 229 hunch net-2007-01-26-Parallel Machine Learning Problems

14 0.62192625 75 hunch net-2005-05-28-Running A Machine Learning Summer School

15 0.6211766 204 hunch net-2006-08-28-Learning Theory standards for NIPS 2006

16 0.620727 450 hunch net-2011-12-02-Hadoop AllReduce and Terascale Learning

17 0.61956578 276 hunch net-2007-12-10-Learning Track of International Planning Competition

18 0.61910141 136 hunch net-2005-12-07-Is the Google way the way for machine learning?

19 0.61890107 454 hunch net-2012-01-30-ICML Posters and Scope

20 0.6188972 297 hunch net-2008-04-22-Taking the next step