hunch_net hunch_net-2011 hunch_net-2011-428 knowledge-graph by maker-knowledge-mining

428 hunch net-2011-03-27-Vowpal Wabbit, v5.1


meta info for this blog

Source: html

Introduction: I just created version 5.1 of vowpal wabbit . This is almost entirely a bugfix release, so it’s an easy upgrade from v5.0. In addition: There is now a mailing list , which I and several other developers are subscribed to. The main website has shifted to the wiki on github. This means that anyone with a github account can now edit it. I’m planning to give a tutorial tomorrow on it at eHarmony / the LA machine learning meetup at 10am. Drop by if you’re interested. The status of VW amongst other open source projects has changed. When VW first came out, it was relatively unique amongst existing projects in terms of features. At this point, many other projects have started to appreciate the value of the design choices here. This includes: Mahout , which now has an SGD implementation. Shogun , where Soeren is keen on incorporating features . LibLinear , where they won the KDD best paper award for out-of-core learning . This is expected—any open sourc


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 This is almost entirely a bugfix release, so it’s an easy upgrade from v5.0. [sent-3, score-0.144]

2 In addition: There is now a mailing list , which I and several other developers are subscribed to. [sent-5, score-0.246]

3 The main website has shifted to the wiki on github. [sent-6, score-0.396]

4 This means that anyone with a github account can now edit it. [sent-7, score-0.238]

5 I’m planning to give a tutorial tomorrow on it at eHarmony / the LA machine learning meetup at 10am. [sent-8, score-0.313]

6 The status of VW amongst other open source projects has changed. [sent-10, score-0.936]

7 When VW first came out, it was relatively unique amongst existing projects in terms of features. [sent-11, score-0.747]

8 At this point, many other projects have started to appreciate the value of the design choices here. [sent-12, score-0.507]

9 Shogun , where Soeren is keen on incorporating features . [sent-14, score-0.234]

10 LibLinear , where they won the KDD best paper award for out-of-core learning . [sent-15, score-0.1]

11 This is expected—any open source approach which works well should be widely adopted. [sent-16, score-0.336]

12 None of these other projects yet have the full combination of features, so VW still offers something unique. [sent-17, score-0.714]

13 There are also more tricks that I haven’t yet had time to implement, and I look forward to discovering even more. [sent-18, score-0.412]

14 I’m indebted to many people at this point who have helped with this project. [sent-19, score-0.217]

15 I particularly point out Daniel and Nikos , who have spent quite a bit of time over the last few months working on things. [sent-20, score-0.321]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('projects', 0.42), ('vw', 0.292), ('soeren', 0.144), ('upgrade', 0.144), ('mahout', 0.144), ('eharmony', 0.144), ('la', 0.144), ('amongst', 0.142), ('subscribed', 0.134), ('point', 0.129), ('status', 0.126), ('github', 0.126), ('source', 0.126), ('features', 0.122), ('open', 0.122), ('drop', 0.12), ('meetup', 0.12), ('wiki', 0.116), ('offers', 0.116), ('nikos', 0.116), ('mailing', 0.112), ('incorporating', 0.112), ('edit', 0.112), ('tricks', 0.112), ('sgd', 0.112), ('shifted', 0.108), ('tomorrow', 0.108), ('forward', 0.105), ('discovering', 0.102), ('implement', 0.102), ('months', 0.1), ('release', 0.1), ('award', 0.1), ('unique', 0.097), ('re', 0.095), ('yet', 0.093), ('spent', 0.092), ('daniel', 0.09), ('widely', 0.088), ('came', 0.088), ('helped', 0.088), ('includes', 0.088), ('vowpal', 0.088), ('wabbit', 0.088), ('main', 0.087), ('none', 0.087), ('appreciate', 0.087), ('full', 0.085), ('planning', 0.085), ('website', 0.085)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999982 428 hunch net-2011-03-27-Vowpal Wabbit, v5.1

Introduction: I just created version 5.1 of vowpal wabbit . This is almost entirely a bugfix release, so it’s an easy upgrade from v5.0. In addition: There is now a mailing list , which I and several other developers are subscribed to. The main website has shifted to the wiki on github. This means that anyone with a github account can now edit it. I’m planning to give a tutorial tomorrow on it at eHarmony / the LA machine learning meetup at 10am. Drop by if you’re interested. The status of VW amongst other open source projects has changed. When VW first came out, it was relatively unique amongst existing projects in terms of features. At this point, many other projects have started to appreciate the value of the design choices here. This includes: Mahout , which now has an SGD implementation. Shogun , where Soeren is keen on incorporating features . LibLinear , where they won the KDD best paper award for out-of-core learning . This is expected—any open sourc

2 0.17071652 105 hunch net-2005-08-23-(Dis)similarities between academia and open source programmers

Introduction: Martin Pool and I recently discussed the similarities and differences between academia and open source programming. Similarities: Cost profile Research and programming share approximately the same cost profile: A large upfront effort is required to produce something useful, and then “anyone” can use it. (The “anyone” is not quite right for either group because only sufficiently technical people could use it.) Wealth profile A “wealthy” academic or open source programmer is someone who has contributed a lot to other people in research or programs. Much of academia is a “gift culture”: whoever gives the most is most respected. Problems Both academia and open source programming suffer from similar problems. Whether or not (and which) open source program is used is perhaps too often personality driven rather than driven by capability or usefulness. Similar phenomena can happen in academia with respect to directions of research. Funding is often a problem for

3 0.14936228 419 hunch net-2010-12-04-Vowpal Wabbit, version 5.0, and the second heresy

Introduction: I’ve released version 5.0 of the Vowpal Wabbit online learning software. The major number has changed since the last release because I regard all earlier versions as obsolete—there are several new algorithms & features including substantial changes and upgrades to the default learning algorithm. The biggest changes are new algorithms: Nikos and I improved the default algorithm. The basic update rule still uses gradient descent, but the size of the update is carefully controlled so that it’s impossible to overrun the label. In addition, the normalization has changed. Computationally, these changes are virtually free and yield better results, sometimes much better. Less careful updates can be reenabled with --loss_function classic, although results are still not identical to previous due to normalization changes. Nikos also implemented the per-feature learning rates as per these two papers . Often, this works better than the default algorithm. It isn’t the defa
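The per-feature learning rates mentioned in the entry above are easy to illustrate with a small sketch. The following is not Vowpal Wabbit's actual update rule; it is a generic adaptive-gradient style update with squared loss and made-up names, shown only to convey the idea that each feature keeps its own step size:

import math

# Illustrative per-feature (adaptive) learning rates for linear SGD.
# NOT the actual VW update rule; squared loss and the step-size formula
# below are simplifying assumptions for demonstration only.
def adaptive_sgd(examples, num_features, eta=0.5):
    w = [0.0] * num_features       # weight vector
    g2 = [0.0] * num_features      # per-feature sum of squared gradients
    for x, y in examples:          # x: dict {feature_index: value}, y: target
        pred = sum(w[i] * v for i, v in x.items())
        err = pred - y             # gradient of 0.5*(pred - y)^2 w.r.t. pred
        for i, v in x.items():
            g = err * v            # gradient for this feature
            g2[i] += g * g
            w[i] -= eta * g / (math.sqrt(g2[i]) + 1e-8)   # per-feature step size
    return w

# toy usage: learn y ~ 2*x0 + 1*x1 from repeated sparse examples
data = [({0: 1.0}, 2.0), ({1: 1.0}, 1.0), ({0: 1.0, 1: 1.0}, 3.0)] * 50
print(adaptive_sgd(data, num_features=2))

The point is only that frequently seen features accumulate gradient history and take smaller steps, while rare features keep larger ones.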

4 0.14488411 281 hunch net-2007-12-21-Vowpal Wabbit Code Release

Introduction: We are releasing the Vowpal Wabbit (Fast Online Learning) code as open source under a BSD (revised) license. This is a project at Yahoo! Research to build a useful large scale learning algorithm which Lihong Li , Alex Strehl , and I have been working on. To appreciate the meaning of “large”, it’s useful to define “small” and “medium”. A “small” supervised learning problem is one where a human could use a labeled dataset and come up with a reasonable predictor. A “medium” supervised learning problem dataset fits into the RAM of a modern desktop computer. A “large” supervised learning problem is one which does not fit into the RAM of a normal machine. VW tackles large scale learning problems by this definition of large. I’m not aware of any other open source Machine Learning tools which can handle this scale (although they may exist). A few close ones are: IBM’s Parallel Machine Learning Toolbox isn’t quite open source . The approach used by this toolbox is essenti
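The small/medium/large distinction in the entry above is easy to make concrete with back-of-the-envelope arithmetic. The dataset shapes and the 16 GB RAM figure below are my own illustrative assumptions, not numbers from the post:

# Rough check of whether a dense dataset fits in RAM.
# All concrete sizes here are illustrative assumptions.
def dataset_bytes(num_examples, num_features, bytes_per_value=8):
    return num_examples * num_features * bytes_per_value

ram_bytes = 16 * 2**30                    # assume a 16 GB desktop
medium = dataset_bytes(10**6, 10**3)      # ~7.5 GiB: fits, so "medium" by the post's definition
large = dataset_bytes(10**9, 10**3)       # ~7500 GiB: does not fit, so "large"

for name, size in [("medium", medium), ("large", large)]:
    print(f"{name}: {size / 2**30:.1f} GiB, fits in RAM: {size < ram_bytes}")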

5 0.1289621 240 hunch net-2007-04-21-Videolectures.net

Introduction: Davor has been working to set up videolectures.net which is the new site for the many lectures mentioned here . (Tragically, they seem to only be available in windows media format.) I went through my own projects and added a few links to the videos. The day when every result is a set of {paper, slides, video} isn’t quite here yet, but it’s within sight. (For many papers, of course, code is a 4th component.)

6 0.12285751 492 hunch net-2013-12-01-NIPS tutorials and Vowpal Wabbit 7.4

7 0.11947295 451 hunch net-2011-12-13-Vowpal Wabbit version 6.1 & the NIPS tutorial

8 0.11714062 473 hunch net-2012-09-29-Vowpal Wabbit, version 7.0

9 0.10869544 342 hunch net-2009-02-16-KDNuggets

10 0.10577659 441 hunch net-2011-08-15-Vowpal Wabbit 6.0

11 0.10421567 381 hunch net-2009-12-07-Vowpal Wabbit version 4.0, and a NIPS heresy

12 0.10412624 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

13 0.10031357 365 hunch net-2009-07-31-Vowpal Wabbit Open Source Project

14 0.098365016 297 hunch net-2008-04-22-Taking the next step

15 0.095053434 270 hunch net-2007-11-02-The Machine Learning Award goes to …

16 0.092183441 483 hunch net-2013-06-10-The Large Scale Learning class notes

17 0.082214728 478 hunch net-2013-01-07-NYU Large Scale Machine Learning Class

18 0.079517961 415 hunch net-2010-10-28-NY ML Symposium 2010

19 0.075384766 464 hunch net-2012-05-03-Microsoft Research, New York City

20 0.074304901 81 hunch net-2005-06-13-Wikis for Summer Schools and Workshops


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.161), (1, -0.029), (2, -0.119), (3, 0.007), (4, 0.007), (5, 0.047), (6, -0.111), (7, -0.071), (8, -0.141), (9, 0.06), (10, -0.088), (11, -0.044), (12, 0.018), (13, 0.063), (14, 0.024), (15, -0.069), (16, -0.089), (17, 0.019), (18, 0.026), (19, -0.04), (20, 0.048), (21, 0.147), (22, 0.028), (23, -0.098), (24, -0.073), (25, -0.029), (26, -0.099), (27, 0.06), (28, -0.078), (29, 0.034), (30, 0.022), (31, -0.008), (32, 0.041), (33, 0.001), (34, -0.063), (35, -0.06), (36, -0.004), (37, -0.013), (38, 0.079), (39, 0.031), (40, -0.149), (41, 0.025), (42, -0.013), (43, -0.071), (44, 0.003), (45, -0.044), (46, -0.122), (47, -0.033), (48, 0.053), (49, -0.075)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97827065 428 hunch net-2011-03-27-Vowpal Wabbit, v5.1

Introduction: I just created version 5.1 of vowpal wabbit . This is almost entirely a bugfix release, so it’s an easy upgrade from v5.0. In addition: There is now a mailing list , which I and several other developers are subscribed to. The main website has shifted to the wiki on github. This means that anyone with a github account can now edit it. I’m planning to give a tutorial tomorrow on it at eHarmony / the LA machine learning meetup at 10am. Drop by if you’re interested. The status of VW amongst other open source projects has changed. When VW first came out, it was relatively unique amongst existing projects in terms of features. At this point, many other projects have started to appreciate the value of the design choices here. This includes: Mahout , which now has an SGD implementation. Shogun , where Soeren is keen on incorporating features . LibLinear , where they won the KDD best paper award for out-of-core learning . This is expected—any open sourc

2 0.6567207 441 hunch net-2011-08-15-Vowpal Wabbit 6.0

Introduction: I just released Vowpal Wabbit 6.0 . Since the last version: VW is now 2-3 orders of magnitude faster at linear learning, primarily thanks to Alekh . Given the baseline, this is loads of fun, allowing us to easily deal with terafeature datasets, and dwarfing the scale of any other open source projects. The core improvement here comes from effective parallelization over kilonode clusters (either Hadoop or not). This code is highly scalable, so it even helps with clusters of size 2 (and doesn’t hurt for clusters of size 1). The core allreduce technique appears widely and easily reused—we’ve already used it to parallelize Conjugate Gradient, LBFGS, and two variants of online learning. We’ll be documenting how to do this more thoroughly, but for now “README_cluster” and associated scripts should provide a good starting point. The new LBFGS code from Miro seems to commonly dominate the existing conjugate gradient code in time/quality tradeoffs. The new matrix factoriz
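The allreduce primitive credited in the entry above for the speedup has simple semantics: every node contributes a local vector and every node receives the elementwise sum. The sketch below only simulates those semantics in one process; a real implementation, like the one described in the entry, runs over a network of nodes, and the names here are mine:

# Single-process simulation of allreduce (sum) semantics used for
# cluster-parallel learning. Real implementations exchange partial sums
# over the network (e.g. up and down a spanning tree of nodes); this
# sketch only shows what the operation computes.
def allreduce_sum(per_node_vectors):
    dim = len(per_node_vectors[0])
    total = [0.0] * dim
    for vec in per_node_vectors:          # "reduce": accumulate every node's contribution
        for i, v in enumerate(vec):
            total[i] += v
    return [list(total) for _ in per_node_vectors]   # "broadcast": every node gets the sum

# toy usage: 3 nodes, each holding the gradient computed on its shard of data
local_gradients = [[1.0, -2.0], [3.0, 0.0], [-1.0, 5.0]]
print(allreduce_sum(local_gradients))     # every node now holds [3.0, 3.0]

Once every node holds the same aggregated vector, synchronized batch methods such as LBFGS or conjugate gradient can proceed as if running on a single machine.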

3 0.59213781 365 hunch net-2009-07-31-Vowpal Wabbit Open Source Project

Introduction: Today brings a new release of the Vowpal Wabbit fast online learning software. This time, unlike the previous release, the project itself is going open source, developing via github . For example, the latest and greatest can be downloaded via: git clone git://github.com/JohnLangford/vowpal_wabbit.git If you aren’t familiar with git , it’s a distributed version control system which supports quick and easy branching, as well as reconciliation. This version of the code is confirmed to compile without complaint on at least some flavors of OSX as well as Linux boxes. As much of the point of this project is pushing the limits of fast and effective machine learning, let me mention a few datapoints from my experience. The program can effectively scale up to batch-style training on sparse terafeature (i.e. 10^12 sparse feature) size datasets. The limiting factor is typically i/o. I started using the real datasets from the large-scale learning workshop as a conve

4 0.58821177 381 hunch net-2009-12-07-Vowpal Wabbit version 4.0, and a NIPS heresy

Introduction: I’m releasing version 4.0 ( tarball ) of Vowpal Wabbit . The biggest change (by far) in this release is experimental support for cluster parallelism, with notable help from Daniel Hsu . I also took advantage of the major version number to introduce some incompatible changes, including switching to murmurhash 2 , and other alterations to cachefiles. You’ll need to delete and regenerate them. In addition, the precise specification for a “tag” (i.e. string that can be used to identify an example) changed—you can’t have a space between the tag and the ‘|’ at the beginning of the feature namespace. And, of course, we made it faster. For the future, I put up my todo list outlining the major future improvements I want to see in the code. I’m planning to discuss the current mechanism and results of the cluster parallel implementation at the large scale machine learning workshop at NIPS later this week. Several people have asked me to do a tutorial/walkthrough of VW, wh
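The tag rule quoted in the entry above (no space allowed between the tag and the '|') is easiest to see on a concrete input line. The label, tag, namespace, and feature names below are hypothetical; only the adjacency rule comes from the post:

# Hypothetical VW-format input lines illustrating the v4.0 tag rule:
# the tag (a string identifying the example) must directly touch the '|'
# that opens the first feature namespace. All values here are made up.
good = "1 1.0 example_001|animal furry:1 legs:4"    # "example_001" is read as the tag
bad = "1 1.0 example_001 |animal furry:1 legs:4"    # space before '|' breaks the tag rule

print(good)
print(bad)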

5 0.58229619 419 hunch net-2010-12-04-Vowpal Wabbit, version 5.0, and the second heresy

Introduction: I’ve released version 5.0 of the Vowpal Wabbit online learning software. The major number has changed since the last release because I regard all earlier versions as obsolete—there are several new algorithms & features including substantial changes and upgrades to the default learning algorithm. The biggest changes are new algorithms: Nikos and I improved the default algorithm. The basic update rule still uses gradient descent, but the size of the update is carefully controlled so that it’s impossible to overrun the label. In addition, the normalization has changed. Computationally, these changes are virtually free and yield better results, sometimes much better. Less careful updates can be reenabled with --loss_function classic, although results are still not identical to previous due to normalization changes. Nikos also implemented the per-feature learning rates as per these two papers . Often, this works better than the default algorithm. It isn’t the defa

6 0.57203287 240 hunch net-2007-04-21-Videolectures.net

7 0.56430805 281 hunch net-2007-12-21-Vowpal Wabbit Code Release

8 0.5321461 342 hunch net-2009-02-16-KDNuggets

9 0.52540863 436 hunch net-2011-06-22-Ultra LDA

10 0.51891792 492 hunch net-2013-12-01-NIPS tutorials and Vowpal Wabbit 7.4

11 0.49898371 105 hunch net-2005-08-23-(Dis)similarities between academia and open source programmers

12 0.49132681 473 hunch net-2012-09-29-Vowpal Wabbit, version 7.0

13 0.4771457 24 hunch net-2005-02-19-Machine learning reading groups

14 0.4733673 479 hunch net-2013-01-31-Remote large scale learning class participation

15 0.46597698 451 hunch net-2011-12-13-Vowpal Wabbit version 6.1 & the NIPS tutorial

16 0.45406815 270 hunch net-2007-11-02-The Machine Learning Award goes to …

17 0.43889365 493 hunch net-2014-02-16-Metacademy: a package manager for knowledge

18 0.4279519 447 hunch net-2011-10-10-ML Symposium and ICML details

19 0.41663828 354 hunch net-2009-05-17-Server Update

20 0.41104007 450 hunch net-2011-12-02-Hadoop AllReduce and Terascale Learning


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(10, 0.054), (27, 0.19), (53, 0.05), (55, 0.096), (58, 0.013), (68, 0.307), (94, 0.128), (95, 0.065)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.91724551 48 hunch net-2005-03-29-Academic Mechanism Design

Introduction: From game theory, there is a notion of “mechanism design”: setting up the structure of the world so that participants have some incentive to do sane things (rather than obviously counterproductive things). Application of this principle to academic research may be fruitful. What is misdesigned about academic research? The JMLG guides give many hints. The common nature of bad reviewing also suggests the system isn’t working optimally. There are many ways to experimentally “cheat” in machine learning . Funding Prisoner’s Dilemma. Good researchers often write grant proposals for funding rather than doing research. Since the pool of grant money is finite, this means that grant proposals are often rejected, implying that more must be written. This is essentially a “prisoner’s dilemma”: anyone not writing grant proposals loses, but the entire process of doing research is slowed by distraction. If everyone wrote 1/2 as many grant proposals, roughly the same distribution

same-blog 2 0.90163112 428 hunch net-2011-03-27-Vowpal Wabbit, v5.1

Introduction: I just created version 5.1 of vowpal wabbit . This is almost entirely a bugfix release, so it’s an easy upgrade from v5.0. In addition: There is now a mailing list , which I and several other developers are subscribed to. The main website has shifted to the wiki on github. This means that anyone with a github account can now edit it. I’m planning to give a tutorial tomorrow on it at eHarmony / the LA machine learning meetup at 10am. Drop by if you’re interested. The status of VW amongst other open source projects has changed. When VW first came out, it was relatively unique amongst existing projects in terms of features. At this point, many other projects have started to appreciate the value of the design choices here. This includes: Mahout , which now has an SGD implementation. Shogun , where Soeren is keen on incorporating features . LibLinear , where they won the KDD best paper award for out-of-core learning . This is expected—any open sourc

3 0.87456787 110 hunch net-2005-09-10-“Failure” is an option

Introduction: This is about the hard choices that graduate students must make. The cultural definition of success in academic research is to: Produce good research which many other people appreciate. Produce many students who go on to do the same. There are fundamental reasons why this is success in the local culture. Good research appreciated by others means access to jobs. Many students successful in the same way implies that there are a number of people who think in a similar way and appreciate your work. In order to graduate, a phd student must live in an academic culture for a period of several years. It is common to adopt the culture’s definition of success during this time. It’s also common for many phd students to discover they are not suited to an academic research lifestyle. This collision of values and abilities naturally results in depression. The most fundamental advice when this happens is: change something. Pick a new advisor. Pick a new research topic. Or leave th

4 0.84875482 156 hunch net-2006-02-11-Yahoo’s Learning Problems.

Introduction: I just visited Yahoo Research which has several fundamental learning problems near to (or beyond) the set of problems we know how to solve well. Here are 3 of them. Ranking This is the canonical problem of all search engines. It is made extra difficult for several reasons. There is relatively little “good” supervised learning data and a great deal of data with some signal (such as click through rates). The learning must occur in a partially adversarial environment. Many people very actively attempt to place themselves at the top of rankings. It is not even quite clear whether the problem should be posed as ‘ranking’ or as ‘regression’ which is then used to produce a ranking. Collaborative filtering Yahoo has a large number of recommendation systems for music, movies, etc… In these sorts of systems, users specify how they liked a set of things, and then the system can (hopefully) find some more examples of things they might like by reasoning across multiple

5 0.75309652 345 hunch net-2009-03-08-Prediction Science

Introduction: One view of machine learning is that it’s about how to program computers to predict well. This suggests a broader research program centered around the more pervasive goal of simply predicting well. There are many distinct strands of this broader research program which are only partially unified. Here are the ones that I know of: Learning Theory . Learning theory focuses on several topics related to the dynamics and process of prediction. Convergence bounds like the VC bound give an intellectual foundation to many learning algorithms. Online learning algorithms like Weighted Majority provide an alternate purely game theoretic foundation for learning. Boosting algorithms yield algorithms for purifying prediction ability. Reduction algorithms provide means for changing esoteric problems into well known ones. Machine Learning . A great deal of experience has accumulated in practical algorithm design from a mixture of paradigms, including bayesian, biological, opt

6 0.72739959 65 hunch net-2005-05-02-Reviewing techniques for conferences

7 0.6502974 132 hunch net-2005-11-26-The Design of an Optimal Research Environment

8 0.64718717 225 hunch net-2007-01-02-Retrospective

9 0.64573663 207 hunch net-2006-09-12-Incentive Compatible Reviewing

10 0.6353212 411 hunch net-2010-09-21-Regretting the dead

11 0.63421327 464 hunch net-2012-05-03-Microsoft Research, New York City

12 0.62778479 221 hunch net-2006-12-04-Structural Problems in NIPS Decision Making

13 0.62710869 382 hunch net-2009-12-09-Future Publication Models @ NIPS

14 0.62694848 286 hunch net-2008-01-25-Turing’s Club for Machine Learning

15 0.62451947 366 hunch net-2009-08-03-Carbon in Computer Science Research

16 0.62354606 358 hunch net-2009-06-01-Multitask Poisoning

17 0.6200884 95 hunch net-2005-07-14-What Learning Theory might do

18 0.61758959 449 hunch net-2011-11-26-Giving Thanks

19 0.61757928 343 hunch net-2009-02-18-Decision by Vetocracy

20 0.61662221 450 hunch net-2011-12-02-Hadoop AllReduce and Terascale Learning