hunch_net hunch_net-2012 hunch_net-2012-473 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: A new version of VW is out. The primary changes are: Learning Reductions: I’ve wanted to get learning reductions working and we’ve finally done it. Not everything is implemented yet, but VW now supports direct: Multiclass Classification --oaa or --ect. Cost Sensitive Multiclass Classification --csoaa or --wap. Contextual Bandit Classification --cb. Sequential Structured Prediction --searn or --dagger. In addition, it is now easy to build your own custom learning reductions for various plausible uses: feature diddling, custom structured prediction problems, or alternate learning reductions. This effort is far from done, but it is now in a generally useful state. Note that all learning reductions inherit the ability to do cluster parallel learning. Library interface: VW now has a basic library interface. The library provides most of the functionality of VW, with the limitation that it is monolithic and nonreentrant. These will be improved over
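As a concrete illustration of the reduction flags above, here is a minimal sketch of driving VW from Python for one-against-all multiclass learning. It assumes a `vw` binary on the PATH; the file names (`train.vw`, `model.vw`, `preds.txt`) are placeholders for the example, and the cost-sensitive and contextual-bandit label formats in the comments follow VW's documented conventions rather than coming from this post.

```python
import subprocess

# Plain multiclass examples for --oaa: "label | features" with labels 1..K.
train_lines = [
    "1 | length:1.0 width:0.3",
    "2 | length:0.4 width:0.9",
    "3 | length:2.0 width:2.1",
]
with open("train.vw", "w") as f:
    f.write("\n".join(train_lines) + "\n")

# Train a 3-class one-against-all model, then predict on the same file.
subprocess.run(["vw", "--oaa", "3", "train.vw", "-f", "model.vw"], check=True)
subprocess.run(["vw", "-t", "-i", "model.vw", "train.vw", "-p", "preds.txt"], check=True)

# For the other reductions only the label part of each line changes, e.g.:
#   --csoaa 3 : "1:0.0 2:1.0 3:1.0 | features"   (per-class costs)
#   --cb 3    : "2:1.0:0.5 | features"           (action:cost:probability)
```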
sentIndex sentText sentNum sentScore
1 The primary changes are: Learning Reductions: I’ve wanted to get learning reductions working and we’ve finally done it. [sent-2, score-0.444]
2 Not everything is implemented yet, but VW now supports direct: Multiclass Classification --oaa or --ect. [sent-3, score-0.177]
3 Sequential Structured Prediction --searn or --dagger. In addition, it is now easy to build your own custom learning reductions for various plausible uses: feature diddling, custom structured prediction problems, or alternate learning reductions. [sent-6, score-1.233]
4 Note that all learning reductions inherit the ability to do cluster parallel learning. [sent-8, score-0.359]
5 Library interface: VW now has a basic library interface. [sent-9, score-0.368]
6 The library provides most of the functionality of VW, with the limitation that it is monolithic and nonreentrant. [sent-10, score-0.488]
7 Windows port: The priority of a Windows port jumped way up once we moved to Microsoft. [sent-12, score-1.001]
8 The only feature which we know doesn’t work at present is automatic backgrounding when in daemon mode. [sent-13, score-0.304]
9 New update rule: Stephane visited us this summer, and we fixed the default online update rule so that it is unit invariant. [sent-14, score-0.894]
10 There are also many other small updates including some contributed utilities that aid the process of applying and using VW. [sent-15, score-0.352]
11 Plans for the near future involve improving the quality of various items above, and of course better documentation: several of the reductions are not yet well documented. [sent-16, score-0.626]
wordName wordTfidf (topN-words)
[('vw', 0.326), ('library', 0.29), ('reductions', 0.281), ('port', 0.272), ('custom', 0.224), ('windows', 0.224), ('classification', 0.148), ('rule', 0.138), ('update', 0.138), ('structured', 0.136), ('multiclass', 0.134), ('priority', 0.121), ('documented', 0.121), ('daemon', 0.121), ('dagger', 0.121), ('stephane', 0.121), ('documentation', 0.112), ('moved', 0.112), ('unit', 0.105), ('contributed', 0.105), ('feature', 0.103), ('alternate', 0.101), ('limitation', 0.101), ('items', 0.101), ('functionality', 0.097), ('plans', 0.097), ('supports', 0.097), ('various', 0.089), ('microsoft', 0.088), ('sequential', 0.088), ('updates', 0.088), ('visited', 0.085), ('searn', 0.085), ('done', 0.082), ('applying', 0.081), ('finally', 0.081), ('contextual', 0.081), ('implemented', 0.08), ('automatic', 0.08), ('bandit', 0.08), ('aid', 0.078), ('interface', 0.078), ('cluster', 0.078), ('yet', 0.078), ('direct', 0.077), ('involve', 0.077), ('default', 0.077), ('sensitive', 0.075), ('fixed', 0.075), ('prediction', 0.075)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 473 hunch net-2012-09-29-Vowpal Wabbit, version 7.0
Introduction: A new version of VW is out. The primary changes are: Learning Reductions: I’ve wanted to get learning reductions working and we’ve finally done it. Not everything is implemented yet, but VW now supports direct: Multiclass Classification --oaa or --ect. Cost Sensitive Multiclass Classification --csoaa or --wap. Contextual Bandit Classification --cb. Sequential Structured Prediction --searn or --dagger. In addition, it is now easy to build your own custom learning reductions for various plausible uses: feature diddling, custom structured prediction problems, or alternate learning reductions. This effort is far from done, but it is now in a generally useful state. Note that all learning reductions inherit the ability to do cluster parallel learning. Library interface: VW now has a basic library interface. The library provides most of the functionality of VW, with the limitation that it is monolithic and nonreentrant. These will be improved over
2 0.17245895 492 hunch net-2013-12-01-NIPS tutorials and Vowpal Wabbit 7.4
Introduction: At NIPS I’m giving a tutorial on Learning to Interact. In essence this is about dealing with causality in a contextual bandit framework. Relative to previous tutorials, I’ll be covering several new results that changed my understanding of the nature of the problem. Note that Judea Pearl and Elias Bareinboim have a tutorial on causality. This might appear similar, but is quite different in practice. Pearl and Bareinboim’s tutorial will be about the general concepts while mine will be about total mastery of the simplest nontrivial case, including code. Luckily, they have the right order. I recommend going to both. I also just released version 7.4 of Vowpal Wabbit. When I was a frustrated learning theorist, I did not understand why people were not using learning reductions to solve problems. I’ve been slowly discovering why with VW, and addressing the issues. One of the issues is that machine learning itself was not automatic enough, while another is that creatin
3 0.15516295 14 hunch net-2005-02-07-The State of the Reduction
Introduction: What? Reductions are machines which turn solvers for one problem into solvers for another problem. Why? Reductions are useful for several reasons. Laziness. Reducing a problem to classification makes at least 10 learning algorithms available to solve a problem. Inventing 10 learning algorithms is quite a bit of work. Similarly, programming a reduction is often trivial, while programming a learning algorithm is a great deal of work. Crystallization. The problems we often want to solve in learning are worst-case-impossible, but average case feasible. By reducing all problems onto one or a few primitives, we can fine tune these primitives to perform well on real-world problems with greater precision due to the greater number of problems to validate on. Theoretical Organization. By studying what reductions are easy vs. hard vs. impossible, we can learn which problems are roughly equivalent in difficulty and which are much harder. What we know now. Typesafe r
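The "machine that turns one solver into another" is easy to see in code. Below is a minimal one-against-all reduction sketch in Python: it takes any binary learner factory and produces a multiclass learner. The helper names (`train_oaa`, `predict_oaa`) are made up for the example, and scikit-learn's LogisticRegression is used purely as a stand-in binary solver; nothing here is specific to that choice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def train_oaa(X, y, make_binary_learner):
    """Reduce K-class classification to K binary problems (class k vs. rest)."""
    classes = np.unique(y)
    learners = {}
    for k in classes:
        clf = make_binary_learner()
        clf.fit(X, (y == k).astype(int))
        learners[k] = clf
    return classes, learners

def predict_oaa(classes, learners, X):
    """Predict the class whose binary learner is most confident."""
    scores = np.column_stack([learners[k].decision_function(X) for k in classes])
    return classes[np.argmax(scores, axis=1)]

X, y = make_classification(n_samples=300, n_classes=3, n_informative=6, random_state=0)
classes, learners = train_oaa(X, y, lambda: LogisticRegression(max_iter=1000))
preds = predict_oaa(classes, learners, X)
```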
4 0.14762935 103 hunch net-2005-08-18-SVM Adaptability
Introduction: Several recent papers have shown that SVM-like optimizations can be used to handle several large families of loss functions. This is a good thing because it is implausible that the loss function imposed by the world cannot be taken into account in the process of solving a prediction problem. Even people used to the hard-core Bayesian approach to learning often note that some approximations are almost inevitable in specifying a prior and/or integrating to achieve a posterior. Taking into account how the system will be evaluated can allow both computational effort and design effort to be focused so as to improve performance. A current laundry list of capabilities includes: 2002 multiclass SVM including arbitrary cost matrices ICML 2003 Hidden Markov Models NIPS 2003 Markov Networks (see some discussion) EMNLP 2004 Context free grammars ICML 2004 Any loss (with much computation) ICML 2005 Any constrained linear prediction model (that’s my own
5 0.1222821 419 hunch net-2010-12-04-Vowpal Wabbit, version 5.0, and the second heresy
Introduction: I’ve released version 5.0 of the Vowpal Wabbit online learning software. The major number has changed since the last release because I regard all earlier versions as obsolete—there are several new algorithms & features including substantial changes and upgrades to the default learning algorithm. The biggest changes are new algorithms: Nikos and I improved the default algorithm. The basic update rule still uses gradient descent, but the size of the update is carefully controlled so that it’s impossible to overrun the label. In addition, the normalization has changed. Computationally, these changes are virtually free and yield better results, sometimes much better. Less careful updates can be reenabled with --loss_function classic, although results are still not identical to previous due to normalization changes. Nikos also implemented the per-feature learning rates as per these two papers. Often, this works better than the default algorithm. It isn’t the defa
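A minimal sketch of the "can't overrun the label" idea for squared loss, in Python: instead of taking a raw gradient step, the update closes only a fraction of the gap between the current prediction and the label, so the prediction decays toward the label and never steps past it. This is a simplified illustration in the spirit of importance-aware updates, not VW's exact (normalized) rule, and the function name `safe_sgd_update` is made up here.

```python
import numpy as np

def safe_sgd_update(w, x, y, eta=0.5, importance=1.0):
    """One squared-loss update whose step is bounded so the prediction
    cannot cross the label."""
    p = w @ x            # current prediction
    xx = x @ x
    if xx == 0.0:
        return w
    # Fraction of the gap (y - p) closed by this update; always in [0, 1),
    # and approximately eta * importance * xx for small learning rates.
    alpha = 1.0 - np.exp(-eta * importance * xx)
    return w + x * (y - p) * alpha / xx

w = np.zeros(3)
w = safe_sgd_update(w, np.array([1.0, 2.0, 0.5]), y=1.0)
```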
6 0.11714062 428 hunch net-2011-03-27-Vowpal Wabbit, v5.1
7 0.10999987 490 hunch net-2013-11-09-Graduates and Postdocs
8 0.10803989 381 hunch net-2009-12-07-Vowpal Wabbit version 4.0, and a NIPS heresy
9 0.10742257 281 hunch net-2007-12-21-Vowpal Wabbit Code Release
10 0.10477954 236 hunch net-2007-03-15-Alternative Machine Learning Reductions Definitions
11 0.10430884 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions
12 0.10314463 351 hunch net-2009-05-02-Wielding a New Abstraction
13 0.102119 451 hunch net-2011-12-13-Vowpal Wabbit version 6.1 & the NIPS tutorial
14 0.096120328 240 hunch net-2007-04-21-Videolectures.net
15 0.092151612 49 hunch net-2005-03-30-What can Type Theory teach us about Machine Learning?
16 0.090709478 210 hunch net-2006-09-28-Programming Languages for Machine Learning Implementations
17 0.087645724 441 hunch net-2011-08-15-Vowpal Wabbit 6.0
18 0.087614357 262 hunch net-2007-09-16-Optimizing Machine Learning Programs
19 0.086958736 27 hunch net-2005-02-23-Problem: Reinforcement Learning with Classification
20 0.086019672 177 hunch net-2006-05-05-An ICML reject
topicId topicWeight
[(0, 0.158), (1, 0.057), (2, -0.045), (3, -0.048), (4, 0.015), (5, 0.022), (6, -0.002), (7, -0.072), (8, -0.196), (9, 0.065), (10, -0.098), (11, -0.107), (12, -0.005), (13, 0.059), (14, -0.019), (15, -0.101), (16, 0.01), (17, 0.031), (18, 0.062), (19, -0.08), (20, 0.045), (21, -0.028), (22, -0.009), (23, -0.094), (24, -0.086), (25, 0.049), (26, -0.064), (27, 0.055), (28, -0.032), (29, 0.093), (30, 0.109), (31, 0.009), (32, 0.058), (33, -0.077), (34, -0.01), (35, 0.097), (36, -0.037), (37, 0.031), (38, -0.026), (39, 0.051), (40, -0.025), (41, 0.037), (42, -0.116), (43, -0.01), (44, 0.039), (45, 0.036), (46, 0.05), (47, -0.033), (48, -0.056), (49, 0.026)]
simIndex simValue blogId blogTitle
same-blog 1 0.96637702 473 hunch net-2012-09-29-Vowpal Wabbit, version 7.0
Introduction: A new version of VW is out. The primary changes are: Learning Reductions: I’ve wanted to get learning reductions working and we’ve finally done it. Not everything is implemented yet, but VW now supports direct: Multiclass Classification --oaa or --ect. Cost Sensitive Multiclass Classification --csoaa or --wap. Contextual Bandit Classification --cb. Sequential Structured Prediction --searn or --dagger. In addition, it is now easy to build your own custom learning reductions for various plausible uses: feature diddling, custom structured prediction problems, or alternate learning reductions. This effort is far from done, but it is now in a generally useful state. Note that all learning reductions inherit the ability to do cluster parallel learning. Library interface: VW now has a basic library interface. The library provides most of the functionality of VW, with the limitation that it is monolithic and nonreentrant. These will be improved over
2 0.76997977 492 hunch net-2013-12-01-NIPS tutorials and Vowpal Wabbit 7.4
Introduction: At NIPS I’m giving a tutorial on Learning to Interact. In essence this is about dealing with causality in a contextual bandit framework. Relative to previous tutorials, I’ll be covering several new results that changed my understanding of the nature of the problem. Note that Judea Pearl and Elias Bareinboim have a tutorial on causality. This might appear similar, but is quite different in practice. Pearl and Bareinboim’s tutorial will be about the general concepts while mine will be about total mastery of the simplest nontrivial case, including code. Luckily, they have the right order. I recommend going to both. I also just released version 7.4 of Vowpal Wabbit. When I was a frustrated learning theorist, I did not understand why people were not using learning reductions to solve problems. I’ve been slowly discovering why with VW, and addressing the issues. One of the issues is that machine learning itself was not automatic enough, while another is that creatin
3 0.64629918 381 hunch net-2009-12-07-Vowpal Wabbit version 4.0, and a NIPS heresy
Introduction: I’m releasing version 4.0 (tarball) of Vowpal Wabbit. The biggest change (by far) in this release is experimental support for cluster parallelism, with notable help from Daniel Hsu. I also took advantage of the major version number to introduce some incompatible changes, including switching to murmurhash 2, and other alterations to cachefiles. You’ll need to delete and regenerate them. In addition, the precise specification for a “tag” (i.e. string that can be used to identify an example) changed—you can’t have a space between the tag and the ‘|’ at the beginning of the feature namespace. And, of course, we made it faster. For the future, I put up my todo list outlining the major future improvements I want to see in the code. I’m planning to discuss the current mechanism and results of the cluster parallel implementation at the large scale machine learning workshop at NIPS later this week. Several people have asked me to do a tutorial/walkthrough of VW, wh
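The tag change is easy to get wrong, so here is a tiny formatting helper, a sketch assuming the standard VW text format (`label tag|namespace feature[:value] ...`): the tag must be glued to the ‘|’ that opens the feature namespace, with no space in between. The helper name `vw_line` is made up for the example.

```python
def vw_line(label, tag, namespace, features):
    """Format one VW example; note the tag adjoins the '|' ("tag|ns")."""
    feats = " ".join(features)
    return f"{label} {tag}|{namespace} {feats}"

print(vw_line(1, "example_001", "f", ["height:1.5", "width:2.0"]))
# -> "1 example_001|f height:1.5 width:2.0"
```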
4 0.59710163 441 hunch net-2011-08-15-Vowpal Wabbit 6.0
Introduction: I just released Vowpal Wabbit 6.0. Since the last version: VW is now 2-3 orders of magnitude faster at linear learning, primarily thanks to Alekh. Given the baseline, this is loads of fun, allowing us to easily deal with terafeature datasets, and dwarfing the scale of any other open source projects. The core improvement here comes from effective parallelization over kilonode clusters (either Hadoop or not). This code is highly scalable, so it even helps with clusters of size 2 (and doesn’t hurt for clusters of size 1). The core allreduce technique appears widely and easily reused—we’ve already used it to parallelize Conjugate Gradient, LBFGS, and two variants of online learning. We’ll be documenting how to do this more thoroughly, but for now “README_cluster” and associated scripts should provide a good starting point. The new LBFGS code from Miro seems to commonly dominate the existing conjugate gradient code in time/quality tradeoffs. The new matrix factoriz
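A toy sketch of what allreduce computes, in Python with numpy: every node contributes a vector (say, a local gradient) and every node receives the elementwise sum. A real deployment reduces up a spanning tree over the cluster and broadcasts back down; the centralised sum below only illustrates the semantics, not VW's implementation, and the function name is made up.

```python
import numpy as np

def allreduce(vectors):
    """Simulate allreduce: every node ends up with the elementwise sum of
    all nodes' vectors (a real cluster would reduce over a spanning tree)."""
    total = np.sum(vectors, axis=0)
    return [total.copy() for _ in vectors]

# Each "node" averages its local gradient with everyone else's.
local_grads = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
summed = allreduce(local_grads)
averaged = [g / len(local_grads) for g in summed]
```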
5 0.59356302 103 hunch net-2005-08-18-SVM Adaptability
Introduction: Several recent papers have shown that SVM-like optimizations can be used to handle several large families of loss functions. This is a good thing because it is implausible that the loss function imposed by the world cannot be taken into account in the process of solving a prediction problem. Even people used to the hard-core Bayesian approach to learning often note that some approximations are almost inevitable in specifying a prior and/or integrating to achieve a posterior. Taking into account how the system will be evaluated can allow both computational effort and design effort to be focused so as to improve performance. A current laundry list of capabilities includes: 2002 multiclass SVM including arbitrary cost matrices ICML 2003 Hidden Markov Models NIPS 2003 Markov Networks (see some discussion) EMNLP 2004 Context free grammars ICML 2004 Any loss (with much computation) ICML 2005 Any constrained linear prediction model (that’s my own
6 0.56027114 436 hunch net-2011-06-22-Ultra LDA
7 0.54805005 419 hunch net-2010-12-04-Vowpal Wabbit, version 5.0, and the second heresy
8 0.54517984 451 hunch net-2011-12-13-Vowpal Wabbit version 6.1 & the NIPS tutorial
9 0.51328439 365 hunch net-2009-07-31-Vowpal Wabbit Open Source Project
10 0.46776482 428 hunch net-2011-03-27-Vowpal Wabbit, v5.1
11 0.45707637 490 hunch net-2013-11-09-Graduates and Postdocs
12 0.45400199 14 hunch net-2005-02-07-The State of the Reduction
13 0.44385493 281 hunch net-2007-12-21-Vowpal Wabbit Code Release
14 0.41927788 327 hunch net-2008-11-16-Observations on Linearity for Reductions to Regression
15 0.41622394 351 hunch net-2009-05-02-Wielding a New Abstraction
16 0.41167602 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions
17 0.41099304 49 hunch net-2005-03-30-What can Type Theory teach us about Machine Learning?
18 0.40666121 354 hunch net-2009-05-17-Server Update
19 0.39184773 450 hunch net-2011-12-02-Hadoop AllReduce and Terascale Learning
20 0.38831493 294 hunch net-2008-04-12-Blog compromised
topicId topicWeight
[(0, 0.447), (10, 0.034), (27, 0.172), (53, 0.013), (55, 0.094), (94, 0.092), (95, 0.041)]
simIndex simValue blogId blogTitle
1 0.87717253 87 hunch net-2005-06-29-Not EM for clustering at COLT
Introduction: One standard approach for clustering data with a set of gaussians is using EM. Roughly speaking, you pick a set of k random gaussians and then use alternating expectation maximization to (hopefully) find a set of gaussians that “explain” the data well. This process is difficult to work with because EM can become “stuck” in local optima. There are various hacks like “rerun with t different random starting points”. One cool observation is that this can often be solved via other algorithms which do not suffer from local optima. This is an early paper which shows this. Ravi Kannan presented a new paper showing this is possible in a much more adaptive setting. A very rough summary of these papers is that by projecting into a lower dimensional space, it is computationally tractable to pick out the gross structure of the data. It is unclear how well these algorithms work in practice, but they might be effective, especially if used as a subroutine of the form: Projec
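A short sketch contrasting the two strategies mentioned above, assuming scikit-learn is available: the "rerun from several random starting points" hack (via n_init) and the projection idea of reducing the dimension before fitting. This is only illustrative; the algorithms in the cited papers come with guarantees that this sketch does not.

```python
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=500, centers=4, n_features=50, random_state=0)

# The standard hack: rerun EM from several random starting points and keep
# the best local optimum (n_init does exactly this).
gmm = GaussianMixture(n_components=4, n_init=10, random_state=0).fit(X)

# The alternative discussed in the post: project to a low-dimensional space
# first, where the gross cluster structure is easier to pick out.
X_low = PCA(n_components=4).fit_transform(X)
gmm_low = GaussianMixture(n_components=4, random_state=0).fit(X_low)
```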
same-blog 2 0.86869478 473 hunch net-2012-09-29-Vowpal Wabbit, version 7.0
Introduction: A new version of VW is out. The primary changes are: Learning Reductions: I’ve wanted to get learning reductions working and we’ve finally done it. Not everything is implemented yet, but VW now supports direct: Multiclass Classification --oaa or --ect. Cost Sensitive Multiclass Classification --csoaa or --wap. Contextual Bandit Classification --cb. Sequential Structured Prediction --searn or --dagger. In addition, it is now easy to build your own custom learning reductions for various plausible uses: feature diddling, custom structured prediction problems, or alternate learning reductions. This effort is far from done, but it is now in a generally useful state. Note that all learning reductions inherit the ability to do cluster parallel learning. Library interface: VW now has a basic library interface. The library provides most of the functionality of VW, with the limitation that it is monolithic and nonreentrant. These will be improved over
3 0.82355517 62 hunch net-2005-04-26-To calibrate or not?
Introduction: A calibrated predictor is one which predicts the probability of a binary event with the property: For all predictions p, the proportion of the time that 1 is observed is p. Since there are infinitely many p, this definition must be “softened” to make sense for any finite number of samples. The standard method for “softening” is to consider all predictions in a small neighborhood about each possible p. A great deal of effort has been devoted to strategies for achieving calibrated (such as here) prediction. With statements like: (under minimal conditions) you can always make calibrated predictions. Given the strength of these statements, we might conclude we are done, but that would be a “confusion of ends”. A confusion of ends arises in the following way: We want good probabilistic predictions. Good probabilistic predictions are calibrated. Therefore, we want calibrated predictions. The “Therefore” step misses the fact that calibration is a necessary b
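A minimal sketch of the softened definition in Python: bin the predictions (the "small neighborhood about each possible p") and compare the mean predicted probability in each bin to the observed frequency of 1s. The helper name `calibration_table` is made up for the example.

```python
import numpy as np

def calibration_table(probs, outcomes, n_bins=10):
    """Empirical calibration check: within each probability bin, compare the
    mean prediction to the observed rate of 1s."""
    probs, outcomes = np.asarray(probs, float), np.asarray(outcomes, float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (probs >= lo) & (probs <= hi) if hi == 1.0 else (probs >= lo) & (probs < hi)
        if mask.any():
            rows.append((probs[mask].mean(), outcomes[mask].mean(), int(mask.sum())))
    return rows  # (mean predicted p, empirical frequency of 1, count) per bin

table = calibration_table([0.1, 0.15, 0.8, 0.85], [0, 0, 1, 1])
```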
4 0.80019844 193 hunch net-2006-07-09-The Stock Prediction Machine Learning Problem
Introduction: …is discussed in this nytimes article. I generally expect such approaches to become more common since computers are getting faster, machine learning is getting better, and data is becoming more plentiful. This is another example where machine learning technology may have a huge economic impact. Some side notes: We-in-research know almost nothing about how these things are done (because it is typically a corporate secret). … but the limited discussion in the article seems naive from a machine learning viewpoint. The learning process used apparently often fails to take into account transaction costs. What little of the approaches is discussed appears modeling based. It seems plausible that more direct prediction methods can yield an edge. One difficulty with stock picking as a research topic is that it is inherently a zero sum game (for every winner, there is a loser). Much of the rest of research is positive sum (basically, everyone wins).
5 0.79744762 215 hunch net-2006-10-22-Exemplar programming
Introduction: There are many different abstractions for problem definition and solution. Here are a few examples: Functional programming: a set of functions are defined. The composed execution of these functions yields the solution. Linear programming: a set of constraints and a linear objective function are defined. An LP solver finds the constrained optimum. Quadratic programming: Like linear programming, but the language is a little more flexible (and the solution slower). Convex programming: like quadratic programming, but the language is more flexible (and the solutions even slower). Dynamic programming: a recursive definition of the problem is defined and then solved efficiently via caching tricks. SAT programming: A problem is specified as a satisfiability problem involving a conjunction of disjunctions of boolean variables. A general engine attempts to find a good satisfying assignment. For example Kautz’s blackbox planner. These abstractions have different tradeoffs betw
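As a tiny illustration of the dynamic programming entry above (a recursive definition made efficient by caching), here is a memoized lattice-path count in Python; the particular problem is arbitrary, chosen only to show the caching trick.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def num_paths(rows, cols):
    """Count monotone lattice paths from (0, 0) to (rows, cols); the cache
    turns an exponential recursion into a linear-time computation."""
    if rows == 0 or cols == 0:
        return 1
    return num_paths(rows - 1, cols) + num_paths(rows, cols - 1)

print(num_paths(10, 10))  # 184756
```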
6 0.72471869 74 hunch net-2005-05-21-What is the right form of modularity in structured prediction?
7 0.57507223 133 hunch net-2005-11-28-A question of quantification
8 0.49175096 492 hunch net-2013-12-01-NIPS tutorials and Vowpal Wabbit 7.4
9 0.48201293 5 hunch net-2005-01-26-Watchword: Probability
10 0.45616844 220 hunch net-2006-11-27-Continuizing Solutions
11 0.45323607 79 hunch net-2005-06-08-Question: “When is the right time to insert the loss function?”
12 0.45277822 450 hunch net-2011-12-02-Hadoop AllReduce and Terascale Learning
13 0.44923747 378 hunch net-2009-11-15-The Other Online Learning
14 0.44891101 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms
15 0.44785875 177 hunch net-2006-05-05-An ICML reject
16 0.44753119 262 hunch net-2007-09-16-Optimizing Machine Learning Programs
17 0.44695705 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models
18 0.44437376 109 hunch net-2005-09-08-Online Learning as the Mathematics of Accountability
19 0.44310418 360 hunch net-2009-06-15-In Active Learning, the question changes
20 0.44244239 43 hunch net-2005-03-18-Binomial Weighting