hunch_net hunch_net-2007 hunch_net-2007-229 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Parallel machine learning is a subject rarely addressed at machine learning conferences. Nevertheless, it seems likely to increase in importance because: Data set sizes appear to be growing substantially faster than computation. Essentially, this happens because more and more sensors of various sorts are being hooked up to the internet. Serial speedups of processors seem relatively stalled. The new trend is to make processors more powerful by making them multicore. Both AMD and Intel are making dual core designs standard, with plans for more parallelism in the future. IBM’s Cell processor has (essentially) 9 cores. Modern graphics chips can have an order of magnitude more separate execution units. The meaning of ‘core’ varies a bit from processor to processor, but the overall trend seems quite clear. So, how do we parallelize machine learning algorithms? The simplest and most common technique is to simply run the same learning algorithm with different parameters on different processors.
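To make this first technique concrete, here is a minimal sketch, in Python, of running the same learning algorithm with different parameters on different processors. The toy_train routine and its synthetic data are invented stand-ins for illustration (the post itself contains no code); the point is only the shape of the parallelism: independent runs, one per core.

```python
# Technique 1: run the same learner with different hyperparameters on
# different processors.  No individual run gets faster; the parameter
# search as a whole does.
from multiprocessing import Pool
import random

def toy_train(learning_rate, steps=2000, seed=0):
    """Hypothetical stand-in for a real training routine: fit y = w*x by
    gradient descent on synthetic data, return (parameter, validation error)."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        x = rng.uniform(-1, 1)
        y = 3.0 * x + rng.gauss(0, 0.1)               # synthetic target: y = 3x + noise
        w -= learning_rate * 2.0 * (w * x - y) * x    # gradient of squared error
    validation = [rng.uniform(-1, 1) for _ in range(1000)]
    error = sum((w * x - 3.0 * x) ** 2 for x in validation) / len(validation)
    return learning_rate, error

if __name__ == "__main__":
    rates = [0.001, 0.01, 0.1, 0.5]
    with Pool(processes=len(rates)) as pool:
        results = pool.map(toy_train, rates)          # one independent run per core
    best_rate, best_error = min(results, key=lambda r: r[1])
    print("best learning rate:", best_rate, "validation error:", best_error)
```

Because the runs never communicate, this scales trivially with the number of cores, which is why it is the most common technique and also why it does nothing for the latency of a single run.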
sentIndex sentText sentNum sentScore
1 Serial speedups of processors seem relatively stalled. [sent-4, score-0.247]
2 The new trend is to make processors more powerful by making them multicore . [sent-5, score-0.532]
3 Both AMD and Intel are making dual core designs standard, with plans for more parallelism in the future. [sent-6, score-0.535]
4 IBM’s Cell processor has (essentially) 9 cores. [sent-7, score-0.257]
5 Modern graphics chips can have an order of magnitude more separate execution units. [sent-8, score-0.275]
6 The meaning of ‘core’ varies a bit from processor to processor, but the overall trend seems quite clear. [sent-9, score-0.414]
7 The simplest and most common technique is to simply run the same learning algorithm with different parameters on different processors. [sent-11, score-0.447]
8 This approach doesn’t speed up any individual run of a learning algorithm. [sent-13, score-0.325]
9 The next simplest technique is to decompose a learning algorithm into an adaptive sequence of statistical queries and parallelize the queries over the sample. [sent-14, score-1.138]
10 This paper (updated from the term paper according to a comment) points out that statistical integration can be implemented via MapReduce which Google popularized (the open source version is Hadoop ). [sent-15, score-0.465]
11 The general technique of parallel statistical integration is already used by many people including IBM’s Parallel Machine Learning Toolbox . [sent-16, score-0.76]
12 The disadvantage of this approach is that it is only particularly good at speeding up slow algorithms. [sent-17, score-0.36]
13 One example of a fast sequential algorithm is the perceptron. [sent-18, score-0.23]
14 The perceptron works on a per-example basis, making individual updates extremely fast. [sent-19, score-0.479]
15 It is explicitly not a statistical query algorithm. [sent-20, score-0.228]
16 The most difficult method for speeding up an algorithm is fine-grained structural parallelism. [sent-21, score-0.446]
17 The brain works in this way: each individual neuron operates on its own. [sent-22, score-0.415]
18 The reason why this is difficult is that the programming is particularly tricky—you must carefully optimize to avoid latency in network communication. [sent-23, score-0.184]
19 The full complexity of parallel programming is exposed. [sent-24, score-0.536]
20 A basic question is: are there other approaches to speeding up learning algorithms which don’t incur the full complexity of option (3)? [sent-25, score-0.562]
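To make option (2) concrete, here is a minimal sketch of parallel statistical integration as described in sentences 9 through 11: the learner issues an adaptive sequence of statistical queries (here, batch gradients, each a sum of per-example statistics), and each query is answered by mapping over shards of the sample and reducing the partial sums. This illustrates only the pattern behind MapReduce-style implementations; it is not the API of Hadoop or of IBM's Parallel Machine Learning Toolbox, and the data and function names are invented.

```python
# Technique 2: parallelize each statistical query over the sample.
# Map step: each worker sums per-example gradients over its shard.
# Reduce step: the partial sums are added and the parameter is updated.
from multiprocessing import Pool
import random

def make_data(n=100_000, seed=1):
    """Synthetic (invented) data: y = 3x + noise."""
    rng = random.Random(seed)
    return [(x, 3.0 * x + rng.gauss(0, 0.1))
            for x in (rng.uniform(-1, 1) for _ in range(n))]

def partial_gradient(args):
    """Map step: sum of per-example squared-error gradients over one shard."""
    w, shard = args
    return sum(2.0 * (w * x - y) * x for x, y in shard)

def split_into_shards(data, k):
    size = (len(data) + k - 1) // k
    return [data[i:i + size] for i in range(0, len(data), size)]

if __name__ == "__main__":
    data = make_data()
    shards = split_into_shards(data, 4)
    w, lr = 0.0, 0.1
    with Pool(processes=4) as pool:
        for _ in range(50):                           # adaptive sequence of queries
            partials = pool.map(partial_gradient, [(w, s) for s in shards])
            w -= lr * sum(partials) / len(data)       # reduce, then update
    print("learned weight:", w)                       # converges toward 3.0
```

Note that this sketch re-ships every shard to the workers on every query; a real system keeps the data resident on the workers, which already hints at the data-management and communication-latency issues that option (3) faces in full.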
wordName wordTfidf (topN-words)
[('speeding', 0.272), ('processor', 0.257), ('parallel', 0.236), ('statistical', 0.228), ('processors', 0.171), ('technique', 0.157), ('trend', 0.157), ('parallelize', 0.157), ('individual', 0.155), ('queries', 0.151), ('perceptron', 0.142), ('integration', 0.139), ('ibm', 0.129), ('simplest', 0.12), ('full', 0.116), ('programming', 0.108), ('making', 0.106), ('incur', 0.098), ('serial', 0.098), ('chips', 0.098), ('openmosix', 0.098), ('designs', 0.098), ('multicore', 0.098), ('neuron', 0.098), ('popularized', 0.098), ('sizes', 0.098), ('toolbox', 0.098), ('core', 0.093), ('amd', 0.091), ('sensors', 0.091), ('cell', 0.091), ('graphics', 0.091), ('approach', 0.088), ('algorithm', 0.088), ('intel', 0.086), ('operates', 0.086), ('decompose', 0.086), ('management', 0.086), ('execution', 0.086), ('structural', 0.086), ('run', 0.082), ('dual', 0.082), ('mapreduce', 0.082), ('plans', 0.078), ('parallelism', 0.078), ('works', 0.076), ('speedups', 0.076), ('latency', 0.076), ('complexity', 0.076), ('essentially', 0.074)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 229 hunch net-2007-01-26-Parallel Machine Learning Problems
2 0.19470757 136 hunch net-2005-12-07-Is the Google way the way for machine learning?
Introduction: Urs Hoelzle from Google gave an invited presentation at NIPS. In the presentation, he strongly advocates interacting with data in a particular scalable manner which is something like the following: Make a cluster of machines. Build a unified filesystem. (Google uses GFS, but NFS or other approaches work reasonably well for smaller clusters.) Interact with data via MapReduce. Creating a cluster of machines is, by this point, relatively straightforward. Unified filesystems are a little bit tricky—GFS is capable by design of essentially unlimited speed throughput to disk. NFS can bottleneck because all of the data has to move through one machine. Nevertheless, this may not be a limiting factor for smaller clusters. MapReduce is a programming paradigm. Essentially, it is a combination of a data element transform (map) and an aggregator/selector (reduce). These operations are highly parallelizable and the claim is that they support the forms of data interacti
3 0.18902324 346 hunch net-2009-03-18-Parallel ML primitives
Introduction: Previously, we discussed parallel machine learning a bit. As parallel ML is rather difficult, I’d like to describe my thinking at the moment, and ask for advice from the rest of the world. This is particularly relevant right now, as I’m attending a workshop tomorrow on parallel ML. Parallelizing slow algorithms seems uncompelling. Parallelizing many algorithms also seems uncompelling, because the effort required to parallelize is substantial. This leaves the question: Which one fast algorithm is the best to parallelize? What is a substantially different second? One compellingly fast simple algorithm is online gradient descent on a linear representation. This is the core of Leon’s sgd code and Vowpal Wabbit . Antoine Bordes showed a variant was competitive in the large scale learning challenge . It’s also a decades old primitive which has been reused in many algorithms, and continues to be reused. It also applies to online learning rather than just online optimiz
4 0.17916211 128 hunch net-2005-11-05-The design of a computing cluster
Introduction: This is about the design of a computing cluster from the viewpoint of applied machine learning using current technology. We just built a small one at TTI so this is some evidence of what is feasible and thoughts about the design choices. Architecture There are several architectural choices. AMD Athlon64 based system. This seems to have the cheapest bang/buck. Maximum RAM is typically 2-3GB. AMD Opteron based system. Opterons provide the additional capability to buy an SMP motherboard with two chips, and the motherboards often support 16GB of RAM. The RAM is also the more expensive error correcting type. Intel PIV or Xeon based system. The PIV and Xeon based systems are the intel analog of the above 2. Due to architectural design reasons, these chips tend to run a bit hotter and be a bit more expensive. Dual core chips. Both Intel and AMD have chips that actually have 2 processors embedded in them. In the end, we decided to go with option (2). Roughly speaking,
5 0.1595488 450 hunch net-2011-12-02-Hadoop AllReduce and Terascale Learning
Introduction: Suppose you have a dataset with 2 terafeatures (we only count nonzero entries in a datamatrix), and want to learn a good linear predictor in a reasonable amount of time. How do you do it? As a learning theorist, the first thing you do is pray that this is too much data for the number of parameters—but that’s not the case, there are around 16 billion examples, 16 million parameters, and people really care about a high quality predictor, so subsampling is not a good strategy. Alekh visited us last summer, and we had a breakthrough (see here for details), coming up with the first learning algorithm I’ve seen that is provably faster than any future single-machine learning algorithm. The proof of this is simple: We can output an optimal-up-to-precision linear predictor faster than the data can be streamed through the network interface of any single machine involved in the computation. It is necessary but not sufficient to have an effective communication infrastructure. It is ne
6 0.14485142 404 hunch net-2010-08-20-The Workshop on Cores, Clusters, and Clouds
7 0.14409606 120 hunch net-2005-10-10-Predictive Search is Coming
8 0.14185317 451 hunch net-2011-12-13-Vowpal Wabbit version 6.1 & the NIPS tutorial
9 0.12437398 54 hunch net-2005-04-08-Fast SVMs
10 0.10899146 442 hunch net-2011-08-20-The Large Scale Learning Survey Tutorial
11 0.10589326 300 hunch net-2008-04-30-Concerns about the Large Scale Learning Challenge
12 0.10222254 419 hunch net-2010-12-04-Vowpal Wabbit, version 5.0, and the second heresy
13 0.10009576 287 hunch net-2008-01-28-Sufficient Computation
14 0.098633029 281 hunch net-2007-12-21-Vowpal Wabbit Code Release
15 0.098170049 347 hunch net-2009-03-26-Machine Learning is too easy
16 0.096523419 349 hunch net-2009-04-21-Interesting Presentations at Snowbird
17 0.094350427 215 hunch net-2006-10-22-Exemplar programming
18 0.090221554 366 hunch net-2009-08-03-Carbon in Computer Science Research
19 0.089754328 210 hunch net-2006-09-28-Programming Languages for Machine Learning Implementations
20 0.08508037 370 hunch net-2009-09-18-Necessary and Sufficient Research
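Entry 2 above describes MapReduce as a data-element transform (map) plus an aggregator/selector (reduce), which is exactly the shape of the partial-gradient sketch earlier on this page. Entry 3 names the one compellingly fast primitive: online gradient descent on a linear representation. Below is a minimal sketch of that primitive in plain Python; the feature names and data are invented, and this is not code from Leon Bottou's sgd or from Vowpal Wabbit. Like the perceptron in the main post, it makes one cheap update per example, which is why parallelizing individual statistical queries buys it little.

```python
# Online (stochastic) gradient descent with squared loss on a sparse
# linear representation: w <- w - eta * (w.x - y) * x, one example at a time.
from collections import defaultdict
import random

def sgd(examples, learning_rate=0.1, passes=1):
    w = defaultdict(float)                            # sparse weight vector
    for _ in range(passes):
        for features, y in examples:
            prediction = sum(w[name] * value for name, value in features.items())
            scale = learning_rate * (prediction - y)  # gradient scale for squared loss
            for name, value in features.items():
                w[name] -= scale * value
    return w

if __name__ == "__main__":
    rng = random.Random(2)
    truth = {"a": 1.0, "b": -2.0, "c": 0.5}           # invented "true" weights
    examples = []
    for _ in range(20_000):
        feats = {name: rng.uniform(-1, 1) for name in truth}
        label = sum(truth[n] * v for n, v in feats.items()) + rng.gauss(0, 0.05)
        examples.append((feats, label))
    weights = sgd(examples)
    print({name: round(weights[name], 2) for name in truth})   # close to `truth`
```

Each update touches only the features of one example, so the sequential loop is already fast; making it faster is exactly the kind of problem the main post's closing question is about.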
topicId topicWeight
[(0, 0.199), (1, 0.049), (2, -0.08), (3, 0.022), (4, 0.097), (5, 0.033), (6, -0.115), (7, -0.012), (8, -0.063), (9, 0.092), (10, -0.12), (11, -0.046), (12, 0.029), (13, -0.017), (14, -0.001), (15, -0.114), (16, 0.023), (17, 0.004), (18, -0.059), (19, -0.111), (20, 0.04), (21, 0.053), (22, -0.066), (23, 0.098), (24, -0.047), (25, 0.055), (26, 0.078), (27, 0.017), (28, 0.067), (29, -0.023), (30, 0.01), (31, 0.004), (32, 0.042), (33, 0.096), (34, 0.08), (35, -0.081), (36, 0.085), (37, 0.086), (38, -0.013), (39, -0.05), (40, 0.016), (41, -0.172), (42, 0.083), (43, 0.16), (44, -0.063), (45, -0.055), (46, 0.008), (47, 0.065), (48, 0.046), (49, 0.1)]
simIndex simValue blogId blogTitle
same-blog 1 0.93606424 229 hunch net-2007-01-26-Parallel Machine Learning Problems
2 0.73447269 128 hunch net-2005-11-05-The design of a computing cluster
3 0.70349318 136 hunch net-2005-12-07-Is the Google way the way for machine learning?
4 0.6862269 346 hunch net-2009-03-18-Parallel ML primitives
5 0.67581463 450 hunch net-2011-12-02-Hadoop AllReduce and Terascale Learning
6 0.63432586 54 hunch net-2005-04-08-Fast SVMs
7 0.59581435 451 hunch net-2011-12-13-Vowpal Wabbit version 6.1 & the NIPS tutorial
8 0.57780665 300 hunch net-2008-04-30-Concerns about the Large Scale Learning Challenge
9 0.56885791 442 hunch net-2011-08-20-The Large Scale Learning Survey Tutorial
10 0.56778407 366 hunch net-2009-08-03-Carbon in Computer Science Research
11 0.54140979 262 hunch net-2007-09-16-Optimizing Machine Learning Programs
12 0.52859694 404 hunch net-2010-08-20-The Workshop on Cores, Clusters, and Clouds
13 0.52628624 210 hunch net-2006-09-28-Programming Languages for Machine Learning Implementations
14 0.50571102 215 hunch net-2006-10-22-Exemplar programming
15 0.49753347 281 hunch net-2007-12-21-Vowpal Wabbit Code Release
16 0.467437 381 hunch net-2009-12-07-Vowpal Wabbit version 4.0, and a NIPS heresy
17 0.45601401 250 hunch net-2007-06-23-Machine Learning Jobs are Growing on Trees
18 0.44266951 53 hunch net-2005-04-06-Structured Regret Minimization
19 0.42849657 287 hunch net-2008-01-28-Sufficient Computation
20 0.4281579 349 hunch net-2009-04-21-Interesting Presentations at Snowbird
topicId topicWeight
[(0, 0.012), (10, 0.019), (27, 0.19), (38, 0.034), (48, 0.017), (53, 0.047), (55, 0.074), (56, 0.014), (71, 0.033), (76, 0.278), (94, 0.204)]
simIndex simValue blogId blogTitle
same-blog 1 0.8300482 229 hunch net-2007-01-26-Parallel Machine Learning Problems
2 0.71838772 276 hunch net-2007-12-10-Learning Track of International Planning Competition
Introduction: The International Planning Competition (IPC) is a biennial event organized in the context of the International Conference on Automated Planning and Scheduling (ICAPS). This year, for the first time, there will be a learning track of the competition. For more information you can go to the competition web-site. The competitions are typically organized around a number of planning domains that can vary from year to year, where a planning domain is simply a class of problems that share a common action schema—e.g. Blocksworld is a well-known planning domain that contains a problem instance for each possible initial tower configuration and goal configuration. Some other domains have included Logistics, Airport, Freecell, PipesWorld, and many others. For each domain the competition includes a number of problems (say 40-50) and the planners are run on each problem with a time limit for each problem (around 30 minutes). The problems are hard enough that many problems are not solved within th
3 0.71582872 221 hunch net-2006-12-04-Structural Problems in NIPS Decision Making
Introduction: This is a very difficult post to write, because it is about a perennially touchy subject. Nevertheless, it is an important one which needs to be thought about carefully. There are a few things which should be understood: The system is changing and responsive. We-the-authors are we-the-reviewers, we-the-PC, and even we-the-NIPS-board. NIPS has implemented ‘secondary program chairs’, ‘author response’, and ‘double blind reviewing’ in the last few years to help with the decision process, and more changes may happen in the future. Agreement creates a perception of correctness. When any PC meets and makes a group decision about a paper, there is a strong tendency for the reinforcement inherent in a group decision to create the perception of correctness. For the many people who have been on the NIPS PC it’s reasonable to entertain a healthy skepticism in the face of this reinforcing certainty. This post is about structural problems. What problems arise because of the structure
4 0.71389329 120 hunch net-2005-10-10-Predictive Search is Coming
Introduction: “Search” is the other branch of AI research which has been successful. Concrete examples include Deep Blue which beat the world chess champion and Chinook the champion checkers program. A set of core search techniques exists including A*, alpha-beta pruning, and others that can be applied to any of many different search problems. Given this, it may be surprising to learn that there has been relatively little successful work on combining prediction and search. Given also that humans typically solve search problems using a number of predictive heuristics to narrow in on a solution, we might be surprised again. However, the big successful search-based systems have typically not used “smart” search algorithms. Instead they have optimized for very fast search. This is not for lack of trying… many people have tried to synthesize search and prediction to various degrees of success. For example, Knightcap achieves good-but-not-stellar chess playing performance, and TD-gammon
5 0.69589424 115 hunch net-2005-09-26-Prediction Bounds as the Mathematics of Science
Introduction: “Science” has many meanings, but one common meaning is “the scientific method ” which is a principled method for investigating the world using the following steps: Form a hypothesis about the world. Use the hypothesis to make predictions. Run experiments to confirm or disprove the predictions. The ordering of these steps is very important to the scientific method. In particular, predictions must be made before experiments are run. Given that we all believe in the scientific method of investigation, it may be surprising to learn that cheating is very common. This happens for many reasons, some innocent and some not. Drug studies. Pharmaceutical companies make predictions about the effects of their drugs and then conduct blind clinical studies to determine their effect. Unfortunately, they have also been caught using some of the more advanced techniques for cheating here : including “reprobleming”, “data set selection”, and probably “overfitting by review”
6 0.6877225 286 hunch net-2008-01-25-Turing’s Club for Machine Learning
7 0.68603814 346 hunch net-2009-03-18-Parallel ML primitives
8 0.68477035 450 hunch net-2011-12-02-Hadoop AllReduce and Terascale Learning
9 0.68388158 81 hunch net-2005-06-13-Wikis for Summer Schools and Workshops
10 0.68173134 136 hunch net-2005-12-07-Is the Google way the way for machine learning?
11 0.67748463 35 hunch net-2005-03-04-The Big O and Constants in Learning
12 0.6674993 253 hunch net-2007-07-06-Idempotent-capable Predictors
13 0.66155076 95 hunch net-2005-07-14-What Learning Theory might do
14 0.65871298 147 hunch net-2006-01-08-Debugging Your Brain
15 0.65870136 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models
16 0.65799737 42 hunch net-2005-03-17-Going all the Way, Sometimes
17 0.65784687 237 hunch net-2007-04-02-Contextual Scaling
18 0.65600699 419 hunch net-2010-12-04-Vowpal Wabbit, version 5.0, and the second heresy
19 0.65569854 306 hunch net-2008-07-02-Proprietary Data in Academic Research?
20 0.65333611 43 hunch net-2005-03-18-Binomial Weighting