hunch_net hunch_net-2006 hunch_net-2006-210 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Machine learning algorithms have a much better chance of being widely adopted if they are implemented in some easy-to-use code. There are several important concerns associated with machine learning which stress programming languages on the ease-of-use vs. speed frontier. Speed: The rate at which data sources are growing seems to be outstripping the rate at which computational power is growing, so it is important that we be able to eke out every bit of computational power. Garbage collected languages (java, ocaml, perl and python) often have several issues here. Garbage collection often implies that floating point numbers are “boxed”: every float is represented by a pointer to a float. Boxing can cause an order of magnitude slowdown because an extra nonlocalized memory reference is made, and accesses to main memory can take many CPU cycles. Garbage collection often implies that considerably more memory is used than is necessary. This has a variable effect. I
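As a concrete illustration of the boxing point above (this sketch is added for clarity and is not from the original post): OCaml stores a float array unboxed as a flat block of doubles, while a float list keeps every element behind a pointer, so traversing the list pays an extra indirection per element.

```ocaml
(* Added illustration, not code from the post: sum a million floats held
   unboxed in an array versus boxed behind list cells. The array loop
   reads a contiguous block; the list walk chases a pointer per element,
   which is the boxing overhead described above. *)
let n = 1_000_000

let sum_array (a : float array) =
  let s = ref 0.0 in
  for i = 0 to Array.length a - 1 do
    s := !s +. a.(i)                  (* unboxed, cache-friendly reads *)
  done;
  !s

let sum_list (l : float list) =
  List.fold_left ( +. ) 0.0 l         (* every element is a boxed double *)

let () =
  let a = Array.init n (fun i -> float_of_int i) in
  let l = Array.to_list a in
  Printf.printf "array sum = %f  list sum = %f\n" (sum_array a) (sum_list l)
```

Timing the two traversals typically shows the array version ahead by a large constant factor, which is the slowdown the post attributes to boxing.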
sentIndex sentText sentNum sentScore
1 There are several important concerns associated with machine learning which stress programming languages on the ease-of-use vs. speed frontier. [sent-2, score-0.591]
2 Garbage collected languages (java, ocaml, perl and python) often have several issues here. [sent-5, score-1.11]
3 Boxing can cause an order of magnitude slowdown because an extra nonlocalized memory reference is made, and accesses to main memory can take many CPU cycles. [sent-7, score-0.592]
4 Garbage collection often implies that considerably more memory is used than is necessary. [sent-8, score-0.386]
5 In some circumstances it results in no slowdown while in others it can cause a four-order-of-magnitude slowdown. [sent-10, score-0.288]
6 Some of these languages are interpreted rather than executed. [sent-12, score-0.513]
7 As a rule of thumb, interpreted languages are an order of magnitude slower than executed languages. [sent-13, score-0.68]
8 Even when these languages are compiled, there are often issues with how well they are compiled. [sent-14, score-0.653]
9 Programming Ease: Ease of use of a language is very subjective because it is always easiest to use the language you are most familiar with. [sent-16, score-0.617]
10 Syntax: Syntax is often overlooked, but it can make a huge difference in the ease of both learning to program and using the language. [sent-18, score-0.36]
11 Library Support: Languages vary dramatically in terms of library support, and having the right linear algebra/graphics/IO library can make a task dramatically easier. [sent-22, score-0.52]
12 One caveat here is that when you make a speed optimization pass, you often have to avoid these primitives. [sent-29, score-0.277]
13 Scalability: Scalability is where otherwise higher level languages often break down. [sent-33, score-0.795]
14 A simple example of this is a language with file I/O built in that fails to perform correctly when the file has size 2^31 or 2^32 (see the file-offset sketch after this sentence list). [sent-34, score-0.458]
15 I am particularly familiar with Ocaml which has the following scalability issues: List operations often end up consuming the stack and segfaulting. [sent-35, score-0.444]
16 The Unison crew were annoyed enough by this that they created their own “safelist” library with all the same interfaces as the list type (a tail-recursive sketch of the idea appears after this sentence list). [sent-36, score-0.324]
17 However, having big arrays, arrays, and strings often becomes annoying because they have different interfaces for objects which are semantically the same (see the interface sketch after this sentence list). [sent-39, score-0.364]
18 At the other extreme, you can use a language which many people are familiar with, such as C or Java. [sent-42, score-0.352]
19 The higher level languages often can’t execute fast and the lower level ones which can are often quite clumsy. [sent-53, score-1.167]
20 The approach I’ve taken is to first implement in a higher level language (my choice was ocaml) to test ideas and then reimplement in a lower level language (C or C++) for speed where the ideas work out. [sent-54, score-1.094]
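Sentence 14's file-size failure can be made concrete in OCaml, the language the rest of the post focuses on. This is an added sketch, not code from the article: on a 32-bit build the plain channel-position functions use the native int, which cannot represent offsets near 2^31, while the standard library's LargeFile variants use int64. The file name is a placeholder.

```ocaml
(* Added illustration: querying the length of a possibly huge file.
   in_channel_length returns a native int and overflows on 32-bit builds
   for files past roughly 2^30 bytes; the LargeFile variants use int64.
   "huge_dataset.bin" is a placeholder name. *)
let () =
  let ic = open_in_bin "huge_dataset.bin" in
  let len = LargeFile.in_channel_length ic in                 (* int64: safe for big files *)
  if len >= 8L then LargeFile.seek_in ic (Int64.sub len 8L);  (* seek near the end *)
  Printf.printf "file length = %Ld bytes\n" len;
  close_in ic
```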
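Sentences 15 and 16 describe list operations consuming the stack. The sketch below is an added illustration in the spirit of the “safelist” idea, not the Unison code: the standard List.map builds its result on the call stack and can overflow on long lists (recent OCaml releases mitigate this, but it was a real hazard when the post was written), whereas a map built from the tail-recursive List.rev_map uses constant stack space.

```ocaml
(* Added illustration: a constant-stack map in the spirit of a "safelist".
   List.rev_map is tail-recursive, so reversing its output gives a map
   that survives lists far longer than the call stack allows. *)
let safe_map f l = List.rev (List.rev_map f l)

(* Build a 5,000,000-element list tail-recursively. *)
let rec build n acc = if n = 0 then acc else build (n - 1) (n :: acc)

let () =
  let big = build 5_000_000 [] in
  (* List.map succ big would risk exhausting the stack on older OCaml *)
  let mapped = safe_map succ big in
  Printf.printf "safe_map handled %d elements\n" (List.length mapped)
```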
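Sentence 17's complaint about big arrays, arrays, and strings is easy to reproduce; the sketch below is an added illustration. The three traversals are semantically the same, but each type spells its length function and element access differently, so none of the code can be shared. (Bigarray ships with the standard OCaml distribution; older compilers link it as a separate library.)

```ocaml
(* Added illustration: semantically these are all "walk a sequence of
   elements", yet arrays, bigarrays, and strings each have their own
   length function and indexing syntax. *)
let sum_array (a : float array) =
  let s = ref 0.0 in
  for i = 0 to Array.length a - 1 do s := !s +. a.(i) done;
  !s

let sum_bigarray
    (a : (float, Bigarray.float64_elt, Bigarray.c_layout) Bigarray.Array1.t) =
  let s = ref 0.0 in
  for i = 0 to Bigarray.Array1.dim a - 1 do s := !s +. a.{i} done;
  !s

let count_spaces (s : string) =
  let c = ref 0 in
  for i = 0 to String.length s - 1 do if s.[i] = ' ' then incr c done;
  !c

let () =
  Printf.printf "%f %f %d\n"
    (sum_array [| 1.0; 2.0; 3.0 |])
    (sum_bigarray
       (Bigarray.Array1.of_array Bigarray.float64 Bigarray.c_layout
          [| 1.0; 2.0; 3.0 |]))
    (count_spaces "a b c")
```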
wordName wordTfidf (topN-words)
[('languages', 0.414), ('garbage', 0.241), ('language', 0.208), ('syntax', 0.198), ('implemented', 0.177), ('library', 0.171), ('arrays', 0.161), ('ocaml', 0.161), ('often', 0.154), ('memory', 0.152), ('ease', 0.152), ('scalability', 0.149), ('level', 0.135), ('familiarity', 0.134), ('speed', 0.123), ('java', 0.121), ('perl', 0.121), ('programming', 0.118), ('objects', 0.111), ('magnitude', 0.106), ('interfaces', 0.099), ('slowdown', 0.099), ('interpreted', 0.099), ('higher', 0.092), ('dramatically', 0.089), ('file', 0.089), ('familiar', 0.087), ('issues', 0.085), ('cause', 0.083), ('lower', 0.083), ('collection', 0.08), ('support', 0.074), ('built', 0.072), ('algorithmic', 0.067), ('growing', 0.063), ('rule', 0.061), ('associated', 0.059), ('extreme', 0.058), ('use', 0.057), ('ideas', 0.055), ('huge', 0.054), ('consuming', 0.054), ('polymorphism', 0.054), ('eak', 0.054), ('crew', 0.054), ('compiling', 0.054), ('concise', 0.054), ('desktop', 0.054), ('matlab', 0.054), ('python', 0.054)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999952 210 hunch net-2006-09-28-Programming Languages for Machine Learning Implementations
2 0.36095604 84 hunch net-2005-06-22-Languages of Learning
Introduction: A language is a set of primitives which can be combined to successfully create complex objects. Languages arise in all sorts of situations: mechanical construction, martial arts, communication, etc… Languages appear to be the key to successfully creating complex objects—it is difficult to come up with any convincing example of a complex object which is not built using some language. Since languages are so crucial to success, it is interesting to organize various machine learning research programs by language. The most common languages in machine learning are languages for representing the solution to machine learning. These include: Bayes Nets and Graphical Models A language for representing probability distributions. The key concept supporting modularity is conditional independence. Michael Kearns has been working on extending this to game theory. Kernelized Linear Classifiers A language for representing linear separators, possibly in a large space. The key form of
3 0.21994746 262 hunch net-2007-09-16-Optimizing Machine Learning Programs
Introduction: Machine learning is often computationally bounded which implies that the ability to write fast code becomes important if you ever want to implement a machine learning algorithm. Basic tactical optimizations are covered well elsewhere, but I haven’t seen a reasonable guide to higher level optimizations, which are the most important in my experience. Here are some of the higher level optimizations I’ve often found useful. Algorithmic Improvement First. This is Hard, but it is the most important consideration, and typically yields the most benefits. Good optimizations here are publishable. In the context of machine learning, you should be familiar with the arguments for online vs. batch learning. Choice of Language. There are many arguments about the choice of language. Sometimes you don’t have a choice when interfacing with other people. Personally, I favor C/C++ when I want to write fast code. This (admittedly) makes me a slower programmer than when using higher lev
4 0.14791965 49 hunch net-2005-03-30-What can Type Theory teach us about Machine Learning?
Introduction: This post is some combination of belaboring the obvious and speculating wildly about the future. The basic issue to be addressed is how to think about machine learning in terms given to us from Programming Language theory. Types and Reductions: John’s research programme (I feel this should be in British spelling to reflect the grandiosity of the idea…) of machine learning reductions StateOfReduction is at some essential level type-theoretic in nature. The fundamental elements are the classifier, a function f: alpha -> beta, and the corresponding classifier trainer g: List of (alpha,beta) -> (alpha -> beta). The research goal is to create *combinators* that produce new f’s and g’s given existing ones. John (probably quite rightly) seems unwilling at the moment to commit to any notion stronger than these combinators are correctly typed. (A type-level sketch of this setup, in OCaml, appears after this list of related posts.) One way to see the result of a reduction is something typed like: (For those denied the joy of the Hindley-Milner type system, “simple” is probab
5 0.13817945 215 hunch net-2006-10-22-Exemplar programming
Introduction: There are many different abstractions for problem definition and solution. Here are a few examples: Functional programming: a set of functions is defined. The composed execution of these functions yields the solution. Linear programming: a set of constraints and a linear objective function are defined. An LP solver finds the constrained optimum. Quadratic programming: Like linear programming, but the language is a little more flexible (and the solution slower). Convex programming: like quadratic programming, but the language is more flexible (and the solutions even slower). Dynamic programming: a recursive definition of the problem is defined and then solved efficiently via caching tricks. SAT programming: A problem is specified as a satisfiability problem involving a conjunction of disjunctions of boolean variables. A general engine attempts to find a good satisfying assignment. For example, Kautz’s blackbox planner. These abstractions have different tradeoffs betw
6 0.11837821 35 hunch net-2005-03-04-The Big O and Constants in Learning
7 0.095744044 58 hunch net-2005-04-21-Dynamic Programming Generalizations and Their Use
8 0.09411446 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms
9 0.093684785 277 hunch net-2007-12-12-Workshop Summary—Principles of Learning Problem Design
10 0.093467861 235 hunch net-2007-03-03-All Models of Learning have Flaws
11 0.091247901 120 hunch net-2005-10-10-Predictive Search is Coming
12 0.090709478 473 hunch net-2012-09-29-Vowpal Wabbit, version 7.0
13 0.089924589 454 hunch net-2012-01-30-ICML Posters and Scope
14 0.089754328 229 hunch net-2007-01-26-Parallel Machine Learning Problems
15 0.083782181 419 hunch net-2010-12-04-Vowpal Wabbit, version 5.0, and the second heresy
16 0.081573322 301 hunch net-2008-05-23-Three levels of addressing the Netflix Prize
17 0.08030156 237 hunch net-2007-04-02-Contextual Scaling
18 0.079331696 60 hunch net-2005-04-23-Advantages and Disadvantages of Bayesian Learning
19 0.078256987 70 hunch net-2005-05-12-Math on the Web
20 0.078116529 128 hunch net-2005-11-05-The design of a computing cluster
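Entry 4 above describes classifiers as functions f: alpha -> beta and trainers as maps from labeled examples to classifiers, with reductions built out of combinators. The OCaml types below are an added sketch of that description, not John's actual formulation; pair_trainer is a hypothetical combinator included only to show the shape such a construction takes.

```ocaml
(* Added sketch of the types summarized above: a classifier maps features
   to labels, a trainer maps labeled examples to a classifier.
   pair_trainer is a made-up combinator that learns two boolean outputs
   by training each component separately. *)
type ('a, 'b) classifier = 'a -> 'b
type ('a, 'b) trainer = ('a * 'b) list -> ('a, 'b) classifier

let pair_trainer (t1 : ('a, bool) trainer) (t2 : ('a, bool) trainer)
  : ('a, bool * bool) trainer =
  fun examples ->
    let left  = List.map (fun (x, (b, _)) -> (x, b)) examples in
    let right = List.map (fun (x, (_, b)) -> (x, b)) examples in
    let c1 = t1 left in
    let c2 = t2 right in
    fun x -> (c1 x, c2 x)
```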
topicId topicWeight
[(0, 0.195), (1, 0.041), (2, -0.065), (3, 0.053), (4, 0.004), (5, -0.002), (6, -0.023), (7, 0.018), (8, -0.011), (9, -0.008), (10, -0.151), (11, -0.078), (12, -0.018), (13, -0.002), (14, 0.007), (15, -0.09), (16, 0.041), (17, 0.011), (18, 0.011), (19, -0.102), (20, 0.078), (21, -0.006), (22, -0.016), (23, -0.002), (24, -0.047), (25, 0.001), (26, -0.012), (27, -0.008), (28, -0.005), (29, 0.08), (30, -0.119), (31, 0.064), (32, 0.197), (33, 0.205), (34, 0.005), (35, 0.04), (36, -0.011), (37, -0.022), (38, -0.115), (39, -0.122), (40, 0.046), (41, -0.053), (42, 0.088), (43, -0.133), (44, -0.093), (45, -0.051), (46, 0.017), (47, -0.039), (48, 0.039), (49, 0.163)]
simIndex simValue blogId blogTitle
same-blog 1 0.96043801 210 hunch net-2006-09-28-Programming Languages for Machine Learning Implementations
2 0.86524481 84 hunch net-2005-06-22-Languages of Learning
3 0.69940764 262 hunch net-2007-09-16-Optimizing Machine Learning Programs
4 0.63459444 128 hunch net-2005-11-05-The design of a computing cluster
Introduction: This is about the design of a computing cluster from the viewpoint of applied machine learning using current technology. We just built a small one at TTI so this is some evidence of what is feasible and thoughts about the design choices. Architecture: There are several architectural choices. AMD Athlon64 based system. This seems to have the cheapest bang/buck. Maximum RAM is typically 2-3GB. AMD Opteron based system. Opterons provide the additional capability to buy an SMP motherboard with two chips, and the motherboards often support 16GB of RAM. The RAM is also the more expensive error correcting type. Intel PIV or Xeon based system. The PIV and Xeon based systems are the Intel analog of the above 2. Due to architectural design reasons, these chips tend to run a bit hotter and be a bit more expensive. Dual core chips. Both Intel and AMD have chips that actually have 2 processors embedded in them. In the end, we decided to go with option (2). Roughly speaking,
5 0.60547161 215 hunch net-2006-10-22-Exemplar programming
6 0.55080074 49 hunch net-2005-03-30-What can Type Theory teach us about Machine Learning?
7 0.52921695 171 hunch net-2006-04-09-Progress in Machine Translation
8 0.48866585 229 hunch net-2007-01-26-Parallel Machine Learning Problems
9 0.46840522 450 hunch net-2011-12-02-Hadoop AllReduce and Terascale Learning
10 0.4423202 162 hunch net-2006-03-09-Use of Notation
11 0.43799725 152 hunch net-2006-01-30-Should the Input Representation be a Vector?
12 0.43404999 147 hunch net-2006-01-08-Debugging Your Brain
13 0.43297195 277 hunch net-2007-12-12-Workshop Summary—Principles of Learning Problem Design
14 0.42746016 70 hunch net-2005-05-12-Math on the Web
15 0.42496866 122 hunch net-2005-10-13-Site tweak
16 0.42439115 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms
17 0.4231829 37 hunch net-2005-03-08-Fast Physics for Learning
18 0.40782481 250 hunch net-2007-06-23-Machine Learning Jobs are Growing on Trees
19 0.40386292 366 hunch net-2009-08-03-Carbon in Computer Science Research
20 0.40383732 228 hunch net-2007-01-15-The Machine Learning Department
topicId topicWeight
[(0, 0.031), (1, 0.024), (10, 0.018), (27, 0.145), (38, 0.064), (49, 0.023), (51, 0.024), (53, 0.046), (55, 0.058), (64, 0.329), (94, 0.119), (95, 0.031)]
simIndex simValue blogId blogTitle
1 0.9215579 442 hunch net-2011-08-20-The Large Scale Learning Survey Tutorial
Introduction: Ron Bekkerman initiated an effort to create an edited book on parallel machine learning that Misha and I have been helping with. The breadth of efforts to parallelize machine learning surprised me: I was only aware of a small fraction initially. This put us in a unique position, with knowledge of a wide array of different efforts, so it is natural to put together a survey tutorial on the subject of parallel learning for KDD, tomorrow. This tutorial is not limited to the book itself however, as several interesting new algorithms have come out since we started inviting chapters. This tutorial should interest anyone trying to use machine learning on significant quantities of data, anyone interested in developing algorithms for such, and of course anyone who has bragging rights to the fastest learning algorithm on planet earth (Also note the Modeling with Hadoop tutorial just before ours which deals with one way of trying to speed up learning algorithms. We have almost no
2 0.9057045 155 hunch net-2006-02-07-Pittsburgh Mind Reading Competition
Introduction: Francisco Pereira points out a fun Prediction Competition . Francisco says: DARPA is sponsoring a competition to analyze data from an unusual functional Magnetic Resonance Imaging experiment. Subjects watch videos inside the scanner while fMRI data are acquired. Unbeknownst to these subjects, the videos have been seen by a panel of other subjects that labeled each instant with labels in categories such as representation (are there tools, body parts, motion, sound), location, presence of actors, emotional content, etc. The challenge is to predict all of these different labels on an instant-by-instant basis from the fMRI data. A few reasons why this is particularly interesting: This is beyond the current state of the art, but not inconceivably hard. This is a new type of experiment design current analysis methods cannot deal with. This is an opportunity to work with a heavily examined and preprocessed neuroimaging dataset. DARPA is offering prizes!
same-blog 3 0.86103797 210 hunch net-2006-09-28-Programming Languages for Machine Learning Implementations
4 0.82965112 291 hunch net-2008-03-07-Spock Challenge Winners
Introduction: The spock challenge for named entity recognition was won by Berno Stein , Sven Eissen, Tino Rub, Hagen Tonnies, Christof Braeutigam, and Martin Potthast .
5 0.81775343 420 hunch net-2010-12-26-NIPS 2010
Introduction: I enjoyed attending NIPS this year, with several things interesting me. For the conference itself: Peter Welinder , Steve Branson , Serge Belongie , and Pietro Perona , The Multidimensional Wisdom of Crowds . This paper is about using mechanical turk to get label information, with results superior to a majority vote approach. David McAllester , Tamir Hazan , and Joseph Keshet Direct Loss Minimization for Structured Prediction . This is about another technique for directly optimizing the loss in structured prediction, with an application to speech recognition. Mohammad Saberian and Nuno Vasconcelos Boosting Classifier Cascades . This is about an algorithm for simultaneously optimizing loss and computation in a classifier cascade construction. There were several other papers on cascades which are worth looking at if interested. Alan Fern and Prasad Tadepalli , A Computational Decision Theory for Interactive Assistants . This paper carves out some
6 0.81359428 277 hunch net-2007-12-12-Workshop Summary—Principles of Learning Problem Design
7 0.79261762 18 hunch net-2005-02-12-ROC vs. Accuracy vs. AROC
8 0.57568777 343 hunch net-2009-02-18-Decision by Vetocracy
9 0.57349735 136 hunch net-2005-12-07-Is the Google way the way for machine learning?
10 0.568932 49 hunch net-2005-03-30-What can Type Theory teach us about Machine Learning?
11 0.56497699 426 hunch net-2011-03-19-The Ideal Large Scale Learning Class
12 0.54520541 424 hunch net-2011-02-17-What does Watson mean?
13 0.54413038 351 hunch net-2009-05-02-Wielding a New Abstraction
14 0.5376426 423 hunch net-2011-02-02-User preferences for search engines
15 0.53655064 191 hunch net-2006-07-08-MaxEnt contradicts Bayes Rule?
16 0.53366941 256 hunch net-2007-07-20-Motivation should be the Responsibility of the Reviewer
17 0.53351969 262 hunch net-2007-09-16-Optimizing Machine Learning Programs
18 0.53041065 301 hunch net-2008-05-23-Three levels of addressing the Netflix Prize
19 0.53025144 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms
20 0.52966809 131 hunch net-2005-11-16-The Everything Ensemble Edge