hunch_net hunch_net-2005 hunch_net-2005-49 knowledge-graph by maker-knowledge-mining

49 hunch net-2005-03-30-What can Type Theory teach us about Machine Learning?


meta info for this blog

Source: html

Introduction: This post is some combination of belaboring the obvious and speculating wildly about the future. The basic issue to be addressed is how to think about machine learning in terms given to us from Programming Language theory. Types and Reductions: John’s research programme (I feel this should be in British spelling to reflect the grandiosity of the idea…) of machine learning reductions StateOfReduction is at some essential level type-theoretic in nature. The fundamental elements are the classifier, a function f: alpha -> beta, and the corresponding classifier trainer g: List of (alpha,beta) -> (alpha -> beta). The research goal is to create *combinators* that produce new f’s and g’s given existing ones. John (probably quite rightly) seems unwilling at the moment to commit to any notion stronger than that these combinators are correctly typed. One way to see the result of a reduction is something typed like: (For those denied the joy of the Hindley-Milner type system, “simple” is probably wildly wrong.
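A minimal Haskell sketch of the types named above (the synonym names Classifier and Trainer, and the toy nearestNeighbour trainer, are mine, not from the post): a classifier is f :: alpha -> beta, and a classifier trainer maps a list of labeled examples to a classifier.

module ClassifierTypes where

import Data.List (minimumBy)
import Data.Ord (comparing)

-- The two fundamental elements from the post, as Haskell type synonyms.
type Classifier a b = a -> b
type Trainer    a b = [(a, b)] -> Classifier a b

-- A deliberately trivial inhabitant of the Trainer type: 1-nearest-neighbour
-- on a single numeric feature. Assumes a non-empty training set.
nearestNeighbour :: Trainer Double b
nearestNeighbour examples x =
  snd (minimumBy (comparing (abs . subtract x . fst)) examples)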


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 One way to see the result of a reduction is something typed like: (For those denied the joy of the Hindley-Milner type system, “simple” is probably wildly wrong. [sent-7, score-0.72]

2 ) g’: List of (gamma,delta) -> (List of (alpha,beta) -> (alpha -> beta)) -> (gamma -> delta) Perhaps another is to think of the reduction itself type-theoretically as a combinator that takes something typed like g above and spits out a new classifier trainer. [sent-8, score-0.634]

3 a reduction “lifts” a lower-level function up to operate on some type it wasn’t originally designed for. [sent-11, score-0.456]
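A minimal sketch of sentences 2–3 (the Reduction synonym and binaryViaRegression are my names, not the post's): a reduction typed as a combinator that takes a trainer for a lower-level problem and lifts it to a trainer for another problem.

module ReductionSketch where

-- Repeated here so the sketch is self-contained.
type Classifier a b = a -> b
type Trainer    a b = [(a, b)] -> Classifier a b

-- A reduction from the (a, b) problem to the (c, d) problem: hand it a
-- trainer for the lower-level (c, d) problem, get back a trainer for (a, b).
type Reduction a b c d = Trainer c d -> Trainer a b

-- Example: binary classification via regression. Labels are encoded as 0/1,
-- the regression trainer runs on the encoded data, and its predictions are
-- thresholded to recover a binary classifier.
binaryViaRegression :: Reduction a Bool a Double
binaryViaRegression regress examples = \x -> regress encoded x > 0.5
  where
    encoded = [ (feature, if label then 1.0 else 0.0) | (feature, label) <- examples ]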

4 Some reductions (including, I would argue, some that have been discussed seriously) are “trivial” reductions in the sense that the antecedent never holds. [sent-16, score-0.511]

5 Can we somehow type this kind of thing so we can separate good from bad reductions, where we try to define bad as meaning something like “creates impossible subproblems where it was possible to do otherwise”? [sent-18, score-0.643]

6 RLBench hints at how powerful formalizing interfaces for things like reductions can be. [sent-19, score-0.71]

7 Types and Probabilistic Models: Arguably the graphical models/Bayesian community has plans just as grandiose as John’s, but here the reductions are to learning probabilities in the sense of “beliefs”. [sent-22, score-0.395]

8 Graphical models form a language that allows us to compose together little subgraphs that express our beliefs about some subsystem. [sent-23, score-0.34]
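A minimal, purely illustrative sketch of that composition (the Factor, Graph, and score names are mine, not any real library's API): a graphical model as a bag of factors over named discrete variables, where composing subgraphs is just combining their factors and the unnormalized belief is the product of factor potentials.

module FactorSketch where

import qualified Data.Map as M

type Var    = String
type Assign = M.Map Var Int    -- an assignment of discrete values to variables

-- A factor records which variables it touches and scores any assignment.
data Factor = Factor { vars :: [Var], potential :: Assign -> Double }

-- A "graph" is just a collection of factors; composing subgraphs is
-- concatenation.
type Graph = [Factor]

compose :: Graph -> Graph -> Graph
compose = (++)

-- The unnormalized joint belief of an assignment is the product of all
-- factor potentials.
score :: Graph -> Assign -> Double
score g a = product [ potential f a | f <- g ]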

9 It’s interesting to explore what type theory can tell us about this as well. [sent-25, score-0.369]

10 Allison uses Haskell to type a number of ideas including MML and some classifiers, and can check many things statically. [sent-27, score-0.427]

11 ) It may still make a huge amount of sense to think about things in something like the Haskell type system and then translate them to the capable (but gross) type system of, say, C++. [sent-33, score-1.208]

12 Understanding polymorphism and type classes and their relation to Machine Learning may be a real fundamental breakthrough in making ML widely useful. [sent-34, score-0.367]
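A minimal sketch of how a type class might capture "learner" (the Learner class, MajorityLabel instance, and accuracy function are mine, purely illustrative): code written against the class is polymorphic over every model that can be trained and then used to predict.

{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}
module LearnerClass where

import Data.List (group, sort, sortOn)

-- A learner is anything that can be trained from labeled examples and then
-- used to predict; the model type determines the feature and label types.
class Learner model a b | model -> a b where
  train   :: [(a, b)] -> model
  predict :: model -> a -> b

-- A deliberately trivial instance: ignore the features and always predict
-- the most common label seen in training (assumes a non-empty training set).
newtype MajorityLabel a b = MajorityLabel b

instance Ord b => Learner (MajorityLabel a b) a b where
  train examples = MajorityLabel mostCommon
    where
      labels     = map snd examples
      mostCommon = head (last (sortOn length (group (sort labels))))
  predict (MajorityLabel b) _ = b

-- Library code written once against the class works for every learner.
accuracy :: (Learner model a b, Eq b) => model -> [(a, b)] -> Double
accuracy m test =
  fromIntegral (length [ () | (x, y) <- test, predict m x == y ])
    / fromIntegral (length test)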

13 Contributions flowing towards PL theory: Right now, when push comes to shove, all good interfaces between systems basically amount to invoking functions or closures. [sent-35, score-0.428]

14 I think of graphical models as tools for expressing future interfaces because they preserve uncertainty across boundaries. [sent-38, score-0.758]

15 My guess is that these interfaces are basically multi-directional (unlike functional interfaces). [sent-42, score-0.497]

16 I can resolve something ambiguous about, say, a phoneme, by understanding the higher-level language model. [sent-45, score-0.368]

17 To get the phoneme parsing right I have to feed back the results from the language layer. [sent-46, score-0.309]

18 In this sense, interfaces need to preserve uncertainty and probably pass information both ways. [sent-47, score-0.61]

19 How to start (small): I think a serious effort should be made to explain things like reductions in terms of PL theory, even if it means that, like Allison, you have to do some explaining of type systems first. [sent-48, score-0.796]

20 ) We should write our libraries to have super-clean functional interfaces and make them as (parametrically) polymorphic as reasonable. [sent-51, score-0.432]
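A minimal sketch of what such a library routine might look like (holdoutError is my name, not an existing library function): an evaluation helper written once against the parametrically polymorphic Trainer type, so it works unchanged for any feature and label types.

module PolymorphicEval where

-- Repeated so the sketch is self-contained.
type Classifier a b = a -> b
type Trainer    a b = [(a, b)] -> Classifier a b

-- Hold out the last k examples, train on the rest, and report the error
-- rate on the held-out set (assumes 0 < k < length examples).
holdoutError :: Eq b => Trainer a b -> Int -> [(a, b)] -> Double
holdoutError trainer k examples = errors / total
  where
    (trainSet, testSet) = splitAt (length examples - k) examples
    f      = trainer trainSet
    errors = fromIntegral (length [ () | (x, y) <- testSet, f x /= y ])
    total  = fromIntegral (length testSet)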


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('interfaces', 0.363), ('type', 0.305), ('allison', 0.221), ('pl', 0.221), ('reductions', 0.206), ('beta', 0.166), ('combinators', 0.166), ('haskell', 0.166), ('phoneme', 0.166), ('alpha', 0.147), ('language', 0.143), ('classifier', 0.113), ('combinator', 0.11), ('types', 0.107), ('john', 0.104), ('uncertainty', 0.099), ('sense', 0.099), ('rlbench', 0.098), ('typed', 0.098), ('reduction', 0.094), ('graphical', 0.09), ('ml', 0.089), ('say', 0.088), ('preserve', 0.079), ('wildly', 0.079), ('like', 0.077), ('something', 0.075), ('beliefs', 0.073), ('seriously', 0.071), ('impossible', 0.069), ('functional', 0.069), ('probably', 0.069), ('modeling', 0.068), ('think', 0.067), ('thoughts', 0.065), ('basically', 0.065), ('things', 0.064), ('list', 0.064), ('us', 0.064), ('system', 0.064), ('classes', 0.062), ('recognition', 0.062), ('level', 0.062), ('existing', 0.061), ('models', 0.06), ('kind', 0.059), ('define', 0.058), ('recent', 0.058), ('check', 0.058), ('designed', 0.057)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9999997 49 hunch net-2005-03-30-What can Type Theory teach us about Machine Learning?


2 0.24307716 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions

Introduction: Learning reductions transform a solver of one type of learning problem into a solver of another type of learning problem. When we analyze these for robustness we can make statements of the form “Reduction R has the property that regret r (or loss) on subproblems of type A implies regret at most f(r) on the original problem of type B”. A lower bound for a learning reduction would have the form “for all reductions R, there exists a learning problem of type B and learning algorithm for problems of type A where regret r on induced problems implies at least regret f(r) for B”. The pursuit of lower bounds is often questionable because, unlike upper bounds, they do not yield practical algorithms. Nevertheless, they may be helpful as a tool for thinking about what is learnable and how learnable it is. This has already come up here and here. At the moment, there is no coherent theory of lower bounds for learning reductions, and we have little understa

3 0.23931739 236 hunch net-2007-03-15-Alternative Machine Learning Reductions Definitions

Introduction: A type of prediction problem is specified by the type of samples produced by a data source (Example: X x {0,1}, X x [0,1], X x {1,2,3,4,5}, etc…) and a loss function (0/1 loss, squared error loss, cost sensitive losses, etc…). For simplicity, we’ll assume that all losses have a minimum of zero. For this post, we can think of a learning reduction as: (1) a mapping R from samples of one type T (like multiclass classification) to another type T’ (like binary classification); (2) a mapping Q from predictors for type T’ to predictors for type T. The simplest sort of learning reduction is a “loss reduction”. The idea in a loss reduction is to prove a statement of the form: Theorem: For all base predictors b, for all distributions D over examples of type T: E_{(x,y) ~ D} L_T(y, Q(b,x)) <= f( E_{(x’,y’) ~ R(D)} L_{T’}(y’, b(x’)) ). Here L_T is the loss for the type T problem and L_{T’} is the loss for the type T’ problem. Also, R(D) is the distribution ov

4 0.14791965 210 hunch net-2006-09-28-Programming Languages for Machine Learning Implementations

Introduction: Machine learning algorithms have a much better chance of being widely adopted if they are implemented in some easy-to-use code. There are several important concerns associated with machine learning which stress programming languages on the ease-of-use vs. speed frontier. Speed: The rate at which data sources are growing seems to be outstripping the rate at which computational power is growing, so it is important that we be able to eke out every bit of computational power. Garbage collected languages (java, ocaml, perl and python) often have several issues here. Garbage collection often implies that floating point numbers are “boxed”: every float is represented by a pointer to a float. Boxing can cause an order of magnitude slowdown because an extra nonlocalized memory reference is made, and accesses to main memory can be many CPU cycles long. Garbage collection often implies that considerably more memory is used than is necessary. This has a variable effect. I

5 0.12219469 14 hunch net-2005-02-07-The State of the Reduction

Introduction: What? Reductions are machines which turn solvers for one problem into solvers for another problem. Why? Reductions are useful for several reasons. Laziness. Reducing a problem to classification makes at least 10 learning algorithms available to solve a problem. Inventing 10 learning algorithms is quite a bit of work. Similarly, programming a reduction is often trivial, while programming a learning algorithm is a great deal of work. Crystallization. The problems we often want to solve in learning are worst-case-impossible, but average case feasible. By reducing all problems onto one or a few primitives, we can fine tune these primitives to perform well on real-world problems with greater precision due to the greater number of problems to validate on. Theoretical Organization. By studying what reductions are easy vs. hard vs. impossible, we can learn which problems are roughly equivalent in difficulty and which are much harder. What we know now. Typesafe r

6 0.11632729 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

7 0.10624555 351 hunch net-2009-05-02-Wielding a New Abstraction

8 0.1044182 352 hunch net-2009-05-06-Machine Learning to AI

9 0.10067801 347 hunch net-2009-03-26-Machine Learning is too easy

10 0.098918624 158 hunch net-2006-02-24-A Fundamentalist Organization of Machine Learning

11 0.09554477 235 hunch net-2007-03-03-All Models of Learning have Flaws

12 0.092151612 473 hunch net-2012-09-29-Vowpal Wabbit, version 7.0

13 0.091162086 454 hunch net-2012-01-30-ICML Posters and Scope

14 0.089997873 262 hunch net-2007-09-16-Optimizing Machine Learning Programs

15 0.088742003 492 hunch net-2013-12-01-NIPS tutorials and Vowpal Wabbit 7.4

16 0.088511273 181 hunch net-2006-05-23-What is the best regret transform reduction from multiclass to binary?

17 0.085887536 332 hunch net-2008-12-23-Use of Learning Theory

18 0.085101701 82 hunch net-2005-06-17-Reopening RL->Classification

19 0.083787642 77 hunch net-2005-05-29-Maximum Margin Mismatch?

20 0.082977727 72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.234), (1, 0.061), (2, -0.026), (3, 0.028), (4, -0.053), (5, -0.03), (6, 0.119), (7, -0.035), (8, -0.047), (9, -0.077), (10, -0.079), (11, -0.087), (12, -0.056), (13, 0.042), (14, 0.025), (15, -0.055), (16, 0.071), (17, 0.036), (18, 0.062), (19, -0.115), (20, 0.045), (21, -0.049), (22, -0.067), (23, -0.036), (24, -0.017), (25, -0.08), (26, -0.027), (27, -0.014), (28, 0.033), (29, 0.078), (30, 0.047), (31, 0.011), (32, 0.006), (33, 0.044), (34, -0.023), (35, -0.073), (36, -0.145), (37, 0.016), (38, -0.037), (39, -0.09), (40, -0.009), (41, 0.008), (42, 0.042), (43, -0.109), (44, -0.105), (45, 0.055), (46, -0.011), (47, 0.078), (48, -0.056), (49, 0.09)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95633537 49 hunch net-2005-03-30-What can Type Theory teach us about Machine Learning?


2 0.72539979 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions

Introduction: Learning reductions transform a solver of one type of learning problem into a solver of another type of learning problem. When we analyze these for robustness we can make statements of the form “Reduction R has the property that regret r (or loss) on subproblems of type A implies regret at most f(r) on the original problem of type B”. A lower bound for a learning reduction would have the form “for all reductions R, there exists a learning problem of type B and learning algorithm for problems of type A where regret r on induced problems implies at least regret f(r) for B”. The pursuit of lower bounds is often questionable because, unlike upper bounds, they do not yield practical algorithms. Nevertheless, they may be helpful as a tool for thinking about what is learnable and how learnable it is. This has already come up here and here. At the moment, there is no coherent theory of lower bounds for learning reductions, and we have little understa

3 0.70610154 236 hunch net-2007-03-15-Alternative Machine Learning Reductions Definitions

Introduction: A type of prediction problem is specified by the type of samples produced by a data source (Example: X x {0,1}, X x [0,1], X x {1,2,3,4,5}, etc…) and a loss function (0/1 loss, squared error loss, cost sensitive losses, etc…). For simplicity, we’ll assume that all losses have a minimum of zero. For this post, we can think of a learning reduction as: (1) a mapping R from samples of one type T (like multiclass classification) to another type T’ (like binary classification); (2) a mapping Q from predictors for type T’ to predictors for type T. The simplest sort of learning reduction is a “loss reduction”. The idea in a loss reduction is to prove a statement of the form: Theorem: For all base predictors b, for all distributions D over examples of type T: E_{(x,y) ~ D} L_T(y, Q(b,x)) <= f( E_{(x’,y’) ~ R(D)} L_{T’}(y’, b(x’)) ). Here L_T is the loss for the type T problem and L_{T’} is the loss for the type T’ problem. Also, R(D) is the distribution ov

4 0.59749824 210 hunch net-2006-09-28-Programming Languages for Machine Learning Implementations

Introduction: Machine learning algorithms have a much better chance of being widely adopted if they are implemented in some easy-to-use code. There are several important concerns associated with machine learning which stress programming languages on the ease-of-use vs. speed frontier. Speed: The rate at which data sources are growing seems to be outstripping the rate at which computational power is growing, so it is important that we be able to eke out every bit of computational power. Garbage collected languages (java, ocaml, perl and python) often have several issues here. Garbage collection often implies that floating point numbers are “boxed”: every float is represented by a pointer to a float. Boxing can cause an order of magnitude slowdown because an extra nonlocalized memory reference is made, and accesses to main memory can be many CPU cycles long. Garbage collection often implies that considerably more memory is used than is necessary. This has a variable effect. I

5 0.51935071 202 hunch net-2006-08-10-Precision is not accuracy

Introduction: In my experience, there are two different groups of people who believe the same thing: the mathematics encountered in typical machine learning conference papers is often of questionable value. The two groups who agree on this are applied machine learning people who have given up on math, and mature theoreticians who understand the limits of theory. Partly, this is just a statement about where we are with respect to machine learning. In particular, we have no mechanism capable of generating a prescription for how to solve all learning problems. In the absence of such certainty, people try to come up with formalisms that partially describe and motivate how and why they do things. This is natural and healthy—we might hope that it will eventually lead to just such a mechanism. But, part of this is simply an emphasis on complexity over clarity. A very natural and simple theoretical statement is often obscured by complexifications. Common sources of complexification include:

6 0.51622427 84 hunch net-2005-06-22-Languages of Learning

7 0.50853592 351 hunch net-2009-05-02-Wielding a New Abstraction

8 0.50837797 328 hunch net-2008-11-26-Efficient Reinforcement Learning in MDPs

9 0.48979568 103 hunch net-2005-08-18-SVM Adaptability

10 0.48568678 250 hunch net-2007-06-23-Machine Learning Jobs are Growing on Trees

11 0.48263291 262 hunch net-2007-09-16-Optimizing Machine Learning Programs

12 0.47700459 335 hunch net-2009-01-08-Predictive Analytics World

13 0.47164828 193 hunch net-2006-07-09-The Stock Prediction Machine Learning Problem

14 0.46818131 181 hunch net-2006-05-23-What is the best regret transform reduction from multiclass to binary?

15 0.46667024 352 hunch net-2009-05-06-Machine Learning to AI

16 0.46516383 492 hunch net-2013-12-01-NIPS tutorials and Vowpal Wabbit 7.4

17 0.4626559 82 hunch net-2005-06-17-Reopening RL->Classification

18 0.45699382 14 hunch net-2005-02-07-The State of the Reduction

19 0.45529783 153 hunch net-2006-02-02-Introspectionism as a Disease

20 0.45272449 473 hunch net-2012-09-29-Vowpal Wabbit, version 7.0


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(0, 0.023), (3, 0.026), (10, 0.027), (13, 0.012), (27, 0.164), (32, 0.239), (38, 0.105), (48, 0.013), (53, 0.045), (55, 0.072), (64, 0.031), (94, 0.092), (95, 0.038)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.86614448 49 hunch net-2005-03-30-What can Type Theory teach us about Machine Learning?


2 0.7892133 57 hunch net-2005-04-16-Which Assumptions are Reasonable?

Introduction: One of the most confusing things about understanding learning theory is the vast array of differing assumptions. Some critical thought about which of these assumptions are reasonable for real-world problems may be useful. Before we even start thinking about assumptions, it’s important to realize that the word has multiple meanings. The meaning used here is “assumption = axiom” (i.e. something you can not verify). Assumption | Reasonable? | Which analysis? | Example/notes: Independent and Identically Distributed Data | Sometimes | PAC, ERM, Prediction bounds, statistics | The KDD cup 2004 physics dataset is plausibly IID data. There are a number of situations which are “almost IID” in the sense that IID analysis results in correct intuitions. Unreasonable in adversarial situations (stock market, war, etc…). Independently Distributed Data | More than IID, but still only sometimes | online->batch conversion | Losing “identical” can be helpful in situations where you

3 0.67223185 233 hunch net-2007-02-16-The Forgetting

Introduction: How many papers do you remember from 2006? 2005? 2002? 1997? 1987? 1967? One way to judge this would be to look at the citations of the papers you write—how many came from which year? For myself, the answers on recent papers are: 2006: 4, 2005: 10, 2002: 5, 1997: 1, 1987: 0, 1967: 0. This spectrum is fairly typical of papers in general. There are many reasons that citations are focused on recent papers. The number of papers being published continues to grow. This is not a very significant effect, because the rate of publication has not grown nearly as fast. Dead men don’t reject your papers for not citing them. This reason seems lame, because it’s a distortion from the ideal of science. Nevertheless, it must be stated because the effect can be significant. In 1997, I started as a PhD student. Naturally, papers after 1997 are better remembered because they were absorbed in real time. A large fraction of people writing papers and a

4 0.65886176 353 hunch net-2009-05-08-Computability in Artificial Intelligence

Introduction: Normally I do not blog, but John kindly invited me to do so. Since computability issues play a major role in Artificial Intelligence and Machine Learning, I would like to take the opportunity to comment on that and raise some questions. The general attitude is that AI is about finding efficient smart algorithms. For large parts of machine learning, the same attitude is not too dangerous. If you want to concentrate on conceptual problems, simply become a statistician. There is no analogous escape for modern research on AI (as opposed to GOFAI rooted in logic). Let me show by analogy why limiting research to computational questions is bad for any field. Except in computer science, computational aspects play little role in the development of fundamental theories: Consider e.g. set theory with axiom of choice, foundations of logic, exact/full minimax for zero-sum games, quantum (field) theory, string theory, … Indeed, at least in physics, every new fundamental theory seems to

5 0.65833759 95 hunch net-2005-07-14-What Learning Theory might do

Introduction: I wanted to expand on this post and some of the previous problems/research directions about where learning theory might make large strides. Why theory? The essential reason for theory is “intuition extension”. A very good applied learning person can master some particular application domain yielding the best computer algorithms for solving that problem. A very good theory can take the intuitions discovered by this and other applied learning people and extend them to new domains in a relatively automatic fashion. To do this, we take these basic intuitions and try to find a mathematical model that: Explains the basic intuitions. Makes new testable predictions about how to learn. Succeeds in so learning. This is “intuition extension”: taking what we have learned somewhere else and applying it in new domains. It is fundamentally useful to everyone because it increases the level of automation in solving problems. Where next for learning theory? I like the a

6 0.65430522 19 hunch net-2005-02-14-Clever Methods of Overfitting

7 0.65405542 343 hunch net-2009-02-18-Decision by Vetocracy

8 0.65326613 131 hunch net-2005-11-16-The Everything Ensemble Edge

9 0.65276825 236 hunch net-2007-03-15-Alternative Machine Learning Reductions Definitions

10 0.65074241 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models

11 0.64988059 204 hunch net-2006-08-28-Learning Theory standards for NIPS 2006

12 0.64901996 286 hunch net-2008-01-25-Turing’s Club for Machine Learning

13 0.64867818 306 hunch net-2008-07-02-Proprietary Data in Academic Research?

14 0.64595771 98 hunch net-2005-07-27-Not goal metrics

15 0.64544624 259 hunch net-2007-08-19-Choice of Metrics

16 0.6448372 235 hunch net-2007-03-03-All Models of Learning have Flaws

17 0.64481211 277 hunch net-2007-12-12-Workshop Summary—Principles of Learning Problem Design

18 0.64405096 351 hunch net-2009-05-02-Wielding a New Abstraction

19 0.64169312 423 hunch net-2011-02-02-User preferences for search engines

20 0.64093304 194 hunch net-2006-07-11-New Models