hunch_net hunch_net-2006 hunch_net-2006-215 knowledge-graph by maker-knowledge-mining

215 hunch net-2006-10-22-Exemplar programming


meta info for this blog

Source: html

Introduction: There are many different abstractions for problem definition and solution. Here are a few examples: Functional programming: a set of functions are defined. The composed execution of these functions yields the solution. Linear programming: a set of constraints and a linear objective function are defined. An LP solver finds the constrained optimum. Quadratic programming: like linear programming, but the language is a little more flexible (and the solution slower). Convex programming: like quadratic programming, but the language is more flexible (and the solutions even slower). Dynamic programming: a recursive definition of the problem is given and then solved efficiently via caching tricks. SAT programming: a problem is specified as a satisfiability problem involving a conjunction of disjunctions of boolean variables. A general engine attempts to find a good satisfying assignment. For example, Kautz’s blackbox planner. These abstractions have different tradeoffs between ease of use, generality, and the effectiveness of existing solutions.
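A minimal sketch (not from the original post) of the linear programming item above: the problem is stated purely as a linear objective plus constraints, and a generic LP solver finds the constrained optimum. The toy objective, the constraint values, and the use of SciPy's linprog are illustrative assumptions, not anything specified in the post.

# Toy LP, assuming SciPy is installed: maximize x + 2y subject to
# x + y <= 4, x <= 3, x >= 0, y >= 0.
from scipy.optimize import linprog

c = [-1.0, -2.0]                      # linprog minimizes, so negate the objective
A_ub = [[1.0, 1.0], [1.0, 0.0]]       # left-hand sides of the <= constraints
b_ub = [4.0, 3.0]                     # right-hand sides
result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(result.x, -result.fun)          # expected optimum: x = 0, y = 4, value 8

The point of the abstraction is that only the problem statement (c, A_ub, b_ub, bounds) is written down; all of the solver engineering lives behind the linprog call.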


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 There are many different abstractions for problem definition and solution. [sent-1, score-0.293]

2 The composed execution of these functions yields the solution. [sent-3, score-0.193]

3 Linear programming: a set of constraints and a linear objective function are defined. [sent-4, score-0.142]

4 Quadratic programming: Like linear programming, but the language is a little more flexible (and the solution slower). [sent-6, score-0.294]

5 Convex programming: like quadratic programming, but the language is more flexible (and the solutions even slower). [sent-7, score-0.326]

6 Dynamic programming: a recursive definition of the problem is given and then solved efficiently via caching tricks. [sent-8, score-0.347]

7 SAT programming: A problem is specified as a satisfiability problem involving a conjunction of disjunctions of boolean variables. [sent-9, score-0.165]

8 These abstractions have different tradeoffs between ease of use, generality, and the effectiveness of existing solutions. [sent-12, score-0.29]

9 Machine learning can be thought of as exemplar programming. [sent-13, score-0.486]

10 Exemplar programming is creating examples of (input, output) pairs which are used by an algorithm to predict a function from input to output. [sent-14, score-0.684]

11 There are several subtle issues related to overfitting, independence, and representativeness of the samples which take significant effort to describe to an unfamiliar person. [sent-17, score-0.192]

12 Making this abstraction easier to use via careful language design is an area where effort may pay off. [sent-18, score-0.378]

13 How effective are the exemplar programming solvers (aka learning algorithms)? [sent-19, score-1.126]

14 Exemplar programming is a viewpoint of machine learning which (mostly) ignores statistics, prior information, and the possibility of overfitting. [sent-27, score-0.834]

15 Exemplar programming creates a split between problem solution and problem formation. [sent-29, score-0.894]

16 This is important because the problem solver can be heavily optimized (for speed, scalability, effectiveness on common problems, etc…) making the process of applying machine learning simply a matter of specifying the exemplars. [sent-30, score-0.491]

17 The formation/solution split helps us focus on problem formation independent of solution. [sent-31, score-0.286]

18 The big gains in machine learning in the last decade seem to be discovering how to apply it to new areas. [sent-32, score-0.377]

19 A significant step in any application to a new area is discovering the right way to formulate the problem. [sent-33, score-0.292]

20 Exemplar programming seems to be a useful viewpoint for machine learning in “big data” problems with many examples where significant prior information is lacking. [sent-34, score-0.959]
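To make sentences 9-17 concrete, here is a minimal sketch of the formation/solution split (assuming scikit-learn; the toy exemplars and the particular solvers are hypothetical choices, not from the post): the problem is formed purely as (input, output) exemplars, and any off-the-shelf learning algorithm can be swapped in as the solver that produces a predictor from input to output.

from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Problem formation: nothing but exemplars.  Hypothetical toy task:
# predict whether x1 + x2 exceeds 1.
exemplars = [
    ([0.1, 0.2], 0),
    ([0.9, 0.8], 1),
    ([0.4, 0.4], 0),
    ([0.7, 0.6], 1),
    ([0.2, 0.9], 1),
    ([0.3, 0.1], 0),
]
X = [x for x, _ in exemplars]
y = [label for _, label in exemplars]

# Problem solution: interchangeable solvers, each heavily optimized elsewhere.
for solver in (LogisticRegression(), KNeighborsClassifier(n_neighbors=3),
               DecisionTreeClassifier()):
    predictor = solver.fit(X, y)
    print(type(solver).__name__, predictor.predict([[0.8, 0.7], [0.1, 0.1]]))

Swapping solvers changes nothing about the problem formation, which is exactly the split sentences 15-17 describe.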


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('programming', 0.571), ('exemplar', 0.486), ('solver', 0.139), ('gains', 0.139), ('abstractions', 0.121), ('split', 0.121), ('quadratic', 0.121), ('effectiveness', 0.111), ('flexible', 0.104), ('slower', 0.104), ('abstraction', 0.104), ('language', 0.101), ('problem', 0.101), ('discovering', 0.098), ('linear', 0.089), ('apply', 0.08), ('viewpoint', 0.076), ('definition', 0.071), ('solvers', 0.069), ('phrased', 0.069), ('formulate', 0.069), ('honest', 0.069), ('increasingly', 0.069), ('sat', 0.069), ('significant', 0.068), ('functions', 0.068), ('conjunction', 0.064), ('composed', 0.064), ('finds', 0.064), ('lp', 0.064), ('formation', 0.064), ('ignores', 0.064), ('effective', 0.063), ('effort', 0.063), ('prior', 0.063), ('problems', 0.061), ('aka', 0.061), ('execution', 0.061), ('unfamiliar', 0.061), ('caching', 0.061), ('recursive', 0.061), ('machine', 0.06), ('examples', 0.06), ('tradeoffs', 0.058), ('area', 0.057), ('engine', 0.056), ('constrained', 0.054), ('function', 0.053), ('via', 0.053), ('generality', 0.052)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000002 215 hunch net-2006-10-22-Exemplar programming


2 0.15680456 58 hunch net-2005-04-21-Dynamic Programming Generalizations and Their Use

Introduction: David Mcallester gave a talk about this paper (with Pedro Felzenszwalb). I’ll try to give a high level summary of why it’s interesting. Dynamic programming is most familiar as instantiated by Viterbi decoding in a hidden Markov model. It is a general paradigm for problem solving where subproblems are solved and used to solve larger problems. In the Viterbi decoding example, the subproblem is “What is the most probable path ending at each state at timestep t?”, and the larger problem is the same except at timestep t+1. There are a few optimizations you can do here: Dynamic Programming -> queued Dynamic Programming. Keep track of the “cost so far” (or “most probable path”) and (carefully) only look at extensions to paths likely to yield the shortest path. “Carefully” here is defined by Dijkstra’s shortest path algorithm. queued Dynamic Programming -> A*. Add a lower bound on the cost to complete a path (or an upper bound on the probability of a completion) for

3 0.13817945 210 hunch net-2006-09-28-Programming Languages for Machine Learning Implementations

Introduction: Machine learning algorithms have a much better chance of being widely adopted if they are implemented in some easy-to-use code. There are several important concerns associated with machine learning which stress programming languages on the ease-of-use vs. speed frontier. Speed: The rate at which data sources are growing seems to be outstripping the rate at which computational power is growing, so it is important that we be able to eke out every bit of computational power. Garbage collected languages (java, ocaml, perl and python) often have several issues here. Garbage collection often implies that floating point numbers are “boxed”: every float is represented by a pointer to a float. Boxing can cause an order of magnitude slowdown because an extra nonlocalized memory reference is made, and accesses to main memory can be many CPU cycles long. Garbage collection often implies that considerably more memory is used than is necessary. This has a variable effect. I

4 0.13694756 351 hunch net-2009-05-02-Wielding a New Abstraction

Introduction: This post is partly meant as an advertisement for the reductions tutorial Alina, Bianca, and I are planning to do at ICML. Please come, if you are interested. Many research programs can be thought of as finding and building new useful abstractions. The running example I’ll use is learning reductions where I have experience. The basic abstraction here is that we can build a learning algorithm capable of solving classification problems up to a small expected regret. This is used repeatedly to solve more complex problems. In working on a new abstraction, I think you typically run into many substantial problems of understanding, which make publishing particularly difficult. It is difficult to seriously discuss the reason behind or mechanism for abstraction in a conference paper with small page limits. People rarely see such discussions and hence have little basis on which to think about new abstractions. Another difficulty is that when building an abstraction, yo

5 0.11562029 450 hunch net-2011-12-02-Hadoop AllReduce and Terascale Learning

Introduction: Suppose you have a dataset with 2 terafeatures (we only count nonzero entries in a datamatrix), and want to learn a good linear predictor in a reasonable amount of time. How do you do it? As a learning theorist, the first thing you do is pray that this is too much data for the number of parameters—but that’s not the case, there are around 16 billion examples, 16 million parameters, and people really care about a high quality predictor, so subsampling is not a good strategy. Alekh visited us last summer, and we had a breakthrough (see here for details), coming up with the first learning algorithm I’ve seen that is provably faster than any future single-machine learning algorithm. The proof of this is simple: We can output an optimal-up-to-precision linear predictor faster than the data can be streamed through the network interface of any single machine involved in the computation. It is necessary but not sufficient to have an effective communication infrastructure. It is ne

6 0.11052868 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

7 0.10742531 147 hunch net-2006-01-08-Debugging Your Brain

8 0.10516211 235 hunch net-2007-03-03-All Models of Learning have Flaws

9 0.10455032 347 hunch net-2009-03-26-Machine Learning is too easy

10 0.09910877 228 hunch net-2007-01-15-The Machine Learning Department

11 0.098960035 105 hunch net-2005-08-23-(Dis)similarities between academia and open source programmers

12 0.096999057 332 hunch net-2008-12-23-Use of Learning Theory

13 0.094350427 229 hunch net-2007-01-26-Parallel Machine Learning Problems

14 0.094170727 120 hunch net-2005-10-10-Predictive Search is Coming

15 0.093599558 220 hunch net-2006-11-27-Continuizing Solutions

16 0.091375157 35 hunch net-2005-03-04-The Big O and Constants in Learning

17 0.091101564 14 hunch net-2005-02-07-The State of the Reduction

18 0.08629749 337 hunch net-2009-01-21-Nearly all natural problems require nonlinearity

19 0.083410852 388 hunch net-2010-01-24-Specializations of the Master Problem

20 0.076773889 237 hunch net-2007-04-02-Contextual Scaling


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.179), (1, 0.08), (2, -0.041), (3, 0.031), (4, 0.023), (5, -0.041), (6, 0.021), (7, 0.025), (8, 0.01), (9, -0.01), (10, -0.069), (11, -0.041), (12, 0.016), (13, 0.073), (14, -0.0), (15, -0.041), (16, 0.056), (17, -0.013), (18, -0.035), (19, -0.032), (20, 0.034), (21, 0.034), (22, 0.018), (23, 0.014), (24, -0.042), (25, 0.024), (26, 0.041), (27, 0.053), (28, 0.036), (29, 0.074), (30, -0.062), (31, 0.018), (32, 0.113), (33, 0.075), (34, -0.021), (35, 0.045), (36, 0.055), (37, -0.024), (38, -0.03), (39, 0.006), (40, 0.061), (41, -0.071), (42, 0.12), (43, 0.013), (44, 0.005), (45, -0.091), (46, -0.094), (47, 0.008), (48, 0.007), (49, 0.059)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9431622 215 hunch net-2006-10-22-Exemplar programming


2 0.7517277 210 hunch net-2006-09-28-Programming Languages for Machine Learning Implementations

Introduction: Machine learning algorithms have a much better chance of being widely adopted if they are implemented in some easy-to-use code. There are several important concerns associated with machine learning which stress programming languages on the ease-of-use vs. speed frontier. Speed: The rate at which data sources are growing seems to be outstripping the rate at which computational power is growing, so it is important that we be able to eke out every bit of computational power. Garbage collected languages (java, ocaml, perl and python) often have several issues here. Garbage collection often implies that floating point numbers are “boxed”: every float is represented by a pointer to a float. Boxing can cause an order of magnitude slowdown because an extra nonlocalized memory reference is made, and accesses to main memory can be many CPU cycles long. Garbage collection often implies that considerably more memory is used than is necessary. This has a variable effect. I

3 0.6829775 84 hunch net-2005-06-22-Languages of Learning

Introduction: A language is a set of primitives which can be combined to successfully create complex objects. Languages arise in all sorts of situations: mechanical construction, martial arts, communication, etc… Languages appear to be the key to successfully creating complex objects—it is difficult to come up with any convincing example of a complex object which is not built using some language. Since languages are so crucial to success, it is interesting to organize various machine learning research programs by language. The most common languages in machine learning are languages for representing the solution to machine learning. This includes: Bayes Nets and Graphical Models: a language for representing probability distributions. The key concept supporting modularity is conditional independence. Michael Kearns has been working on extending this to game theory. Kernelized Linear Classifiers: a language for representing linear separators, possibly in a large space. The key form of

4 0.64879411 58 hunch net-2005-04-21-Dynamic Programming Generalizations and Their Use

Introduction: David Mcallester gave a talk about this paper (with Pedro Felzenszwalb). I’ll try to give a high level summary of why it’s interesting. Dynamic programming is most familiar as instantiated by Viterbi decoding in a hidden Markov model. It is a general paradigm for problem solving where subproblems are solved and used to solve larger problems. In the Viterbi decoding example, the subproblem is “What is the most probable path ending at each state at timestep t?”, and the larger problem is the same except at timestep t+1. There are a few optimizations you can do here: Dynamic Programming -> queued Dynamic Programming. Keep track of the “cost so far” (or “most probable path”) and (carefully) only look at extensions to paths likely to yield the shortest path. “Carefully” here is defined by Dijkstra’s shortest path algorithm. queued Dynamic Programming -> A*. Add a lower bound on the cost to complete a path (or an upper bound on the probability of a completion) for

5 0.64003003 128 hunch net-2005-11-05-The design of a computing cluster

Introduction: This is about the design of a computing cluster from the viewpoint of applied machine learning using current technology. We just built a small one at TTI so this is some evidence of what is feasible and thoughts about the design choices. Architecture: There are several architectural choices. AMD Athlon64 based system. This seems to have the cheapest bang/buck. Maximum RAM is typically 2-3GB. AMD Opteron based system. Opterons provide the additional capability to buy an SMP motherboard with two chips, and the motherboards often support 16GB of RAM. The RAM is also the more expensive error correcting type. Intel PIV or Xeon based system. The PIV and Xeon based systems are the Intel analog of the above 2. Due to architectural design reasons, these chips tend to run a bit hotter and be a bit more expensive. Dual core chips. Both Intel and AMD have chips that actually have 2 processors embedded in them. In the end, we decided to go with option (2). Roughly speaking,

6 0.63091159 262 hunch net-2007-09-16-Optimizing Machine Learning Programs

7 0.6285733 229 hunch net-2007-01-26-Parallel Machine Learning Problems

8 0.60040277 450 hunch net-2011-12-02-Hadoop AllReduce and Terascale Learning

9 0.59951472 68 hunch net-2005-05-10-Learning Reductions are Reductionist

10 0.59568006 147 hunch net-2006-01-08-Debugging Your Brain

11 0.58851951 370 hunch net-2009-09-18-Necessary and Sufficient Research

12 0.58374196 347 hunch net-2009-03-26-Machine Learning is too easy

13 0.58084917 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

14 0.57701778 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models

15 0.57673883 351 hunch net-2009-05-02-Wielding a New Abstraction

16 0.56534946 230 hunch net-2007-02-02-Thoughts regarding “Is machine learning different from statistics?”

17 0.56412929 67 hunch net-2005-05-06-Don’t mix the solution into the problem

18 0.54584652 33 hunch net-2005-02-28-Regularization

19 0.53543675 152 hunch net-2006-01-30-Should the Input Representation be a Vector?

20 0.53038311 235 hunch net-2007-03-03-All Models of Learning have Flaws


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(0, 0.335), (3, 0.029), (10, 0.021), (27, 0.161), (38, 0.051), (53, 0.073), (55, 0.048), (77, 0.017), (94, 0.103), (95, 0.046)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.91453701 87 hunch net-2005-06-29-Not EM for clustering at COLT

Introduction: One standard approach for clustering data with a set of gaussians is using EM. Roughly speaking, you pick a set of k random gaussians and then use alternating expectation maximization to (hopefully) find a set of gaussians that “explain” the data well. This process is difficult to work with because EM can become “stuck” in local optima. There are various hacks like “rerun with t different random starting points”. One cool observation is that this can often be solved via other algorithms which do not suffer from local optima. This is an early paper which shows this. Ravi Kannan presented a new paper showing this is possible in a much more adaptive setting. A very rough summary of these papers is that by projecting into a lower dimensional space, it is computationally tractable to pick out the gross structure of the data. It is unclear how well these algorithms work in practice, but they might be effective, especially if used as a subroutine of the form: Projec

2 0.90288663 473 hunch net-2012-09-29-Vowpal Wabbit, version 7.0

Introduction: A new version of VW is out. The primary changes are: Learning Reductions: I’ve wanted to get learning reductions working and we’ve finally done it. Not everything is implemented yet, but VW now supports direct: Multiclass Classification --oaa or --ect. Cost Sensitive Multiclass Classification --csoaa or --wap. Contextual Bandit Classification --cb. Sequential Structured Prediction --searn or --dagger. In addition, it is now easy to build your own custom learning reductions for various plausible uses: feature diddling, custom structured prediction problems, or alternate learning reductions. This effort is far from done, but it is now in a generally useful state. Note that all learning reductions inherit the ability to do cluster parallel learning. Library interface: VW now has a basic library interface. The library provides most of the functionality of VW, with the limitation that it is monolithic and nonreentrant. These will be improved over

same-blog 3 0.88057595 215 hunch net-2006-10-22-Exemplar programming


4 0.87561703 62 hunch net-2005-04-26-To calibrate or not?

Introduction: A calibrated predictor is one which predicts the probability of a binary event with the property: For all predictions p, the proportion of the time that 1 is observed is p. Since there are infinitely many p, this definition must be “softened” to make sense for any finite number of samples. The standard method for “softening” is to consider all predictions in a small neighborhood about each possible p. A great deal of effort has been devoted to strategies for achieving calibrated (such as here) prediction. With statements like: (under minimal conditions) you can always make calibrated predictions. Given the strength of these statements, we might conclude we are done, but that would be a “confusion of ends”. A confusion of ends arises in the following way: We want good probabilistic predictions. Good probabilistic predictions are calibrated. Therefore, we want calibrated predictions. The “Therefore” step misses the fact that calibration is a necessary b

5 0.87483984 193 hunch net-2006-07-09-The Stock Prediction Machine Learning Problem

Introduction: …is discussed in this nytimes article. I generally expect such approaches to become more common since computers are getting faster, machine learning is getting better, and data is becoming more plentiful. This is another example where machine learning technology may have a huge economic impact. Some side notes: We-in-research know almost nothing about how these things are done (because it is typically a corporate secret). … but the limited discussion in the article seems naive from a machine learning viewpoint. The learning process used apparently often fails to take into account transaction costs. What little of the approaches is discussed appears modeling based. It seems plausible that more direct prediction methods can yield an edge. One difficulty with stock picking as a research topic is that it is inherently a zero sum game (for every winner, there is a loser). Much of the rest of research is positive sum (basically, everyone wins).

6 0.78987235 74 hunch net-2005-05-21-What is the right form of modularity in structured prediction?

7 0.6614784 133 hunch net-2005-11-28-A question of quantification

8 0.57622653 5 hunch net-2005-01-26-Watchword: Probability

9 0.5741148 492 hunch net-2013-12-01-NIPS tutorials and Vowpal Wabbit 7.4

10 0.56947154 79 hunch net-2005-06-08-Question: “When is the right time to insert the loss function?”

11 0.56671059 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models

12 0.56442994 177 hunch net-2006-05-05-An ICML reject

13 0.56372517 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

14 0.55850065 49 hunch net-2005-03-30-What can Type Theory teach us about Machine Learning?

15 0.55137241 450 hunch net-2011-12-02-Hadoop AllReduce and Terascale Learning

16 0.55112875 131 hunch net-2005-11-16-The Everything Ensemble Edge

17 0.5483371 14 hunch net-2005-02-07-The State of the Reduction

18 0.54790193 262 hunch net-2007-09-16-Optimizing Machine Learning Programs

19 0.54740041 258 hunch net-2007-08-12-Exponentiated Gradient

20 0.54570258 28 hunch net-2005-02-25-Problem: Online Learning