hunch_net hunch_net-2005 hunch_net-2005-58 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: David Mcallester gave a talk about this paper (with Pedro Felzenszwalb ). I’ll try to give a high level summary of why it’s interesting. Dynamic programming is most familiar as instantiated by Viterbi decoding in a hidden markov model. It is a general paradigm for problem solving where subproblems are solved and used to solve larger problems. In the Viterbi decoding example, the subproblem is “What is the most probable path ending at each state at timestep t ?”, and the larger problem is the same except at timestep t+1 . There are a few optimizations you can do here: Dynamic Programming -> queued Dynamic Programming . Keep track of the “cost so far” (or “most probable path”) and (carefully) only look at extensions to paths likely to yield the shortest path. “Carefully” here is defined by Dijkstra’s shortest path algorithm . queued Dynamic programming -> A * Add a lower bound on the cost to complete a path (or an upper bound on the probability of a completion) for
sentIndex sentText sentNum sentScore
1 I’ll try to give a high level summary of why it’s interesting. [sent-2, score-0.304]
2 Dynamic programming is most familiar as instantiated by Viterbi decoding in a hidden markov model. [sent-3, score-0.451]
3 It is a general paradigm for problem solving where subproblems are solved and used to solve larger problems. [sent-4, score-0.341]
4 In the Viterbi decoding example, the subproblem is “What is the most probable path ending at each state at timestep t ? [sent-5, score-0.974]
5 ”, and the larger problem is the same except at timestep t+1 . [sent-6, score-0.382]
6 There are a few optimizations you can do here: Dynamic Programming -> queued Dynamic Programming . [sent-7, score-0.286]
7 Keep track of the “cost so far” (or “most probable path”) and (carefully) only look at extensions to paths likely to yield the shortest path. [sent-8, score-0.714]
8 “Carefully” here is defined by Dijkstra’s shortest path algorithm . [sent-9, score-0.564]
9 queued Dynamic programming -> A * Add a lower bound on the cost to complete a path (or an upper bound on the probability of a completion) for the priority queue of Dijkstra’s shortest path. [sent-10, score-1.727]
10 This can yield computational speedups varying between negligible and outstanding. [sent-11, score-0.172]
11 A * -> Hierarchical A * The efficiency of A * search is dependent on the tightness of it’s lower bound, which brings up the question: “Where do you get the lower bound? [sent-12, score-0.588]
12 ” One appealing answer is from A * applied to a simplified problem equivalent to the original problem, but with states aliased (many states in original problem = 1 state in new problem). [sent-13, score-0.925]
13 This technique can be applied recursively until the problem is trivial. [sent-14, score-0.289]
14 Each of these steps has been noted previously (although perhaps not in the generality of this paper). [sent-15, score-0.143]
15 What seems new and interesting is that the entire hierarchy of A * searches can be done simultaneously on one priority queue. [sent-16, score-0.352]
16 The resulting algorithm can use low level information to optimize high level search as well as high level information to optimize low level search in a holistic process. [sent-17, score-1.781]
17 It’s not clear yet how far this approach can be pushed, but this quality is quite appealing. [sent-18, score-0.073]
wordName wordTfidf (topN-words)
[('path', 0.286), ('shortest', 0.278), ('dynamic', 0.262), ('dijkstra', 0.209), ('queued', 0.209), ('programming', 0.203), ('level', 0.194), ('priority', 0.185), ('timestep', 0.185), ('viterbi', 0.172), ('bound', 0.169), ('decoding', 0.162), ('probable', 0.162), ('search', 0.149), ('lower', 0.143), ('problem', 0.118), ('states', 0.111), ('original', 0.111), ('high', 0.11), ('optimize', 0.11), ('yield', 0.1), ('ending', 0.093), ('holistic', 0.093), ('paths', 0.093), ('recursively', 0.093), ('carefully', 0.092), ('low', 0.087), ('state', 0.086), ('instantiated', 0.086), ('completion', 0.086), ('hierarchy', 0.086), ('pedro', 0.086), ('cost', 0.085), ('appealing', 0.081), ('extensions', 0.081), ('pushed', 0.081), ('searches', 0.081), ('tightness', 0.081), ('larger', 0.079), ('applied', 0.078), ('optimizations', 0.077), ('hierarchical', 0.077), ('paradigm', 0.077), ('noted', 0.074), ('mcallester', 0.074), ('far', 0.073), ('brings', 0.072), ('speedups', 0.072), ('generality', 0.069), ('subproblems', 0.067)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 58 hunch net-2005-04-21-Dynamic Programming Generalizations and Their Use
Introduction: David Mcallester gave a talk about this paper (with Pedro Felzenszwalb ). I’ll try to give a high level summary of why it’s interesting. Dynamic programming is most familiar as instantiated by Viterbi decoding in a hidden markov model. It is a general paradigm for problem solving where subproblems are solved and used to solve larger problems. In the Viterbi decoding example, the subproblem is “What is the most probable path ending at each state at timestep t ?”, and the larger problem is the same except at timestep t+1 . There are a few optimizations you can do here: Dynamic Programming -> queued Dynamic Programming . Keep track of the “cost so far” (or “most probable path”) and (carefully) only look at extensions to paths likely to yield the shortest path. “Carefully” here is defined by Dijkstra’s shortest path algorithm . queued Dynamic programming -> A * Add a lower bound on the cost to complete a path (or an upper bound on the probability of a completion) for
2 0.15680456 215 hunch net-2006-10-22-Exemplar programming
Introduction: There are many different abstractions for problem definition and solution. Here are a few examples: Functional programming: a set of functions are defined. The composed execution of these functions yields the solution. Linear programming: a set of constraints and a linear objective function are defined. An LP solver finds the constrained optimum. Quadratic programming: Like linear programming, but the language is a little more flexible (and the solution slower). Convex programming: like quadratic programming, but the language is more flexible (and the solutions even slower). Dynamic programming: a recursive definition of the problem is defined and then solved efficiently via caching tricks. SAT programming: A problem is specified as a satisfiability involving a conjunction of a disjunction of boolean variables. A general engine attempts to find a good satisfying assignment. For example Kautz’s blackbox planner. These abstractions have different tradeoffs betw
3 0.13208458 102 hunch net-2005-08-11-Why Manifold-Based Dimension Reduction Techniques?
Introduction: Manifold based dimension-reduction algorithms share the following general outline. Given: a metric d() and a set of points S Construct a graph with a point in every node and every edge connecting to the node of one of the k -nearest neighbors. Associate with the edge a weight which is the distance between the points in the connected nodes. Digest the graph. This might include computing the shortest path between all points or figuring out how to linearly interpolate the point from it’s neighbors. Find a set of points in a low dimensional space which preserve the digested properties. Examples include LLE, Isomap (which I worked on), Hessian-LLE, SDE, and many others. The hope with these algorithms is that they can recover the low dimensional structure of point sets in high dimensional spaces. Many of them can be shown to work in interesting ways producing various compelling pictures. Despite doing some early work in this direction, I suffer from a motivational
4 0.11950964 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions
Introduction: Learning reductions transform a solver of one type of learning problem into a solver of another type of learning problem. When we analyze these for robustness we can make statement of the form “Reduction R has the property that regret r (or loss) on subproblems of type A implies regret at most f ( r ) on the original problem of type B “. A lower bound for a learning reduction would have the form “for all reductions R , there exists a learning problem of type B and learning algorithm for problems of type A where regret r on induced problems implies at least regret f ( r ) for B “. The pursuit of lower bounds is often questionable because, unlike upper bounds, they do not yield practical algorithms. Nevertheless, they may be helpful as a tool for thinking about what is learnable and how learnable it is. This has already come up here and here . At the moment, there is no coherent theory of lower bounds for learning reductions, and we have little understa
5 0.11289371 373 hunch net-2009-10-03-Static vs. Dynamic multiclass prediction
Introduction: I have had interesting discussions about distinction between static vs. dynamic classes with Kishore and Hal . The distinction arises in multiclass prediction settings. A static set of classes is given by a set of labels {1,…,k} and the goal is generally to choose the most likely label given features. The static approach is the one that we typically analyze and think about in machine learning. The dynamic setting is one that is often used in practice. The basic idea is that the number of classes is not fixed, varying on a per example basis. These different classes are generally defined by a choice of features. The distinction between these two settings as far as theory goes, appears to be very substantial. For example, in the static setting, in learning reductions land , we have techniques now for robust O(log(k)) time prediction in many multiclass setting variants. In the dynamic setting, the best techniques known are O(k) , and furthermore this exponential
6 0.10194097 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models
7 0.095744044 210 hunch net-2006-09-28-Programming Languages for Machine Learning Implementations
8 0.095034726 189 hunch net-2006-07-05-more icml papers
9 0.094639301 120 hunch net-2005-10-10-Predictive Search is Coming
10 0.088817313 77 hunch net-2005-05-29-Maximum Margin Mismatch?
11 0.087616012 244 hunch net-2007-05-09-The Missing Bound
12 0.087221116 105 hunch net-2005-08-23-(Dis)similarities between academia and open source programmers
13 0.085274979 85 hunch net-2005-06-28-A COLT paper
14 0.084851623 247 hunch net-2007-06-14-Interesting Papers at COLT 2007
15 0.084697768 388 hunch net-2010-01-24-Specializations of the Master Problem
16 0.08468461 220 hunch net-2006-11-27-Continuizing Solutions
17 0.082740545 169 hunch net-2006-04-05-What is state?
18 0.078052312 345 hunch net-2009-03-08-Prediction Science
19 0.075848699 332 hunch net-2008-12-23-Use of Learning Theory
20 0.075397722 301 hunch net-2008-05-23-Three levels of addressing the Netflix Prize
topicId topicWeight
[(0, 0.158), (1, 0.061), (2, 0.009), (3, 0.013), (4, 0.031), (5, -0.041), (6, 0.061), (7, -0.001), (8, -0.023), (9, -0.041), (10, -0.026), (11, 0.064), (12, -0.048), (13, 0.085), (14, -0.024), (15, -0.027), (16, 0.016), (17, 0.037), (18, 0.057), (19, -0.045), (20, 0.065), (21, 0.035), (22, -0.114), (23, 0.039), (24, -0.035), (25, 0.198), (26, -0.02), (27, 0.039), (28, 0.047), (29, 0.055), (30, -0.056), (31, -0.035), (32, 0.107), (33, -0.016), (34, -0.089), (35, 0.012), (36, -0.006), (37, 0.017), (38, -0.059), (39, -0.01), (40, 0.012), (41, -0.066), (42, 0.059), (43, 0.063), (44, 0.016), (45, -0.076), (46, -0.011), (47, 0.034), (48, -0.031), (49, 0.001)]
simIndex simValue blogId blogTitle
same-blog 1 0.96760041 58 hunch net-2005-04-21-Dynamic Programming Generalizations and Their Use
Introduction: David Mcallester gave a talk about this paper (with Pedro Felzenszwalb ). I’ll try to give a high level summary of why it’s interesting. Dynamic programming is most familiar as instantiated by Viterbi decoding in a hidden markov model. It is a general paradigm for problem solving where subproblems are solved and used to solve larger problems. In the Viterbi decoding example, the subproblem is “What is the most probable path ending at each state at timestep t ?”, and the larger problem is the same except at timestep t+1 . There are a few optimizations you can do here: Dynamic Programming -> queued Dynamic Programming . Keep track of the “cost so far” (or “most probable path”) and (carefully) only look at extensions to paths likely to yield the shortest path. “Carefully” here is defined by Dijkstra’s shortest path algorithm . queued Dynamic programming -> A * Add a lower bound on the cost to complete a path (or an upper bound on the probability of a completion) for
2 0.52577877 244 hunch net-2007-05-09-The Missing Bound
Introduction: Sham Kakade points out that we are missing a bound. Suppose we have m samples x drawn IID from some distribution D . Through the magic of exponential moment method we know that: If the range of x is bounded by an interval of size I , a Chernoff/Hoeffding style bound gives us a bound on the deviations like O(I/m 0.5 ) (at least in crude form). A proof is on page 9 here . If the range of x is bounded, and the variance (or a bound on the variance) is known, then Bennett’s bound can give tighter results (*). This can be a huge improvment when the true variance small. What’s missing here is a bound that depends on the observed variance rather than a bound on the variance. This means that many people attempt to use Bennett’s bound (incorrectly) by plugging the observed variance in as the true variance, invalidating the bound application. Most of the time, they get away with it, but this is a dangerous move when doing machine learning. In machine learning,
3 0.52496904 215 hunch net-2006-10-22-Exemplar programming
Introduction: There are many different abstractions for problem definition and solution. Here are a few examples: Functional programming: a set of functions are defined. The composed execution of these functions yields the solution. Linear programming: a set of constraints and a linear objective function are defined. An LP solver finds the constrained optimum. Quadratic programming: Like linear programming, but the language is a little more flexible (and the solution slower). Convex programming: like quadratic programming, but the language is more flexible (and the solutions even slower). Dynamic programming: a recursive definition of the problem is defined and then solved efficiently via caching tricks. SAT programming: A problem is specified as a satisfiability involving a conjunction of a disjunction of boolean variables. A general engine attempts to find a good satisfying assignment. For example Kautz’s blackbox planner. These abstractions have different tradeoffs betw
4 0.50869888 189 hunch net-2006-07-05-more icml papers
Introduction: Here are a few other papers I enjoyed from ICML06. Topic Models: Dynamic Topic Models David Blei, John Lafferty A nice model for how topics in LDA type models can evolve over time, using a linear dynamical system on the natural parameters and a very clever structured variational approximation (in which the mean field parameters are pseudo-observations of a virtual LDS). Like all Blei papers, he makes it look easy, but it is extremely impressive. Pachinko Allocation Wei Li, Andrew McCallum A very elegant (but computationally challenging) model which induces correlation amongst topics using a multi-level DAG whose interior nodes are “super-topics” and “sub-topics” and whose leaves are the vocabulary words. Makes the slumbering monster of structure learning stir. Sequence Analysis (I missed these talks since I was chairing another session) Online Decoding of Markov Models with Latency Constraints Mukund Narasimhan, Paul Viola, Michael Shilman An “a
5 0.49634451 85 hunch net-2005-06-28-A COLT paper
Introduction: I found Tong Zhang’s paper on Data Dependent Concentration Bounds for Sequential Prediction Algorithms interesting. Roughly speaking, it states a tight bound on the future error rate for online learning algorithms assuming that samples are drawn independently. This bound is easily computed and will make the progressive validation approaches used here significantly more practical.
6 0.49298677 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models
7 0.48647106 120 hunch net-2005-10-10-Predictive Search is Coming
8 0.48629388 269 hunch net-2007-10-24-Contextual Bandits
9 0.48282075 102 hunch net-2005-08-11-Why Manifold-Based Dimension Reduction Techniques?
10 0.48202592 41 hunch net-2005-03-15-The State of Tight Bounds
11 0.47971106 220 hunch net-2006-11-27-Continuizing Solutions
12 0.47181031 87 hunch net-2005-06-29-Not EM for clustering at COLT
13 0.47075376 77 hunch net-2005-05-29-Maximum Margin Mismatch?
14 0.44494104 82 hunch net-2005-06-17-Reopening RL->Classification
15 0.43098336 392 hunch net-2010-03-26-A Variance only Deviation Bound
16 0.42406252 280 hunch net-2007-12-20-Cool and Interesting things at NIPS, take three
17 0.42387605 169 hunch net-2006-04-05-What is state?
18 0.42320484 276 hunch net-2007-12-10-Learning Track of International Planning Competition
19 0.41987863 423 hunch net-2011-02-02-User preferences for search engines
20 0.41959545 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design
topicId topicWeight
[(21, 0.43), (27, 0.164), (38, 0.058), (53, 0.058), (55, 0.066), (94, 0.086), (95, 0.042)]
simIndex simValue blogId blogTitle
same-blog 1 0.85923421 58 hunch net-2005-04-21-Dynamic Programming Generalizations and Their Use
Introduction: David Mcallester gave a talk about this paper (with Pedro Felzenszwalb ). I’ll try to give a high level summary of why it’s interesting. Dynamic programming is most familiar as instantiated by Viterbi decoding in a hidden markov model. It is a general paradigm for problem solving where subproblems are solved and used to solve larger problems. In the Viterbi decoding example, the subproblem is “What is the most probable path ending at each state at timestep t ?”, and the larger problem is the same except at timestep t+1 . There are a few optimizations you can do here: Dynamic Programming -> queued Dynamic Programming . Keep track of the “cost so far” (or “most probable path”) and (carefully) only look at extensions to paths likely to yield the shortest path. “Carefully” here is defined by Dijkstra’s shortest path algorithm . queued Dynamic programming -> A * Add a lower bound on the cost to complete a path (or an upper bound on the probability of a completion) for
2 0.75694036 369 hunch net-2009-08-27-New York Area Machine Learning Events
Introduction: Several events are happening in the NY area. Barriers in Computational Learning Theory Workshop, Aug 28. That’s tomorrow near Princeton. I’m looking forward to speaking at this one on “Getting around Barriers in Learning Theory”, but several other talks are of interest, particularly to the CS theory inclined. Claudia Perlich is running the INFORMS Data Mining Contest with a deadline of Sept. 25. This is a contest using real health record data (they partnered with HealthCare Intelligence ) to predict transfers and mortality. In the current US health care reform debate, the case studies of high costs we hear strongly suggest machine learning & statistics can save many billions. The Singularity Summit October 3&4 . This is for the AIists out there. Several of the talks look interesting, although unfortunately I’ll miss it for ALT . Predictive Analytics World, Oct 20-21 . This is stretching the definition of “New York Area” a bit, but the train to DC is reasonable.
3 0.72620875 126 hunch net-2005-10-26-Fallback Analysis is a Secret to Useful Algorithms
Introduction: The ideal of theoretical algorithm analysis is to construct an algorithm with accompanying optimality theorems proving that it is a useful algorithm. This ideal often fails, particularly for learning algorithms and theory. The general form of a theorem is: If preconditions Then postconditions When we design learning algorithms it is very common to come up with precondition assumptions such as “the data is IID”, “the learning problem is drawn from a known distribution over learning problems”, or “there is a perfect classifier”. All of these example preconditions can be false for real-world problems in ways that are not easily detectable. This means that algorithms derived and justified by these very common forms of analysis may be prone to catastrophic failure in routine (mis)application. We can hope for better. Several different kinds of learning algorithm analysis have been developed some of which have fewer preconditions. Simply demanding that these forms of analysi
4 0.66191947 3 hunch net-2005-01-24-The Humanloop Spectrum of Machine Learning
Introduction: All branches of machine learning seem to be united in the idea of using data to make predictions. However, people disagree to some extent about what this means. One way to categorize these different goals is on an axis, where one extreme is “tools to aid a human in using data to do prediction” and the other extreme is “tools to do prediction with no human intervention”. Here is my estimate of where various elements of machine learning fall on this spectrum. Human Necessary Human partially necessary Human unnecessary Clustering, data visualization Bayesian Learning, Probabilistic Models, Graphical Models Kernel Learning (SVM’s, etc..) Decision Trees? Reinforcement Learning The exact position of each element is of course debatable. My reasoning is that clustering and data visualization are nearly useless for prediction without a human in the loop. Bayesian/probabilistic models/graphical models generally require a human to sit and think about what
5 0.63821101 267 hunch net-2007-10-17-Online as the new adjective
Introduction: Online learning is in vogue, which means we should expect to see in the near future: Online boosting. Online decision trees. Online SVMs. (actually, we’ve already seen) Online deep learning. Online parallel learning. etc… There are three fundamental drivers of this trend. Increasing size of datasets makes online algorithms attractive. Online learning can simply be more efficient than batch learning. Here is a picture from a class on online learning: The point of this picture is that even in 3 dimensions and even with linear constraints, finding the minima of a set in an online fashion can be typically faster than finding the minima in a batch fashion. To see this, note that there is a minimal number of gradient updates (i.e. 2) required in order to reach the minima in the typical case. Given this, it’s best to do these updates as quickly as possible, which implies doing the first update online (i.e. before seeing all the examples) is preferred. Note
6 0.56924665 378 hunch net-2009-11-15-The Other Online Learning
7 0.55548626 387 hunch net-2010-01-19-Deadline Season, 2010
8 0.47931185 102 hunch net-2005-08-11-Why Manifold-Based Dimension Reduction Techniques?
9 0.47110236 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models
10 0.46831584 379 hunch net-2009-11-23-ICML 2009 Workshops (and Tutorials)
11 0.46805245 194 hunch net-2006-07-11-New Models
12 0.44285259 309 hunch net-2008-07-10-Interesting papers, ICML 2008
13 0.44069248 286 hunch net-2008-01-25-Turing’s Club for Machine Learning
14 0.43973279 95 hunch net-2005-07-14-What Learning Theory might do
15 0.43358552 235 hunch net-2007-03-03-All Models of Learning have Flaws
16 0.43296397 351 hunch net-2009-05-02-Wielding a New Abstraction
17 0.43279669 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms
18 0.43275934 132 hunch net-2005-11-26-The Design of an Optimal Research Environment
19 0.43231642 371 hunch net-2009-09-21-Netflix finishes (and starts)
20 0.43195415 360 hunch net-2009-06-15-In Active Learning, the question changes