hunch_net hunch_net-2009 hunch_net-2009-373 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: I have had interesting discussions about the distinction between static vs. dynamic classes with Kishore and Hal. The distinction arises in multiclass prediction settings. A static set of classes is given by a set of labels {1,…,k} and the goal is generally to choose the most likely label given features. The static approach is the one that we typically analyze and think about in machine learning. The dynamic setting is one that is often used in practice. The basic idea is that the number of classes is not fixed, varying on a per-example basis. These different classes are generally defined by a choice of features. The distinction between these two settings, as far as theory goes, appears to be very substantial. For example, in the static setting, in learning reductions land, we have techniques now for robust O(log(k)) time prediction in many multiclass setting variants. In the dynamic setting, the best techniques known are O(k), and furthermore this exponential gap may be essential, at least without further assumptions.
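To make the contrast concrete, here is a minimal sketch in Python (hypothetical names, toy sizes, and random, untrained scorers; not code from the post). The static predictor works over a fixed label set {0,…,k−1}: scoring every label is O(k), while descending a balanced tree of binary decisions, in the spirit of tree-style learning reductions, touches only O(log k) nodes per prediction. The dynamic predictor must instead score whatever feature-defined classes arrive with each example, which is linear in the size of that per-example set.

```python
import numpy as np

rng = np.random.default_rng(0)
k, d = 16, 5                       # 16 static labels, 5 features (toy sizes)

# --- static setting: one fixed label set {0, ..., k-1} ---
W = rng.normal(size=(k, d))        # hypothetical per-label linear scorers

def static_predict(x):
    """Score every fixed label and take the argmax: O(k) work per prediction."""
    return int(np.argmax(W @ x))

# One (random, untrained here) binary decision per internal node of a balanced
# tree over the label interval [lo, hi): the structure behind O(log k)
# prediction in tree-style reductions.
nodes, stack = {}, [(0, k)]
while stack:
    lo, hi = stack.pop()
    if hi - lo > 1:
        nodes[(lo, hi)] = rng.normal(size=d)
        mid = (lo + hi) // 2
        stack += [(lo, mid), (mid, hi)]

def static_predict_tree(x):
    """Follow a single root-to-leaf path: O(log k) binary decisions."""
    lo, hi = 0, k
    while hi - lo > 1:
        mid = (lo + hi) // 2
        lo, hi = (mid, hi) if nodes[(lo, hi)] @ x >= 0 else (lo, mid)
    return lo

# --- dynamic setting: each example brings its own classes, defined by features ---
def dynamic_predict(x, class_features):
    """Score whatever classes this example offers: O(#classes) per prediction."""
    return int(np.argmax([f @ x for f in class_features]))

x = rng.normal(size=d)
print(static_predict(x), static_predict_tree(x))
print(dynamic_predict(x, [rng.normal(size=d) for _ in range(3)]))
```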
sentIndex sentText sentNum sentScore
1 I have had interesting discussions about the distinction between static vs. [sent-1, score-0.692]
2 The distinction arises in multiclass prediction settings. [sent-3, score-0.479]
3 A static set of classes is given by a set of labels {1,…,k} and the goal is generally to choose the most likely label given features. [sent-4, score-1.384]
4 The static approach is the one that we typically analyze and think about in machine learning. [sent-5, score-0.545]
5 The dynamic setting is one that is often used in practice. [sent-6, score-0.434]
6 The basic idea is that the number of classes is not fixed, varying on a per example basis. [sent-7, score-0.547]
7 These different classes are generally defined by a choice of features. [sent-8, score-0.619]
8 The distinction between these two settings, as far as theory goes, appears to be very substantial. [sent-9, score-0.235]
9 For example, in the static setting, in learning reductions land, we have techniques now for robust O(log(k)) time prediction in many multiclass setting variants. [sent-10, score-1.036]
10 In the dynamic setting, the best techniques known are O(k), and furthermore this exponential gap may be essential, at least without further assumptions. [sent-11, score-0.673]
11 Are there techniques for converting from dynamic multiclass to static multiclass? [sent-12, score-1.102]
12 For example, we could embed a dynamic set of classes within a much larger static set ranging over all possible dynamic classes while eliminating all class-dependent features. [sent-13, score-2.527]
13 In some cases, this approach may work well, but I’ve also seen it fail, with the basic problem being that a learning algorithm might easily choose an invalid class. [sent-14, score-0.228]
14 We could of course force a learning algorithm to choose amongst the dynamically valid set, but I don’t know a general way to do that without making the running time at least scale with the number of valid classes. [sent-15, score-0.967]
15 So, a basic question that’s bothering me is: When and how can we effectively predict amongst a set of dynamically defined classes in sublinear time? [sent-16, score-1.486]
16 A quick answer is “it’s not possible because simply reading off the set of dynamically defined classes requires O(class count) time”. [sent-17, score-1.155]
17 This answer isn’t satisfying, because there are many ways to implicitly specify a set in sublinear time. [sent-18, score-0.672]
18 So the modified question is “Are there natural ways to dynamically define classes in sublinear time?” [sent-19, score-1.23]
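As a concrete (and deliberately naive) reading of the embedding idea in sentences 12–14 above, here is a minimal Python sketch with assumed class names and a random scorer; it is not the post's method. Every class any example might ever present gets a slot in one large static space. An unconstrained static argmax can return a class that is not valid for the current example; constraining the argmax to the valid set fixes that, but the work grows linearly with the number of valid classes, which is exactly the O(k) cost in question. A sublinear answer would seem to require the valid set to be specified implicitly (say, as an interval or prefix of the static space) so that a tree-structured predictor could respect it without enumerating it.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 5

# Embed every class any example might ever present into one big static space.
all_classes = ["red", "green", "blue", "cyan", "magenta", "yellow"]
index = {c: i for i, c in enumerate(all_classes)}
W = rng.normal(size=(len(all_classes), d))   # hypothetical per-class scorers

def predict_unconstrained(x):
    """Plain static prediction over the embedded space: it may return a class
    that this particular example does not actually offer."""
    return all_classes[int(np.argmax(W @ x))]

def predict_constrained(x, valid):
    """Restrict the argmax to this example's valid classes.  Always legal,
    but the work scales linearly with len(valid)."""
    scores = {c: float(W[index[c]] @ x) for c in valid}
    return max(scores, key=scores.get)

x = rng.normal(size=d)
valid = ["green", "yellow"]           # this example's dynamically defined classes
print(predict_unconstrained(x))       # may fall outside `valid`
print(predict_constrained(x, valid))  # always one of `valid`, at O(len(valid)) cost
```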
wordName wordTfidf (topN-words)
[('static', 0.454), ('classes', 0.432), ('dynamic', 0.322), ('sublinear', 0.303), ('dynamically', 0.281), ('distinction', 0.189), ('multiclass', 0.169), ('set', 0.159), ('defined', 0.13), ('choose', 0.123), ('valid', 0.121), ('setting', 0.112), ('techniques', 0.094), ('time', 0.076), ('embed', 0.076), ('kishore', 0.076), ('amongst', 0.074), ('prediction', 0.07), ('ranging', 0.07), ('modified', 0.066), ('converting', 0.063), ('force', 0.063), ('land', 0.061), ('basic', 0.06), ('without', 0.059), ('generally', 0.057), ('ways', 0.056), ('answer', 0.055), ('varying', 0.055), ('gap', 0.054), ('implicitly', 0.054), ('eliminating', 0.054), ('satisfying', 0.054), ('count', 0.054), ('arises', 0.051), ('quick', 0.051), ('furthermore', 0.049), ('discussions', 0.049), ('hal', 0.049), ('least', 0.049), ('question', 0.047), ('goes', 0.047), ('fixed', 0.047), ('possible', 0.047), ('settings', 0.046), ('exponential', 0.046), ('analyze', 0.046), ('approach', 0.045), ('define', 0.045), ('specify', 0.045)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999982 373 hunch net-2009-10-03-Static vs. Dynamic multiclass prediction
2 0.1476627 445 hunch net-2011-09-28-Somebody’s Eating Your Lunch
Introduction: Since we last discussed the other online learning, Stanford has very visibly started pushing mass teaching in AI, Machine Learning, and Databases. In retrospect, it’s not too surprising that the next step up in serious online teaching experiments are occurring at the computer science department of a university embedded in the land of startups. Numbers on the order of 100000 are quite significant—similar in scale to the number of computer science undergraduate students/year in the US. Although these populations surely differ, the fact that they could overlap is worth considering for the future. It’s too soon to say how successful these classes will be and there are many easy criticisms to make: Registration != Learning … but if only 1/10th complete these classes, the scale of teaching still surpasses the scale of any traditional process. 1st year excitement != nth year routine … but if only 1/10th take future classes, the scale of teaching still surpass
3 0.12000939 388 hunch net-2010-01-24-Specializations of the Master Problem
Introduction: One thing which is clear on a little reflection is that there exists a single master learning problem capable of encoding essentially all learning problems. This problem is of course a very general sort of reinforcement learning where the world interacts with an agent as: The world announces an observation x. The agent makes a choice a. The world announces a reward r. The goal here is to maximize the sum of the rewards over the time of the agent. No particular structure relating x to a or a to r is implied by this setting so we do not know effective general algorithms for the agent. It’s very easy to prove lower bounds showing that an agent cannot hope to succeed here—just consider the case where actions are unrelated to rewards. Nevertheless, there is a real sense in which essentially all forms of life are agents operating in this setting, somehow succeeding. The gap between these observations drives research—How can we find tractable specializations of
4 0.11289371 58 hunch net-2005-04-21-Dynamic Programming Generalizations and Their Use
Introduction: David Mcallester gave a talk about this paper (with Pedro Felzenszwalb). I’ll try to give a high level summary of why it’s interesting. Dynamic programming is most familiar as instantiated by Viterbi decoding in a hidden Markov model. It is a general paradigm for problem solving where subproblems are solved and used to solve larger problems. In the Viterbi decoding example, the subproblem is “What is the most probable path ending at each state at timestep t?”, and the larger problem is the same except at timestep t+1. There are a few optimizations you can do here: Dynamic Programming -> queued Dynamic Programming: Keep track of the “cost so far” (or “most probable path”) and (carefully) only look at extensions to paths likely to yield the shortest path. “Carefully” here is defined by Dijkstra’s shortest path algorithm. queued Dynamic programming -> A*: Add a lower bound on the cost to complete a path (or an upper bound on the probability of a completion) for
5 0.10275127 267 hunch net-2007-10-17-Online as the new adjective
Introduction: Online learning is in vogue, which means we should expect to see in the near future: Online boosting. Online decision trees. Online SVMs. (actually, we’ve already seen) Online deep learning. Online parallel learning. etc… There are three fundamental drivers of this trend. Increasing size of datasets makes online algorithms attractive. Online learning can simply be more efficient than batch learning. Here is a picture from a class on online learning: The point of this picture is that even in 3 dimensions and even with linear constraints, finding the minima of a set in an online fashion can be typically faster than finding the minima in a batch fashion. To see this, note that there is a minimal number of gradient updates (i.e. 2) required in order to reach the minima in the typical case. Given this, it’s best to do these updates as quickly as possible, which implies doing the first update online (i.e. before seeing all the examples) is preferred. Note
6 0.10002732 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models
7 0.095001839 228 hunch net-2007-01-15-The Machine Learning Department
8 0.085830159 378 hunch net-2009-11-15-The Other Online Learning
9 0.084840328 123 hunch net-2005-10-16-Complexity: It’s all in your head
10 0.080315091 302 hunch net-2008-05-25-Inappropriate Mathematics for Machine Learning
11 0.074238271 167 hunch net-2006-03-27-Gradients everywhere
12 0.074051827 351 hunch net-2009-05-02-Wielding a New Abstraction
13 0.073522851 243 hunch net-2007-05-08-Conditional Tournaments for Multiclass to Binary
14 0.073362954 220 hunch net-2006-11-27-Continuizing Solutions
15 0.071905009 19 hunch net-2005-02-14-Clever Methods of Overfitting
16 0.069938675 160 hunch net-2006-03-02-Why do people count for learning?
17 0.069027551 309 hunch net-2008-07-10-Interesting papers, ICML 2008
18 0.068709627 478 hunch net-2013-01-07-NYU Large Scale Machine Learning Class
19 0.066959605 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem
20 0.065338768 269 hunch net-2007-10-24-Contextual Bandits
topicId topicWeight
[(0, 0.155), (1, 0.077), (2, -0.005), (3, 0.002), (4, 0.022), (5, -0.029), (6, 0.035), (7, 0.001), (8, -0.034), (9, 0.034), (10, 0.026), (11, 0.028), (12, 0.043), (13, -0.001), (14, 0.001), (15, -0.025), (16, 0.005), (17, -0.011), (18, 0.095), (19, 0.084), (20, 0.004), (21, -0.043), (22, -0.002), (23, -0.002), (24, 0.041), (25, 0.003), (26, 0.035), (27, -0.002), (28, -0.035), (29, 0.076), (30, 0.025), (31, -0.033), (32, 0.071), (33, 0.025), (34, -0.054), (35, 0.046), (36, -0.025), (37, 0.072), (38, -0.02), (39, -0.043), (40, 0.082), (41, 0.049), (42, 0.005), (43, 0.049), (44, 0.011), (45, -0.056), (46, -0.018), (47, -0.036), (48, -0.027), (49, -0.037)]
simIndex simValue blogId blogTitle
same-blog 1 0.94936121 373 hunch net-2009-10-03-Static vs. Dynamic multiclass prediction
2 0.60883522 220 hunch net-2006-11-27-Continuizing Solutions
Introduction: This post is about a general technique for problem solving which I’ve never seen taught (in full generality), but which I’ve found very useful. Many problems in computer science turn out to be discretely difficult. The best known version of such problems are NP-hard problems, but I mean ‘discretely difficult’ in a much more general way, which I only know how to capture by examples. ERM: In empirical risk minimization, you choose a minimum error rate classifier from a set of classifiers. This is NP hard for common sets, but it can be much harder, depending on the set. Experts: In the online learning with experts setting, you try to predict well so as to compete with a set of (adversarial) experts. Here the alternating quantifiers of you and an adversary playing out a game can yield a dynamic programming problem that grows exponentially. Policy Iteration: The problem with policy iteration is that you learn a new policy with respect to an old policy, which implies that sim
3 0.59096783 378 hunch net-2009-11-15-The Other Online Learning
Introduction: If you search for “online learning” with any major search engine, it’s interesting to note that zero of the results are for online machine learning. This may not be a mistake if you are committed to a global ordering. In other words, the number of people specifically interested in the least interesting top-10 online human learning result might exceed the number of people interested in online machine learning, even given the presence of the other 9 results. The essential observation here is that the process of human learning is a big business (around 5% of GDP) affecting virtually everyone. The internet is changing this dramatically, by altering the economics of teaching. Consider two possibilities: The classroom-style teaching environment continues as is, with many teachers for the same subject. All the teachers for one subject get together, along with perhaps a factor of 2 more people who are experts in online delivery. They spend a factor of 4 more time designing
4 0.55092752 252 hunch net-2007-07-01-Watchword: Online Learning
Introduction: It turns out that many different people use the term “Online Learning”, and often they don’t have the same definition in mind. Here’s a list of the possibilities I know of. Online Information Setting: Online learning refers to a problem in which unlabeled data comes, a prediction is made, and then feedback is acquired. Online Adversarial Setting: Online learning refers to algorithms in the Online Information Setting which satisfy guarantees of the form: “For all possible sequences of observations, the algorithm has regret at most log(number of strategies) with respect to the best strategy in a set.” This is sometimes called online learning with experts. Online Optimization Constraint: Online learning refers to optimizing a predictor via a learning algorithm that tunes parameters on a per-example basis. This may or may not be applied in the Online Information Setting, and the strategy may or may not satisfy Adversarial setting theory. Online Computational Constra
5 0.5500806 388 hunch net-2010-01-24-Specializations of the Master Problem
6 0.54966378 445 hunch net-2011-09-28-Somebody’s Eating Your Lunch
7 0.52710116 314 hunch net-2008-08-24-Mass Customized Medicine in the Future?
8 0.52632028 149 hunch net-2006-01-18-Is Multitask Learning Black-Boxable?
9 0.52540612 385 hunch net-2009-12-27-Interesting things at NIPS 2009
10 0.52057803 398 hunch net-2010-05-10-Aggregation of estimators, sparsity in high dimension and computational feasibility
11 0.51838583 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design
12 0.51619393 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models
13 0.50835901 109 hunch net-2005-09-08-Online Learning as the Mathematics of Accountability
14 0.50618321 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem
15 0.50252378 102 hunch net-2005-08-11-Why Manifold-Based Dimension Reduction Techniques?
16 0.50215822 269 hunch net-2007-10-24-Contextual Bandits
17 0.5007112 267 hunch net-2007-10-17-Online as the new adjective
18 0.49770325 164 hunch net-2006-03-17-Multitask learning is Black-Boxable
19 0.48778772 298 hunch net-2008-04-26-Eliminating the Birthday Paradox for Universal Features
20 0.48360699 309 hunch net-2008-07-10-Interesting papers, ICML 2008
topicId topicWeight
[(3, 0.044), (27, 0.241), (38, 0.062), (55, 0.079), (94, 0.034), (95, 0.414)]
simIndex simValue blogId blogTitle
1 0.97255659 389 hunch net-2010-02-26-Yahoo! ML events
Introduction: Yahoo! is sponsoring two machine learning events that might interest people. The Key Scientific Challenges program (due March 5) for Machine Learning and Statistics offers $5K (plus bonuses) for graduate students working on a core problem of interest to Y! If you are already working on one of these problems, there is no reason not to submit, and if you aren’t you might want to think about it for next year, as I am confident they all press the boundary of the possible in Machine Learning. There are 7 days left. The Learning to Rank challenge (due May 31) offers an $8K first prize for the best ranking algorithm on a real (and really used) dataset for search ranking, with presentations at an ICML workshop. Unlike the Netflix competition, there are prizes for 2nd, 3rd, and 4th place, perhaps avoiding the heartbreak the ensemble encountered. If you think you know how to rank, you should give it a try, and we might all learn something. There are 3 months left.
2 0.97077137 30 hunch net-2005-02-25-Why Papers?
Introduction: Makc asked a good question in comments—“Why bother to make a paper, at all?” There are several reasons for writing papers which may not be immediately obvious to people not in academia. The basic idea is that papers have considerably more utility than the obvious “present an idea”. Papers are formalized units of work. Academics (especially young ones) are often judged on the number of papers they produce. Papers have a formalized method of citing and crediting others—the bibliography. Academics (especially older ones) are often judged on the number of citations they receive. Papers enable a “more fair” anonymous review. Conferences receive many papers, from which a subset are selected. Discussion forums are inherently not anonymous for anyone who wants to build a reputation for good work. Papers are an excuse to meet your friends. Papers are the content of conferences, but much of what you do is talk to friends about interesting problems while there. Sometimes yo
3 0.96295804 127 hunch net-2005-11-02-Progress in Active Learning
Introduction: Several bits of progress have been made since Sanjoy pointed out the significant lack of theoretical understanding of active learning. This is an update on the progress I know of. As a refresher, active learning as meant here is: There is a source of unlabeled data. There is an oracle from which labels can be requested for unlabeled data produced by the source. The goal is to perform well with minimal use of the oracle. Here is what I’ve learned: Sanjoy has developed sufficient and semi-necessary conditions for active learning given the assumptions of IID data and “realizability” (that one of the classifiers is a correct classifier). Nina, Alina, and I developed an algorithm for active learning relying on only the assumption of IID data. A draft is here. Nicolo, Claudio, and Luca showed that it is possible to do active learning in an entirely adversarial setting for linear threshold classifiers here. This was published a year or two ago and I r
4 0.96258509 319 hunch net-2008-10-01-NIPS 2008 workshop on ‘Learning over Empirical Hypothesis Spaces’
Introduction: This workshop asks for insights into how far we may/can push the theoretical boundary of using data in the design of learning machines. Can we express our classification rule in terms of the sample, or do we have to stick to a core assumption of classical statistical learning theory, namely that the hypothesis space is to be defined independently of the sample? This workshop is particularly interested in – but not restricted to – the ‘luckiness framework’ and the recently introduced notion of ‘compatibility functions’ in a semi-supervised learning context (more information can be found at http://www.kuleuven.be/wehys).
5 0.9593057 390 hunch net-2010-03-12-Netflix Challenge 2 Canceled
Introduction: The second Netflix prize is canceled due to privacy problems. I continue to believe my original assessment of this paper, that the privacy break was somewhat overstated. I still haven’t seen any serious privacy failures on the scale of the AOL search log release. I expect privacy concerns to continue to be a big issue when dealing with data releases by companies or governments. The theory of maintaining privacy while using data is improving, but it is not yet in a state where the limits of what’s possible are clear, let alone how to achieve these limits in a manner friendly to a prediction competition.
6 0.95767403 456 hunch net-2012-02-24-ICML+50%
7 0.93161869 344 hunch net-2009-02-22-Effective Research Funding
same-blog 8 0.92934155 373 hunch net-2009-10-03-Static vs. Dynamic multiclass prediction
9 0.91187537 479 hunch net-2013-01-31-Remote large scale learning class participation
10 0.81669426 105 hunch net-2005-08-23-(Dis)similarities between academia and open source programmers
11 0.78562582 462 hunch net-2012-04-20-Both new: STOC workshops and NEML
12 0.77765799 234 hunch net-2007-02-22-Create Your Own ICML Workshop
13 0.73658848 7 hunch net-2005-01-31-Watchword: Assumption
14 0.72385389 36 hunch net-2005-03-05-Funding Research
15 0.71416199 464 hunch net-2012-05-03-Microsoft Research, New York City
16 0.70475376 466 hunch net-2012-06-05-ICML acceptance statistics
17 0.69801372 478 hunch net-2013-01-07-NYU Large Scale Machine Learning Class
18 0.69600058 360 hunch net-2009-06-15-In Active Learning, the question changes
19 0.69414276 445 hunch net-2011-09-28-Somebody’s Eating Your Lunch
20 0.69133997 132 hunch net-2005-11-26-The Design of an Optimal Research Environment