hunch_net hunch_net-2009 hunch_net-2009-351 knowledge-graph by maker-knowledge-mining

351 hunch net-2009-05-02-Wielding a New Abstraction


meta info for this blog

Source: html

Introduction: This post is partly meant as an advertisement for the reductions tutorial Alina, Bianca, and I are planning to do at ICML. Please come, if you are interested. Many research programs can be thought of as finding and building new useful abstractions. The running example I’ll use is learning reductions where I have experience. The basic abstraction here is that we can build a learning algorithm capable of solving classification problems up to a small expected regret. This is used repeatedly to solve more complex problems. In working on a new abstraction, I think you typically run into many substantial problems of understanding, which make publishing particularly difficult. It is difficult to seriously discuss the reason behind or mechanism for abstraction in a conference paper with small page limits. People rarely see such discussions and hence have little basis on which to think about new abstractions. Another difficulty is that when building an abstraction, yo


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 The basic abstraction here is that we can build a learning algorithm capable of solving classification problems up to a small expected regret. [sent-5, score-0.692]

2 It is difficult to seriously discuss the reason behind or mechanism for abstraction in a conference paper with small page limits. [sent-8, score-0.598]

3 Here’s my current attempt: The process of abstraction for learning reductions can start with sample complexity bounds (or online learning against an adversary analysis). [sent-11, score-0.812]

4 The impact of this complexity is that it is difficult to effectively use these bounds in practical learning algorithm design, particularly in solving more complex learning problems where much more than one bit of prediction is required. [sent-15, score-0.532]

5 The abstraction in the learning reduction setting is: You throw away d, because it only has a logarithmic dependence anyways. [sent-18, score-0.747]

6 This abstraction is radical in some sense, but something radical was needed to yield tractable and useful analysis on the complex problems people need to solve in practice. [sent-21, score-1.07]

7 Once you get past the lack of familiarity and misunderstandings, there is a feeling that the new abstraction is cheating. [sent-26, score-0.723]

8 The TCP abstraction is broken when the network is too unreliable for it to recover, such as on sketchy wireless networks where the programmer built for the TCP abstraction which wasn’t delivered. [sent-35, score-1.143]

9 The basic idea is to just look at the units when estimating some quantity and combine them to get the right unit answer. [sent-37, score-0.276]

10 For example, to compute the distance d traveled after time t with acceleration a, you simply use at², since that formula is the only way to combine a with units of distance/time² and t with units of time to get units of distance. [sent-38, score-0.802]

11 This property that abstractions can be abused is generically essential to the process of abstraction itself. [sent-45, score-0.815]

12 Abstraction is about neglecting details, and when these details are not neglectable, the abstraction is abused or ineffective. [sent-46, score-0.75]

13 Because of this, any abstraction is insufficient for analyzing and solving real problems where the neglected details matter. [sent-47, score-0.757]

14 A good abstraction can capture a more complete specification of the problem. [sent-53, score-0.534]

15 As an example, the sample complexity view of learning is broken in practice, because when insufficient performance is achieved people choose a different set of hypotheses H by throwing in additional features or choosing a different learning algorithm. [sent-54, score-0.368]

16 A higher level abstraction can let you accidentally solve problems in other areas as well. [sent-59, score-0.682]

17 Perhaps the most interesting effect is that the new abstraction can aid you in finding effective solutions to new problems. [sent-63, score-0.731]

18 Given training-time access to a good policy oracle, Searn provides a method for decomposing any complex prediction problem into simple problems, such that low regret solutions to the simple problems imply a low regret solution to the original problem. [sent-65, score-0.451]

19 More generally, when I learn about the details of other complex prediction systems for machine translation or vision, the base algorithms are tweaked, typically in ways that Searn would suggest. [sent-69, score-0.293]

20 Furthermore, they are composable by design, so they should stay relevant (and perhaps even become more so), when people use an online active deep semisupervised probabilistic convolutional algorithm to solve a problem, particularly for complex problems. [sent-86, score-0.348]
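Sentences 1, 5, and 18 above describe the reduction abstraction: treat a classification learner that achieves small regret as a black-box oracle, and compose calls to it to solve more complex prediction problems. The sketch below is a hypothetical Python illustration of the composition step only, using a one-against-all style reduction from multiclass to binary classification on top of a toy perceptron oracle; the class names and data are invented for illustration, and the regret-transfer analysis that actually justifies the abstraction is not shown.

```python
# Hypothetical sketch: compose a binary classification oracle (a toy perceptron
# here) into a multiclass learner via a one-against-all reduction. Any low-regret
# binary learner could be plugged in; the regret-transfer theorem is not shown.

class BinaryOracle:
    """Toy binary learner with labels in {-1, +1}; stands in for any oracle."""
    def __init__(self, dim, epochs=10):
        self.w = [0.0] * dim
        self.epochs = epochs

    def train(self, examples):                 # examples: [(features, label)]
        for _ in range(self.epochs):
            for x, y in examples:
                if y * self.score(x) <= 0:     # mistake-driven perceptron update
                    self.w = [wi + y * xi for wi, xi in zip(self.w, x)]

    def score(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))


class OneAgainstAll:
    """Reduce k-class prediction to k calls to the binary oracle."""
    def __init__(self, num_classes, dim):
        self.oracles = [BinaryOracle(dim) for _ in range(num_classes)]

    def train(self, examples):                 # examples: [(features, class in 0..k-1)]
        for c, oracle in enumerate(self.oracles):
            oracle.train([(x, 1 if y == c else -1) for x, y in examples])

    def predict(self, x):
        # Predict the class whose binary oracle is most confident.
        return max(range(len(self.oracles)), key=lambda c: self.oracles[c].score(x))


# Tiny usage example on separable toy data (classes 0, 1, 2 over 2-d features).
data = [([1.0, 0.0], 0), ([0.9, 0.1], 0), ([0.0, 1.0], 1),
        ([0.1, 0.9], 1), ([-1.0, -1.0], 2), ([-0.9, -1.1], 2)]
learner = OneAgainstAll(num_classes=3, dim=2)
learner.train(data)
print([learner.predict(x) for x, _ in data])   # prints [0, 0, 1, 1, 2, 2]
```

The point of the abstraction, as sentence 1 states, is that small regret on each induced binary problem translates into a guarantee for the composed predictor; the code only shows the plumbing that makes such statements reusable.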


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('abstraction', 0.534), ('tcp', 0.211), ('units', 0.15), ('abused', 0.15), ('complex', 0.146), ('searn', 0.133), ('abstractions', 0.131), ('reduction', 0.126), ('reductions', 0.105), ('analysis', 0.102), ('logarithmic', 0.087), ('complexity', 0.087), ('sample', 0.086), ('breakages', 0.085), ('pathetic', 0.085), ('problems', 0.082), ('iid', 0.081), ('prediction', 0.081), ('distance', 0.078), ('solutions', 0.077), ('algorithm', 0.076), ('insufficient', 0.075), ('unreliable', 0.075), ('example', 0.072), ('bianca', 0.07), ('radical', 0.07), ('tournament', 0.07), ('eps', 0.07), ('hiding', 0.07), ('time', 0.068), ('details', 0.066), ('feeling', 0.066), ('constants', 0.066), ('elimination', 0.066), ('unit', 0.066), ('tournaments', 0.066), ('architecting', 0.066), ('solve', 0.066), ('provides', 0.065), ('compelling', 0.064), ('mechanism', 0.064), ('familiarity', 0.063), ('used', 0.061), ('particularly', 0.06), ('view', 0.06), ('combine', 0.06), ('hypotheses', 0.06), ('new', 0.06), ('thesis', 0.058), ('eliminate', 0.056)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000006 351 hunch net-2009-05-02-Wielding a New Abstraction

Introduction: This post is partly meant as an advertisement for the reductions tutorial Alina, Bianca, and I are planning to do at ICML. Please come, if you are interested. Many research programs can be thought of as finding and building new useful abstractions. The running example I’ll use is learning reductions where I have experience. The basic abstraction here is that we can build a learning algorithm capable of solving classification problems up to a small expected regret. This is used repeatedly to solve more complex problems. In working on a new abstraction, I think you typically run into many substantial problems of understanding, which make publishing particularly difficult. It is difficult to seriously discuss the reason behind or mechanism for abstraction in a conference paper with small page limits. People rarely see such discussions and hence have little basis on which to think about new abstractions. Another difficulty is that when building an abstraction, yo

2 0.21136203 103 hunch net-2005-08-18-SVM Adaptability

Introduction: Several recent papers have shown that SVM-like optimizations can be used to handle several large families of loss functions. This is a good thing because it is implausible that the loss function imposed by the world can not be taken into account in the process of solving a prediction problem. Even people used to the hard-core Bayesian approach to learning often note that some approximations are almost inevitable in specifying a prior and/or integrating to achieve a posterior. Taking into account how the system will be evaluated can allow both computational effort and design effort to be focused so as to improve performance. A current laundry list of capabilities includes: 2002 multiclass SVM including arbitrary cost matrices ICML 2003 Hidden Markov Models NIPS 2003 Markov Networks (see some discussion) EMNLP 2004 Context free grammars ICML 2004 Any loss (with much computation) ICML 2005 Any constrained linear prediction model (that’s my own

3 0.2057671 35 hunch net-2005-03-04-The Big O and Constants in Learning

Introduction: The notation g(n) = O(f(n)) means that in the limit as n approaches infinity there exists a constant C such that g(n) is less than Cf(n). In learning theory, there are many statements about learning algorithms of the form “under assumptions x, y, and z, the classifier learned has an error rate of at most O(f(m))”. There is one very good reason to use O(): it helps you understand the big picture and neglect the minor details which are not important in the big picture. However, there are some important reasons not to do this as well. Unspeedup: In algorithm analysis, the use of O() for time complexity is pervasive and well-justified. Determining the exact value of C is inherently computer architecture dependent. (The “C” for x86 processors might differ from the “C” on PowerPC processors.) Since many learning theorists come from a CS theory background, the O() notation is applied to generalization error. The O() abstraction breaks here—you can not genera

4 0.19296248 39 hunch net-2005-03-10-Breaking Abstractions

Introduction: Sam Roweis’s comment reminds me of a more general issue that comes up in doing research: abstractions always break. Real numbers aren’t. Most real numbers can not be represented with any machine. One implication of this is that many real-number based algorithms have difficulties when implemented with floating point numbers. The box on your desk is not a Turing machine. A Turing machine can compute anything computable, given sufficient time. A typical computer fails terribly when the state required for the computation exceeds some limit. Nash equilibria aren’t equilibria. This comes up when trying to predict human behavior based on the result of the equilibria computation. Often, it doesn’t work. The probability isn’t. Probability is an abstraction expressing either our lack of knowledge (the Bayesian viewpoint) or fundamental randomization (the frequentist viewpoint). From the frequentist viewpoint the lack of knowledge typically precludes actually knowing the fu

5 0.17869347 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

Introduction: Muthu invited me to the workshop on algorithms in the field, with the goal of providing a sense of where near-term research should go. When the time came though, I bargained for a post instead, which provides a chance for many other people to comment. There are several things I didn’t fully understand when I went to Yahoo! about 5 years ago. I’d like to repeat them as people in academia may not yet understand them intuitively. Almost all the big impact algorithms operate in pseudo-linear or better time. Think about caching, hashing, sorting, filtering, etc… and you have a sense of what some of the most heavily used algorithms are. This matters quite a bit to Machine Learning research, because people often work with superlinear time algorithms and languages. Two very common examples of this are graphical models, where inference is often a superlinear operation—think about the n² dependence on the number of states in a Hidden Markov Model and Kernelized Support Vecto

6 0.1766326 14 hunch net-2005-02-07-The State of the Reduction

7 0.1732568 343 hunch net-2009-02-18-Decision by Vetocracy

8 0.16602181 332 hunch net-2008-12-23-Use of Learning Theory

9 0.16589342 235 hunch net-2007-03-03-All Models of Learning have Flaws

10 0.15670927 177 hunch net-2006-05-05-An ICML reject

11 0.155986 388 hunch net-2010-01-24-Specializations of the Master Problem

12 0.14277193 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem

13 0.1425297 454 hunch net-2012-01-30-ICML Posters and Scope

14 0.13758403 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models

15 0.13694756 215 hunch net-2006-10-22-Exemplar programming

16 0.13619636 45 hunch net-2005-03-22-Active learning

17 0.13593148 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions

18 0.13581567 492 hunch net-2013-12-01-NIPS tutorials and Vowpal Wabbit 7.4

19 0.13518371 347 hunch net-2009-03-26-Machine Learning is too easy

20 0.13182136 432 hunch net-2011-04-20-The End of the Beginning of Active Learning
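Entry 3 in the list above (The Big O and Constants in Learning) quotes the informal definition of O() notation. For reference, the standard formal statement it is describing is:

```latex
g(n) = O(f(n)) \iff \exists\, C > 0 \;\; \exists\, n_0 \;\; \forall\, n \ge n_0 : \quad g(n) \le C \, f(n)
```

The entry's argument is that discarding the constant C is harmless for asymptotic run-time analysis but hides details that matter when the same abstraction is applied to generalization error.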


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.36), (1, 0.146), (2, 0.024), (3, -0.004), (4, 0.022), (5, -0.065), (6, 0.129), (7, -0.007), (8, -0.021), (9, 0.011), (10, -0.065), (11, -0.036), (12, -0.003), (13, 0.077), (14, -0.013), (15, 0.0), (16, 0.015), (17, -0.003), (18, -0.027), (19, -0.012), (20, 0.017), (21, -0.007), (22, -0.011), (23, -0.021), (24, -0.055), (25, -0.027), (26, -0.05), (27, -0.009), (28, 0.015), (29, 0.014), (30, 0.013), (31, -0.04), (32, 0.053), (33, 0.007), (34, -0.086), (35, -0.003), (36, 0.034), (37, -0.01), (38, -0.005), (39, 0.105), (40, -0.044), (41, 0.021), (42, 0.011), (43, -0.059), (44, 0.093), (45, 0.012), (46, -0.017), (47, 0.017), (48, -0.054), (49, 0.06)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95803779 351 hunch net-2009-05-02-Wielding a New Abstraction

Introduction: This post is partly meant as an advertisement for the reductions tutorial Alina, Bianca, and I are planning to do at ICML. Please come, if you are interested. Many research programs can be thought of as finding and building new useful abstractions. The running example I’ll use is learning reductions where I have experience. The basic abstraction here is that we can build a learning algorithm capable of solving classification problems up to a small expected regret. This is used repeatedly to solve more complex problems. In working on a new abstraction, I think you typically run into many substantial problems of understanding, which make publishing particularly difficult. It is difficult to seriously discuss the reason behind or mechanism for abstraction in a conference paper with small page limits. People rarely see such discussions and hence have little basis on which to think about new abstractions. Another difficulty is that when building an abstraction, yo

2 0.76782387 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design

Introduction: There were two papers at ICML presenting learning algorithms for a contextual bandit-style setting, where the loss for all labels is not known, but the loss for one label is known. (The first might require an exploration scavenging viewpoint to understand if the experimental assignment was nonrandom.) I strongly approve of these papers and further work in this setting and its variants, because I expect it to become more important than supervised learning. As a quick review, we are thinking about situations where repeatedly: The world reveals feature values (aka context information). A policy chooses an action. The world provides a reward. Sometimes this is done in an online fashion where the policy can change based on immediate feedback and sometimes it’s done in a batch setting where many samples are collected before the policy can change. If you haven’t spent time thinking about the setting, you might want to because there are many natural applications. I’m g

3 0.74882913 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

Introduction: Muthu invited me to the workshop on algorithms in the field, with the goal of providing a sense of where near-term research should go. When the time came though, I bargained for a post instead, which provides a chance for many other people to comment. There are several things I didn’t fully understand when I went to Yahoo! about 5 years ago. I’d like to repeat them as people in academia may not yet understand them intuitively. Almost all the big impact algorithms operate in pseudo-linear or better time. Think about caching, hashing, sorting, filtering, etc… and you have a sense of what some of the most heavily used algorithms are. This matters quite a bit to Machine Learning research, because people often work with superlinear time algorithms and languages. Two very common examples of this are graphical models, where inference is often a superlinear operation—think about the n² dependence on the number of states in a Hidden Markov Model and Kernelized Support Vecto

4 0.72638851 14 hunch net-2005-02-07-The State of the Reduction

Introduction: What? Reductions are machines which turn solvers for one problem into solvers for another problem. Why? Reductions are useful for several reasons. Laziness. Reducing a problem to classification makes at least 10 learning algorithms available to solve a problem. Inventing 10 learning algorithms is quite a bit of work. Similarly, programming a reduction is often trivial, while programming a learning algorithm is a great deal of work. Crystallization. The problems we often want to solve in learning are worst-case-impossible, but average case feasible. By reducing all problems onto one or a few primitives, we can fine tune these primitives to perform well on real-world problems with greater precision due to the greater number of problems to validate on. Theoretical Organization. By studying what reductions are easy vs. hard vs. impossible, we can learn which problems are roughly equivalent in difficulty and which are much harder. What we know now. Typesafe r

5 0.72589481 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem

Introduction: I’m offering a reward of $1000 for a solution to this problem. This joins the cross validation problem which I’m offering a $500 reward for. I believe both of these problems are hard but plausibly solvable, and plausibly with a solution of substantial practical value. While it’s unlikely these rewards are worth your time on an hourly wage basis, the recognition for solving them definitely should be. The Problem: The problem is finding a general, robust, and efficient mechanism for estimating a conditional probability P(y|x) where robustness and efficiency are measured using techniques from learning reductions. In particular, suppose we have access to a binary regression oracle B which has two interfaces—one for specifying training information and one for testing. Training information is specified as B(x’,y’) where x’ is a feature vector and y’ is a scalar in [0,1] with no value returned. Testing is done according to B(x’) with a value in [0,1] returned.

6 0.70002329 303 hunch net-2008-06-09-The Minimum Sample Complexity of Importance Weighting

7 0.69922251 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models

8 0.69723088 332 hunch net-2008-12-23-Use of Learning Theory

9 0.69674659 176 hunch net-2006-05-01-A conversation between Theo and Pat

10 0.69343191 33 hunch net-2005-02-28-Regularization

11 0.68437672 103 hunch net-2005-08-18-SVM Adaptability

12 0.67982656 492 hunch net-2013-12-01-NIPS tutorials and Vowpal Wabbit 7.4

13 0.6784265 67 hunch net-2005-05-06-Don’t mix the solution into the problem

14 0.67441261 370 hunch net-2009-09-18-Necessary and Sufficient Research

15 0.67396975 177 hunch net-2006-05-05-An ICML reject

16 0.66962785 148 hunch net-2006-01-13-Benchmarks for RL

17 0.66601473 235 hunch net-2007-03-03-All Models of Learning have Flaws

18 0.65160996 230 hunch net-2007-02-02-Thoughts regarding “Is machine learning different from statistics?”

19 0.64734745 42 hunch net-2005-03-17-Going all the Way, Sometimes

20 0.64722991 31 hunch net-2005-02-26-Problem: Reductions and Relative Ranking Metrics
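Entry 5 in the list above specifies the binary regression oracle interface B: a training call B(x’, y’) takes a feature vector and a scalar y’ in [0,1] and returns nothing, while a test call B(x’) returns a value in [0,1]. Below is a minimal Python sketch of that interface; the online logistic update inside is an illustrative placeholder, not the robust estimation mechanism the post is asking for.

```python
# Minimal sketch of the regression oracle interface from entry 5 above:
# B(x, y) records a training example (y is a scalar in [0, 1], nothing returned);
# B(x) returns a prediction in [0, 1]. The logistic update is only a placeholder.

import math

class RegressionOracle:
    def __init__(self, dim, learning_rate=0.1):
        self.w = [0.0] * dim
        self.lr = learning_rate

    def __call__(self, x, y=None):
        p = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(self.w, x))))
        if y is None:
            return p                              # testing interface: B(x) -> [0, 1]
        # training interface: B(x, y), no value returned
        self.w = [wi + self.lr * (y - p) * xi for wi, xi in zip(self.w, x)]

B = RegressionOracle(dim=3)
B([1.0, 0.0, 2.0], 1.0)     # training call
print(B([1.0, 0.0, 2.0]))   # testing call: a probability estimate near 0.56
```

A solution to the posted problem would sit on top of such an oracle, with its robustness and efficiency measured via learning reductions as the entry describes.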


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(3, 0.047), (16, 0.011), (20, 0.138), (27, 0.247), (37, 0.011), (38, 0.056), (48, 0.014), (53, 0.083), (55, 0.067), (64, 0.035), (77, 0.025), (92, 0.015), (94, 0.115), (95, 0.046)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.9545275 208 hunch net-2006-09-18-What is missing for online collaborative research?

Introduction: The internet has recently made the research process much smoother: papers are easy to obtain, citations are easy to follow, and unpublished “tutorials” are often available. Yet, new research fields can look very complicated to outsiders or newcomers. Every paper is like a small piece of an unfinished jigsaw puzzle: to understand just one publication, a researcher without experience in the field will typically have to follow several layers of citations, and many of the papers he encounters have a great deal of repeated information. Furthermore, from one publication to the next, notation and terminology may not be consistent which can further confuse the reader. But the internet is now proving to be an extremely useful medium for collaboration and knowledge aggregation. Online forums allow users to ask and answer questions and to share ideas. The recent phenomenon of Wikipedia provides a proof-of-concept for the “anyone can edit” system. Can such models be used to facilitate research a

2 0.94149852 7 hunch net-2005-01-31-Watchword: Assumption

Introduction: “Assumption” is another word to be careful with in machine learning because it is used in several ways. Assumption = Bias: There are several ways to see that some form of ‘bias’ (= preferring of one solution over another) is necessary. This is obvious in an adversarial setting. A good bit of work has been expended explaining this in other settings with “no free lunch” theorems. This is a usage specialized to learning which is particularly common when talking about priors for Bayesian Learning. Assumption = “if” of a theorem: The assumptions are the ‘if’ part of the ‘if-then’ in a theorem. This is a fairly common usage. Assumption = Axiom: The assumptions are the things that we assume are true, but which we cannot verify. Examples are “the IID assumption” or “my problem is a DNF on a small number of bits”. This is the usage which I prefer. One difficulty with any use of the word “assumption” is that you often encounter “if assumption then conclusion so if no

same-blog 3 0.94038433 351 hunch net-2009-05-02-Wielding a New Abstraction

Introduction: This post is partly meant as an advertisement for the reductions tutorial Alina, Bianca, and I are planning to do at ICML. Please come, if you are interested. Many research programs can be thought of as finding and building new useful abstractions. The running example I’ll use is learning reductions where I have experience. The basic abstraction here is that we can build a learning algorithm capable of solving classification problems up to a small expected regret. This is used repeatedly to solve more complex problems. In working on a new abstraction, I think you typically run into many substantial problems of understanding, which make publishing particularly difficult. It is difficult to seriously discuss the reason behind or mechanism for abstraction in a conference paper with small page limits. People rarely see such discussions and hence have little basis on which to think about new abstractions. Another difficulty is that when building an abstraction, yo

4 0.92205745 190 hunch net-2006-07-06-Branch Prediction Competition

Introduction: Alan Fern points out the second branch prediction challenge (due September 29) which is a follow-up to the first branch prediction competition. Branch prediction is one of the fundamental learning problems of the computer age: without it our computers might run an order of magnitude slower. This is a tough problem since there are sharp constraints on time and space complexity in an online environment. For machine learning, the “idealistic track” may fit well. Essentially, they remove these constraints to gain a weak upper bound on what might be done.

5 0.89192963 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

Introduction: Muthu invited me to the workshop on algorithms in the field, with the goal of providing a sense of where near-term research should go. When the time came though, I bargained for a post instead, which provides a chance for many other people to comment. There are several things I didn’t fully understand when I went to Yahoo! about 5 years ago. I’d like to repeat them as people in academia may not yet understand them intuitively. Almost all the big impact algorithms operate in pseudo-linear or better time. Think about caching, hashing, sorting, filtering, etc… and you have a sense of what some of the most heavily used algorithms are. This matters quite a bit to Machine Learning research, because people often work with superlinear time algorithms and languages. Two very common examples of this are graphical models, where inference is often a superlinear operation—think about the n² dependence on the number of states in a Hidden Markov Model and Kernelized Support Vecto

6 0.88774848 286 hunch net-2008-01-25-Turing’s Club for Machine Learning

7 0.8823474 464 hunch net-2012-05-03-Microsoft Research, New York City

8 0.88184655 235 hunch net-2007-03-03-All Models of Learning have Flaws

9 0.88075978 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models

10 0.88025928 95 hunch net-2005-07-14-What Learning Theory might do

11 0.87772745 237 hunch net-2007-04-02-Contextual Scaling

12 0.87594068 371 hunch net-2009-09-21-Netflix finishes (and starts)

13 0.87539071 41 hunch net-2005-03-15-The State of Tight Bounds

14 0.87502766 347 hunch net-2009-03-26-Machine Learning is too easy

15 0.87472373 183 hunch net-2006-06-14-Explorations of Exploration

16 0.87448823 370 hunch net-2009-09-18-Necessary and Sufficient Research

17 0.87430096 360 hunch net-2009-06-15-In Active Learning, the question changes

18 0.87421221 258 hunch net-2007-08-12-Exponentiated Gradient

19 0.87419051 79 hunch net-2005-06-08-Question: “When is the right time to insert the loss function?”

20 0.87413943 143 hunch net-2005-12-27-Automated Labeling
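Entry 4 in the list above frames branch prediction as an online learning problem with sharp time and space constraints. The sketch below shows a classic baseline for it, a table of two-bit saturating counters indexed by the branch address; the table size, hashing, and the toy trace are illustrative assumptions rather than details of the competition.

```python
# Hypothetical sketch of a two-bit saturating counter branch predictor.
# Counter values: 0-1 predict not-taken, 2-3 predict taken.

class TwoBitPredictor:
    def __init__(self, table_bits=12):
        self.mask = (1 << table_bits) - 1
        self.counters = [1] * (1 << table_bits)     # start weakly not-taken

    def predict(self, pc):
        return self.counters[pc & self.mask] >= 2   # True means "predict taken"

    def update(self, pc, taken):
        i = pc & self.mask
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)

# Usage: predict before each branch resolves, then learn from the actual outcome.
predictor = TwoBitPredictor(table_bits=4)
trace = [(0x40, True), (0x40, True), (0x40, False), (0x40, True)]
for pc, taken in trace:
    guess = predictor.predict(pc)
    predictor.update(pc, taken)
    print(guess, taken)
```

The appeal of the two-bit counter is exactly the constraint the entry mentions: prediction and update are constant time and the state is a few kilobits for a table of this size, while the competition's “idealistic track” removes those constraints.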