
90 hunch net-2005-07-07-The Limits of Learning Theory


meta information for this blog

Source: html

Introduction: Suppose we had an infinitely powerful mathematician sitting in a room and proving theorems about learning. Could he solve machine learning? The answer is “no”. This answer is both obvious and sometimes underappreciated. There are several ways to conclude that some bias is necessary in order to successfully learn. For example, suppose we are trying to solve classification. At prediction time, we observe some features X and want to make a prediction of either 0 or 1. Bias is what makes us prefer one answer over the other based on past experience. In order to learn we must: Have a bias. Always predicting that 0 is as likely as 1 is useless. Have the “right” bias. Predicting 1 when the answer is 0 is also not helpful. The implication of “have a bias” is that we can not design effective learning algorithms with “a uniform prior over all possibilities”. The implication of “have the ‘right’ bias” is that our mathematician fails since “right” is defined with respect to the solutions to problems encountered in the real world.
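To make the “uniform prior over all possibilities” point concrete, here is a minimal sketch (an added illustration, not from the post; the 3-bit feature space and the majority-of-bits rule are arbitrary choices): averaged uniformly over all labelings of the unseen points, any fixed predictor has exactly 50% expected error on them, so a learner with no bias cannot beat chance.

```python
# Minimal no-free-lunch style sketch (added illustration; the 3-bit feature space
# and the majority-of-bits rule are arbitrary). Whatever rule we pick, averaging
# uniformly over all possible labelings of the unseen points gives 0.5 error,
# which is why "a uniform prior over all possibilities" rules out effective learning.
from itertools import product

inputs = list(product([0, 1], repeat=3))  # all 8 possible 3-bit feature vectors
unseen = inputs[4:]                       # points never observed during training

def predictor(x):
    """Any fixed deterministic rule; majority-of-bits is just an example."""
    return int(sum(x) >= 2)

errors = []
for labels in product([0, 1], repeat=len(unseen)):  # every labeling, weighted equally
    err = sum(predictor(x) != y for x, y in zip(unseen, labels)) / len(unseen)
    errors.append(err)

print(sum(errors) / len(errors))  # 0.5, no matter which predictor is used
```

Swapping in any other deterministic rule for `predictor` leaves the printed average at 0.5, which is the point.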


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Suppose we had an infinitely powerful mathematician sitting in a room and proving theorems about learning. [sent-1, score-0.702]

2 This answer is both obvious and sometimes underappreciated. [sent-4, score-0.268]

3 There are several ways to conclude that some bias is necessary in order to successfully learn. [sent-5, score-0.608]

4 For example, suppose we are trying to solve classification. [sent-6, score-0.269]

5 At prediction time, we observe some features X and want to make a prediction of either 0 or 1 . [sent-7, score-0.288]

6 Bias is what makes us prefer one answer over the other based on past experience. [sent-8, score-0.371]

7 Always predicting that 0 is as likely as 1 is useless. [sent-10, score-0.104]

8 Predicting 1 when the answer is 0 is also not helpful. [sent-12, score-0.268]

9 The implication of “have a bias” is that we can not design effective learning algorithms with “a uniform prior over all possibilities”. [sent-13, score-0.228]

10 The implication of “have the ‘right’ bias” is that our mathematician fails since “right” is defined with respect to the solutions to problems encountered in the real world. [sent-14, score-0.769]

11 The same effect occurs in various sciences such as physics—a mathematician can not solve physics because the “right” answer is defined by the world. [sent-15, score-1.272]

12 A similar question is “Can an entirely empirical approach solve machine learning?” [sent-16, score-0.471]

13 The answer to this is “yes”, as long as we accept the evolution of humans and that a “solution” to machine learning is human-level learning ability. [sent-18, score-0.632]

14 A related question is then “Is mathematics useful in solving machine learning?” [sent-19, score-0.578]

15 Although mathematics can not tell us what the “right” bias is, it can: Give us computational shortcuts relevant to machine learning. [sent-21, score-1.027]

16 Abstract empirical observations of what an empirically good bias is, allowing transference to new domains. [sent-22, score-0.61]

17 There is a reasonable hope that solving mathematics related to learning implies we can reach a good machine learning system in time shorter than the evolution of a human. [sent-23, score-0.944]

18 All of these observations imply that the process of solving machine learning must be partially empirical. [sent-24, score-0.44]

19 Anyone hoping to do so must either engage in real-world experiments or listen carefully to people who engage in real-world experiments. [sent-26, score-0.764]

20 A reasonable model here is physics which has benefited from a combined mathematical and empirical study. [sent-27, score-0.606]


similar blogs computed by the tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('mathematician', 0.354), ('bias', 0.341), ('answer', 0.268), ('physics', 0.217), ('mathematics', 0.208), ('engage', 0.194), ('evolution', 0.194), ('implication', 0.157), ('right', 0.153), ('empirical', 0.151), ('solve', 0.147), ('suppose', 0.122), ('defined', 0.12), ('observations', 0.118), ('solving', 0.117), ('yes', 0.112), ('machine', 0.107), ('infinitely', 0.105), ('listen', 0.105), ('sitting', 0.105), ('predicting', 0.104), ('us', 0.103), ('must', 0.098), ('sciences', 0.097), ('conclude', 0.097), ('benefited', 0.097), ('shortcuts', 0.097), ('shorter', 0.097), ('succesfully', 0.087), ('hoping', 0.087), ('either', 0.086), ('order', 0.083), ('related', 0.08), ('proving', 0.074), ('reach', 0.072), ('possibilities', 0.072), ('observe', 0.072), ('combined', 0.072), ('encountered', 0.071), ('uniform', 0.071), ('occurs', 0.069), ('reasonable', 0.069), ('tell', 0.068), ('fails', 0.067), ('question', 0.066), ('prediction', 0.065), ('study', 0.064), ('powerful', 0.064), ('humans', 0.063), ('abstract', 0.062)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000004 90 hunch net-2005-07-07-The Limits of Learning Theory


2 0.17216493 95 hunch net-2005-07-14-What Learning Theory might do

Introduction: I wanted to expand on this post and some of the previous problems/research directions about where learning theory might make large strides. Why theory? The essential reason for theory is “intuition extension”. A very good applied learning person can master some particular application domain yielding the best computer algorithms for solving that problem. A very good theory can take the intuitions discovered by this and other applied learning people and extend them to new domains in a relatively automatic fashion. To do this, we take these basic intuitions and try to find a mathematical model that: Explains the basic intuitions. Makes new testable predictions about how to learn. Succeeds in so learning. This is “intuition extension”: taking what we have learned somewhere else and applying it in new domains. It is fundamentally useful to everyone because it increases the level of automation in solving problems. Where next for learning theory? I like the a

3 0.17056786 149 hunch net-2006-01-18-Is Multitask Learning Black-Boxable?

Introduction: Multitask learning is learning to predict multiple outputs given the same input. Mathematically, we might think of this as trying to learn a function f: X -> {0,1}^n. Structured learning is similar at this level of abstraction. Many people have worked on solving multitask learning (for example Rich Caruana) using methods which share an internal representation. In other words, the computation and learning of the i-th prediction is shared with the computation and learning of the j-th prediction. Another way to ask this question is: can we avoid sharing the internal representation? For example, it might be feasible to solve multitask learning by some process feeding the i-th prediction f(x)_i into the j-th predictor f(x, f(x)_i)_j. If the answer is “no”, then it implies we can not take binary classification as a basic primitive in the process of solving prediction problems. If the answer is “yes”, then we can reuse binary classification algorithms to
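As a rough sketch of the “feed the i-th prediction into the j-th predictor” idea (an added illustration; the function names and threshold base learners are hypothetical and not from the post, which does not specify an implementation):

```python
# Sketch of "stacked" multitask prediction: each task j may consume the input x
# plus earlier tasks' predictions f(x)_i, so binary classification is reused as
# a primitive instead of sharing an internal representation.
from typing import Callable, List, Sequence

BinaryClassifier = Callable[[Sequence[float]], int]

def stacked_predict(x: Sequence[float],
                    predictors: List[BinaryClassifier]) -> List[int]:
    """Predict n binary outputs; task j sees x augmented with predictions 0..j-1."""
    outputs: List[int] = []
    features = list(x)
    for predict_j in predictors:
        y_j = predict_j(features)            # f(x, f(x)_0, ..., f(x)_{j-1})_j
        outputs.append(y_j)
        features = features + [float(y_j)]   # feed this prediction to later tasks
    return outputs

# Toy base predictors (hypothetical): simple thresholds on the running feature sum.
predictors = [
    lambda f: int(sum(f) > 1.0),
    lambda f: int(sum(f) > 2.0),
    lambda f: int(sum(f) > 3.0),
]

print(stacked_predict([0.7, 0.9], predictors))  # prints [1, 1, 1]
```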

4 0.15683612 302 hunch net-2008-05-25-Inappropriate Mathematics for Machine Learning

Introduction: Reviewers and students are sometimes greatly concerned by the distinction between: An open set and a closed set . A Supremum and a Maximum . An event which happens with probability 1 and an event that always happens. I don’t appreciate this distinction in machine learning & learning theory. All machine learning takes place (by definition) on a machine where every parameter has finite precision. Consequently, every set is closed, a maximal element always exists, and probability 1 events always happen. The fundamental issue here is that substantial parts of mathematics don’t appear well-matched to computation in the physical world, because the mathematics has concerns which are unphysical. This mismatched mathematics makes irrelevant distinctions. We can ask “what mathematics is appropriate to computation?” Andrej has convinced me that a pretty good answer to this question is constructive mathematics . So, here’s a basic challenge: Can anyone name a situati

5 0.13040477 413 hunch net-2010-10-08-An easy proof of the Chernoff-Hoeffding bound

Introduction: Textbooks invariably seem to carry the proof that uses Markov’s inequality, moment-generating functions, and Taylor approximations. Here’s an easier way. For biases q, p in [0,1], let K(q,p) be the KL divergence between a coin of bias q and one of bias p: K(q,p) = q ln(q/p) + (1-q) ln((1-q)/(1-p)). Theorem: Suppose you do n independent tosses of a coin of bias p. The probability of seeing qn heads or more, for q > p, is at most e^{-n K(q,p)}. So is the probability of seeing qn heads or less, for q < p. Remark: By Pinsker’s inequality, K(q,p) >= 2(q-p)^2. Proof: Let’s do the q > p case; the other is identical. Let theta_q be the distribution over {0,1}^n induced by a coin of bias q, and likewise theta_p for a coin of bias p. Let S be the set of all sequences of n tosses which contain qn heads or more. We’d like to show that S is unlikely under theta_p. Pick any x in S, with say k >= qn heads. Then: theta_q(x)/theta_p(x) = (q/p)^k ((1-q)/(1-p))^{n-k} >= (q/p)^{qn} ((1-q)/(1-p))^{(1-q)n} = e^{n K(q,p)}. Since theta_p(x) <= e^{-n K(q,p)} theta_q(x) for every x in S, we have theta_p(S) <= e^{-n K(q,p)} theta_q(S) <= e^{-n K(q,p)}, and we’re done.
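A quick numerical sanity check of the stated bound, added here as a sketch (the specific (n, p, q) triples are arbitrary): compare the exact binomial tail P[#heads >= qn] against e^{-n K(q,p)}.

```python
# Numerical sanity check of the bound P[#heads >= qn] <= exp(-n*K(q,p)) for q > p.
# Added illustration, not part of the original post; the (n, p, q) values are arbitrary.
from math import comb, exp, log, ceil

def kl_bernoulli(q, p):
    """KL divergence K(q, p) between coins of bias q and p (natural log)."""
    return q * log(q / p) + (1 - q) * log((1 - q) / (1 - p))

def binom_tail(n, p, k):
    """Exact P[Binomial(n, p) >= k]."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

for n, p, q in [(50, 0.3, 0.5), (200, 0.1, 0.2), (1000, 0.5, 0.55)]:
    exact = binom_tail(n, p, ceil(q * n))
    bound = exp(-n * kl_bernoulli(q, p))
    assert exact <= bound, (exact, bound)
    print(f"n={n}, p={p}, q={q}: exact tail {exact:.3e} <= bound {bound:.3e}")
```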

6 0.11327409 158 hunch net-2006-02-24-A Fundamentalist Organization of Machine Learning

7 0.11198721 55 hunch net-2005-04-10-Is the Goal Understanding or Prediction?

8 0.10717429 104 hunch net-2005-08-22-Do you believe in induction?

9 0.10087525 454 hunch net-2012-01-30-ICML Posters and Scope

10 0.10039937 370 hunch net-2009-09-18-Necessary and Sufficient Research

11 0.099542245 34 hunch net-2005-03-02-Prior, “Prior” and Bias

12 0.099151514 351 hunch net-2009-05-02-Wielding a New Abstraction

13 0.095981322 358 hunch net-2009-06-01-Multitask Poisoning

14 0.095508635 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models

15 0.094816215 12 hunch net-2005-02-03-Learning Theory, by assumption

16 0.094015211 360 hunch net-2009-06-15-In Active Learning, the question changes

17 0.091563955 194 hunch net-2006-07-11-New Models

18 0.091104634 168 hunch net-2006-04-02-Mad (Neuro)science

19 0.088695318 6 hunch net-2005-01-27-Learning Complete Problems

20 0.088550746 235 hunch net-2007-03-03-All Models of Learning have Flaws


similar blogs computed by the lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.206), (1, 0.058), (2, -0.018), (3, 0.054), (4, -0.019), (5, -0.066), (6, 0.046), (7, 0.078), (8, 0.103), (9, -0.05), (10, -0.075), (11, -0.068), (12, 0.056), (13, 0.047), (14, 0.007), (15, 0.016), (16, -0.001), (17, -0.073), (18, 0.052), (19, -0.001), (20, -0.027), (21, 0.028), (22, 0.017), (23, -0.003), (24, 0.045), (25, -0.062), (26, 0.076), (27, -0.024), (28, -0.089), (29, 0.034), (30, -0.011), (31, 0.09), (32, -0.006), (33, 0.013), (34, -0.133), (35, -0.006), (36, -0.046), (37, 0.106), (38, -0.022), (39, -0.03), (40, -0.064), (41, 0.002), (42, -0.185), (43, 0.102), (44, 0.041), (45, -0.056), (46, 0.027), (47, -0.017), (48, -0.087), (49, -0.1)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95137805 90 hunch net-2005-07-07-The Limits of Learning Theory


2 0.74739617 149 hunch net-2006-01-18-Is Multitask Learning Black-Boxable?

Introduction: Multitask learning is learning to predict multiple outputs given the same input. Mathematically, we might think of this as trying to learn a function f: X -> {0,1}^n. Structured learning is similar at this level of abstraction. Many people have worked on solving multitask learning (for example Rich Caruana) using methods which share an internal representation. In other words, the computation and learning of the i-th prediction is shared with the computation and learning of the j-th prediction. Another way to ask this question is: can we avoid sharing the internal representation? For example, it might be feasible to solve multitask learning by some process feeding the i-th prediction f(x)_i into the j-th predictor f(x, f(x)_i)_j. If the answer is “no”, then it implies we can not take binary classification as a basic primitive in the process of solving prediction problems. If the answer is “yes”, then we can reuse binary classification algorithms to

3 0.71800983 302 hunch net-2008-05-25-Inappropriate Mathematics for Machine Learning

Introduction: Reviewers and students are sometimes greatly concerned by the distinction between: An open set and a closed set . A Supremum and a Maximum . An event which happens with probability 1 and an event that always happens. I don’t appreciate this distinction in machine learning & learning theory. All machine learning takes place (by definition) on a machine where every parameter has finite precision. Consequently, every set is closed, a maximal element always exists, and probability 1 events always happen. The fundamental issue here is that substantial parts of mathematics don’t appear well-matched to computation in the physical world, because the mathematics has concerns which are unphysical. This mismatched mathematics makes irrelevant distinctions. We can ask “what mathematics is appropriate to computation?” Andrej has convinced me that a pretty good answer to this question is constructive mathematics . So, here’s a basic challenge: Can anyone name a situati

4 0.70970929 55 hunch net-2005-04-10-Is the Goal Understanding or Prediction?

Introduction: Steve Smale and I have a debate about goals of learning theory. Steve likes theorems with a dependence on unobservable quantities. For example, if D is a distribution over a space X x [0,1], you can state a theorem about the error rate dependent on the variance, E_{(x,y)~D}[(y - E_{y'~D|x}[y'])^2]. I dislike this, because I want to use the theorems to produce code solving learning problems. Since I don’t know (and can’t measure) the variance, a theorem depending on the variance does not help me—I would not know what variance to plug into the learning algorithm. Recast more broadly, this is a debate between “declarative” and “operative” mathematics. A strong example of “declarative” mathematics is “a new kind of science”. Roughly speaking, the goal of this kind of approach seems to be finding a way to explain the observations we make. Examples include “some things are unpredictable”, “a phase transition exists”, etc… “Operative” mathematics helps you make predictions a
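To illustrate why the variance term is “unobservable” (an added sketch with a made-up synthetic D; nothing here is from the post): when we construct D ourselves, the conditional-noise variance E_{(x,y)~D}[(y - E[y|x])^2] is known by construction, but a learner that only sees samples and does not know E[y|x] cannot simply read it off.

```python
# Added illustration: E_{(x,y)~D}[(y - E[y|x])^2] is a property of the (usually unknown)
# distribution D, not of a dataset. With a synthetic D where the noise level is known,
# compare the true value to a naive sample-based guess that ignores the conditional mean.
import random

random.seed(0)
NOISE_STD = 0.1  # known only because we built D ourselves

def sample_from_D():
    """Synthetic D over X x [0,1]: x uniform, y a clipped noisy function of x."""
    x = random.random()
    y = min(1.0, max(0.0, 0.5 * x + 0.25 + random.gauss(0.0, NOISE_STD)))
    return x, y

# True conditional-variance term (ignoring the mild effect of clipping): NOISE_STD**2.
true_variance = NOISE_STD ** 2

# A learner only sees samples; without knowing E[y|x] it cannot compute the term directly.
data = [sample_from_D() for _ in range(10000)]
mean_y = sum(y for _, y in data) / len(data)              # wrong stand-in for E[y|x]
naive = sum((y - mean_y) ** 2 for _, y in data) / len(data)

print(f"true noise variance ~ {true_variance:.4f}, naive estimate {naive:.4f}")
```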

5 0.66542464 168 hunch net-2006-04-02-Mad (Neuro)science

Introduction: One of the questions facing machine learning as a field is “Can we produce a generalized learning system that can solve a wide array of standard learning problems?” The answer is trivial: “yes, just have children”. Of course, that wasn’t really the question. The refined question is “Are there simple-to-implement generalized learning systems that can solve a wide array of standard learning problems?” The answer to this is less clear. The ability of animals (and people ) to learn might be due to megabytes encoded in the DNA. If this algorithmic complexity is necessary to solve machine learning, the field faces a daunting task in replicating it on a computer. This observation suggests a possibility: if you can show that few bits of DNA are needed for learning in animals, then this provides evidence that machine learning (as a field) has a hope of big success with relatively little effort. It is well known that specific portions of the brain have specific functionality across

6 0.62932569 104 hunch net-2005-08-22-Do you believe in induction?

7 0.60880911 413 hunch net-2010-10-08-An easy proof of the Chernoff-Hoeffding bound

8 0.60792023 164 hunch net-2006-03-17-Multitask learning is Black-Boxable

9 0.6011709 68 hunch net-2005-05-10-Learning Reductions are Reductionist

10 0.56950587 70 hunch net-2005-05-12-Math on the Web

11 0.56136477 95 hunch net-2005-07-14-What Learning Theory might do

12 0.55351782 62 hunch net-2005-04-26-To calibrate or not?

13 0.54211926 257 hunch net-2007-07-28-Asking questions

14 0.54045862 348 hunch net-2009-04-02-Asymmophobia

15 0.53224111 161 hunch net-2006-03-05-“Structural” Learning

16 0.5318495 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models

17 0.52883178 347 hunch net-2009-03-26-Machine Learning is too easy

18 0.52499342 358 hunch net-2009-06-01-Multitask Poisoning

19 0.52418905 37 hunch net-2005-03-08-Fast Physics for Learning

20 0.52275831 373 hunch net-2009-10-03-Static vs. Dynamic multiclass prediction


similar blogs computed by the lda model

lda for this blog:

topicId topicWeight

[(27, 0.152), (38, 0.024), (53, 0.058), (55, 0.635), (94, 0.015), (95, 0.017)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.99612236 271 hunch net-2007-11-05-CMU wins DARPA Urban Challenge

Introduction: The results have been posted, with CMU first, Stanford second, and Virginia Tech third. Considering that this was an open event (at least for people in the US), this was a very strong showing for research at universities (instead of defense contractors, for example). Some details should become public at the NIPS workshops. Slashdot has a post with many comments.

2 0.99370915 446 hunch net-2011-10-03-Monday announcements

Introduction: Various people want to use hunch.net to announce things. I’ve generally resisted this because I feared hunch becoming a pure announcement zone, while I am personally much more interested in contentful posts and discussion. Nevertheless, there is clearly some value and announcements are easy, so I’m planning to summarize announcements on Mondays. D. Sculley points out an interesting Semisupervised feature learning competition, with a deadline of October 17. Lihong Li points out the webscope user interaction dataset, which is the first high quality exploration dataset I’m aware of that is publicly available. Seth Rogers points out CrossValidated, which looks similar in conception to metaoptimize, but directly using the stackoverflow interface and with a bit more of a statistics twist.

3 0.99364978 20 hunch net-2005-02-15-ESPgame and image labeling

Introduction: Luis von Ahn has been running the espgame for a while now. The espgame provides a picture to two randomly paired people across the web, and asks them to agree on a label. It hasn’t managed to label the web yet, but it has produced a large dataset of (image, label) pairs. I organized the dataset so you could explore the implied bipartite graph (requires much bandwidth). Relative to other image datasets, this one is quite large—67,000 images, 358,000 labels (average of 5/image, with variation from 1 to 19), and 22,000 unique labels (one every 3 images). The dataset is also very ‘natural’, consisting of images spidered from the internet. The multiple label characteristic is intriguing because ‘learning to learn’ and metalearning techniques may be applicable. The ‘natural’ quality means that this dataset varies greatly in difficulty from easy (predicting “red”) to hard (predicting “funny”) and potentially more rewarding to tackle. The open problem here is, of course, to make

4 0.99084121 472 hunch net-2012-08-27-NYAS ML 2012 and ICML 2013

Introduction: The New York Machine Learning Symposium is October 19 with a 2 page abstract deadline due September 13 via email with subject “Machine Learning Poster Submission” sent to physicalscience@nyas.org. Everyone is welcome to submit. Last year’s attendance was 246 and I expect more this year. The primary experiment for ICML 2013 is multiple paper submission deadlines with rolling review cycles. The key dates are October 1, December 15, and February 15. This is an attempt to shift ICML further towards a journal style review process and reduce peak load. The “not for proceedings” experiment from this year’s ICML is not continuing. Edit: Fixed second ICML deadline.

5 0.99066365 302 hunch net-2008-05-25-Inappropriate Mathematics for Machine Learning

Introduction: Reviewers and students are sometimes greatly concerned by the distinction between: An open set and a closed set . A Supremum and a Maximum . An event which happens with probability 1 and an event that always happens. I don’t appreciate this distinction in machine learning & learning theory. All machine learning takes place (by definition) on a machine where every parameter has finite precision. Consequently, every set is closed, a maximal element always exists, and probability 1 events always happen. The fundamental issue here is that substantial parts of mathematics don’t appear well-matched to computation in the physical world, because the mathematics has concerns which are unphysical. This mismatched mathematics makes irrelevant distinctions. We can ask “what mathematics is appropriate to computation?” Andrej has convinced me that a pretty good answer to this question is constructive mathematics . So, here’s a basic challenge: Can anyone name a situati

6 0.99044985 448 hunch net-2011-10-24-2011 ML symposium and the bears

same-blog 7 0.97731644 90 hunch net-2005-07-07-The Limits of Learning Theory

8 0.96766967 326 hunch net-2008-11-11-COLT CFP

9 0.96766967 465 hunch net-2012-05-12-ICML accepted papers and early registration

10 0.96052355 331 hunch net-2008-12-12-Summer Conferences

11 0.93432182 270 hunch net-2007-11-02-The Machine Learning Award goes to …

12 0.9188602 395 hunch net-2010-04-26-Compassionate Reviewing

13 0.91713291 387 hunch net-2010-01-19-Deadline Season, 2010

14 0.89927137 453 hunch net-2012-01-28-Why COLT?

15 0.86880291 65 hunch net-2005-05-02-Reviewing techniques for conferences

16 0.85585302 356 hunch net-2009-05-24-2009 ICML discussion site

17 0.83221412 457 hunch net-2012-02-29-Key Scientific Challenges and the Franklin Symposium

18 0.82998103 452 hunch net-2012-01-04-Why ICML? and the summer conferences

19 0.81825835 46 hunch net-2005-03-24-The Role of Workshops

20 0.81497538 216 hunch net-2006-11-02-2006 NIPS workshops