hunch_net hunch_net-2005 hunch_net-2005-55 knowledge-graph by maker-knowledge-mining

55 hunch net-2005-04-10-Is the Goal Understanding or Prediction?


meta info for this blog

Source: html

Introduction: Steve Smale and I have a debate about goals of learning theory. Steve likes theorems with a dependence on unobservable quantities. For example, if D is a distribution over a space $X \times [0,1]$, you can state a theorem about the error rate dependent on the variance, $E_{(x,y)\sim D}\,(y - E_{y'\sim D|x}[y'])^2$. I dislike this, because I want to use the theorems to produce code solving learning problems. Since I don’t know (and can’t measure) the variance, a theorem depending on the variance does not help me—I would not know what variance to plug into the learning algorithm. Recast more broadly, this is a debate between “declarative” and “operative” mathematics. A strong example of “declarative” mathematics is “a new kind of science”. Roughly speaking, the goal of this kind of approach seems to be finding a way to explain the observations we make. Examples include “some things are unpredictable”, “a phase transition exists”, etc… “Operative” mathematics helps you make predictions about the world.
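To make the complaint concrete, here is a minimal sketch (my addition, not part of the post) in Python: on a synthetic distribution D where the conditional mean E[y|x] is known by construction, the variance term is directly computable, while a practitioner who only observes samples has no such quantity to plug into an algorithm.

import numpy as np

rng = np.random.default_rng(0)
m = 10_000

# Synthetic D over X x [0,1]: y = f(x) + bounded noise (stays inside [0,1]).
x = rng.uniform(0, 1, size=m)
f = 0.5 + 0.3 * np.sin(2 * np.pi * x)          # true conditional mean E[y|x]
y = np.clip(f + rng.uniform(-0.2, 0.2, size=m), 0, 1)

# The variance term requires knowing E[y|x], i.e. knowing D itself.
true_variance_term = np.mean((y - f) ** 2)
print(f"variance term (needs D): {true_variance_term:.4f}")

# From samples alone, E[y|x] must itself be estimated, so any plug-in
# estimate of the variance term inherits the error of that regression.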


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Steve Smale and I have a debate about goals of learning theory. [sent-1, score-0.227]

2 Steve likes theorems with a dependence on unobservable quantities. [sent-2, score-0.292]

3 For example, if D is a distribution over a space $X \times [0,1]$, you can state a theorem about the error rate dependent on the variance, $E_{(x,y)\sim D}\,(y - E_{y'\sim D|x}[y'])^2$. [sent-3, score-0.173]

4 I dislike this, because I want to use the theorems to produce code solving learning problems. [sent-4, score-0.401]

5 Since I don’t know (and can’t measure) the variance, a theorem depending on the variance does not help me—I would not know what variance to plug into the learning algorithm. [sent-5, score-1.082]

6 Recast more broadly, this is a debate between “declarative” and “operative” mathematics. [sent-6, score-0.16]

7 A strong example of “declarative” mathematics is “a new kind of science” . [sent-7, score-0.573]

8 Roughly speaking, the goal of this kind of approach seems to be finding a way to explain the observations we make. [sent-8, score-0.272]

9 Examples include “some things are unpredictable”, “a phase transition exists”, etc… “Operative” mathematics helps you make predictions about the world. [sent-9, score-0.693]

10 A strong example of operative mathematics is Newtonian mechanics in physics: it’s a great tool to help you predict what is going to happen in the world. [sent-10, score-1.379]

11 In addition to the “I want to do things” motivation for operative mathematics, I find it less arbitrary. [sent-11, score-0.812]

12 In particular, two reasonable people can each be convinced they understand a topic in ways so different that they do not understand the other’s viewpoint. [sent-12, score-0.296]

13 If these understandings are operative, the rest of us on the sidelines can better appreciate which understanding is “best”. [sent-13, score-0.236]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('operative', 0.62), ('variance', 0.297), ('mathematics', 0.291), ('declarative', 0.248), ('steve', 0.17), ('debate', 0.16), ('kind', 0.132), ('theorems', 0.126), ('newtonian', 0.11), ('plug', 0.11), ('smale', 0.11), ('understandings', 0.11), ('theorem', 0.106), ('dislike', 0.096), ('likes', 0.096), ('mechanics', 0.096), ('convinced', 0.092), ('help', 0.089), ('transition', 0.088), ('phase', 0.088), ('strong', 0.087), ('explain', 0.078), ('physics', 0.076), ('tool', 0.076), ('broadly', 0.074), ('understand', 0.074), ('motivation', 0.071), ('dependence', 0.07), ('goals', 0.067), ('speaking', 0.067), ('dependent', 0.067), ('appreciate', 0.066), ('depending', 0.065), ('want', 0.064), ('roughly', 0.064), ('example', 0.063), ('observations', 0.062), ('produce', 0.061), ('rest', 0.06), ('helps', 0.06), ('know', 0.059), ('things', 0.058), ('addition', 0.057), ('measure', 0.057), ('happen', 0.057), ('topic', 0.056), ('predictions', 0.055), ('code', 0.054), ('include', 0.053), ('exists', 0.051)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 55 hunch net-2005-04-10-Is the Goal Understanding or Prediction?


2 0.1768716 302 hunch net-2008-05-25-Inappropriate Mathematics for Machine Learning

Introduction: Reviewers and students are sometimes greatly concerned by the distinction between: an open set and a closed set; a supremum and a maximum; an event which happens with probability 1 and an event that always happens. I don’t appreciate this distinction in machine learning & learning theory. All machine learning takes place (by definition) on a machine where every parameter has finite precision. Consequently, every set is closed, a maximal element always exists, and probability 1 events always happen. The fundamental issue here is that substantial parts of mathematics don’t appear well-matched to computation in the physical world, because the mathematics has concerns which are unphysical. This mismatched mathematics makes irrelevant distinctions. We can ask “what mathematics is appropriate to computation?” Andrej has convinced me that a pretty good answer to this question is constructive mathematics. So, here’s a basic challenge: Can anyone name a situati
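A tiny illustration (my addition, not from the post) of the claim that finite precision collapses the supremum/maximum distinction: in float64 the open interval (0, 1) contains a largest representable element. math.nextafter requires Python 3.9+.

import math

# Largest float64 strictly below 1: on a machine, the open interval (0, 1)
# has a maximum, so "supremum vs. maximum" makes no computational difference.
largest_below_one = math.nextafter(1.0, 0.0)
print(largest_below_one)               # 0.9999999999999999
print(0.0 < largest_below_one < 1.0)   # True: it lies inside (0, 1)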

3 0.16690059 244 hunch net-2007-05-09-The Missing Bound

Introduction: Sham Kakade points out that we are missing a bound. Suppose we have m samples x drawn IID from some distribution D. Through the magic of the exponential moment method we know that: If the range of x is bounded by an interval of size I, a Chernoff/Hoeffding style bound gives us a bound on the deviations like $O(I/m^{0.5})$ (at least in crude form). A proof is on page 9 here. If the range of x is bounded, and the variance (or a bound on the variance) is known, then Bennett’s bound can give tighter results (*). This can be a huge improvement when the true variance is small. What’s missing here is a bound that depends on the observed variance rather than a bound on the variance. This means that many people attempt to use Bennett’s bound (incorrectly) by plugging the observed variance in as the true variance, invalidating the bound application. Most of the time, they get away with it, but this is a dangerous move when doing machine learning. In machine learning,
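As a hedged sketch of the gap described above (my addition): the Bernstein inequality, a corollary of Bennett's bound, tightens on Hoeffding when the true variance is small, but its variance argument must be the true variance or a proven upper bound on it. Constants vary slightly across textbook statements of these inequalities.

import math

def hoeffding_dev(I, m, delta):
    # Range-only deviation of the empirical mean: O(I / m^0.5).
    return I * math.sqrt(math.log(2 / delta) / (2 * m))

def bernstein_dev(I, var, m, delta):
    # Variance-sensitive form (a corollary of Bennett's inequality).
    # 'var' must be the TRUE variance or a bound on it -- plugging in the
    # observed sample variance is exactly the invalid move described above.
    L = math.log(2 / delta)
    return math.sqrt(2 * var * L / m) + 2 * I * L / (3 * m)

I, m, delta = 1.0, 10_000, 0.05
for var in (0.25, 0.01, 0.0001):   # 0.25 is the worst case for range 1
    print(f"var={var}: hoeffding={hoeffding_dev(I, m, delta):.5f}, "
          f"bernstein={bernstein_dev(I, var, m, delta):.5f}")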

4 0.11198721 90 hunch net-2005-07-07-The Limits of Learning Theory

Introduction: Suppose we had an infinitely powerful mathematician sitting in a room and proving theorems about learning. Could he solve machine learning? The answer is “no”. This answer is both obvious and sometimes underappreciated. There are several ways to conclude that some bias is necessary in order to successfully learn. For example, suppose we are trying to solve classification. At prediction time, we observe some features X and want to make a prediction of either 0 or 1. Bias is what makes us prefer one answer over the other based on past experience. In order to learn we must: Have a bias. A predictor that considers 0 as likely as 1 is useless. Have the “right” bias. Predicting 1 when the answer is 0 is also not helpful. The implication of “have a bias” is that we cannot design effective learning algorithms with “a uniform prior over all possibilities”. The implication of “have the ‘right’ bias” is that our mathematician fails since “right” is defined wi
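A toy enumeration (my addition) makes the "have a bias" point concrete: averaged over every possible labeling of unseen points, any fixed predictor achieves exactly 50% accuracy, so only a bias toward some labelings over others can help.

from itertools import product

n_test = 4                    # unseen points
predictor = [0, 1, 0, 1]      # any fixed guess; the choice does not matter

accuracies = []
for truth in product([0, 1], repeat=n_test):   # all 2^n possible labelings
    correct = sum(p == t for p, t in zip(predictor, truth))
    accuracies.append(correct / n_test)

print(sum(accuracies) / len(accuracies))       # always exactly 0.5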

5 0.092084922 70 hunch net-2005-05-12-Math on the Web

Introduction: Andrej Bauer has set up a Mathematics and Computation Blog. As a first step he has tried to address the persistent and annoying problem of math on the web. As a basic tool for precisely stating and transferring understanding of technical subjects, mathematics is very necessary. Despite this necessity, every mechanism for expressing mathematics on the web seems unnaturally clumsy. Here are some of the methods and their drawbacks: MathML: This was supposed to be the answer, but it has two severe drawbacks: “Internet Explorer” doesn’t read it and the language is an example of push-XML-to-the-limit which no one would ever consider writing in. (In contrast, html is easy to write in.) It’s also very annoying that math fonts must be installed independent of the browser, even for mozilla based browsers. Create inline images: This has several big drawbacks: font size is fixed for all viewers, you can’t cut & paste inside the images, and you can’t hyperlink from (say) symbol to de

6 0.089464471 133 hunch net-2005-11-28-A question of quantification

7 0.080843166 392 hunch net-2010-03-26-A Variance only Deviation Bound

8 0.077891313 256 hunch net-2007-07-20-Motivation should be the Responsibility of the Reviewer

9 0.071979769 22 hunch net-2005-02-18-What it means to do research.

10 0.067544818 329 hunch net-2008-11-28-A Bumper Crop of Machine Learning Graduates

11 0.065705039 169 hunch net-2006-04-05-What is state?

12 0.064715788 10 hunch net-2005-02-02-Kolmogorov Complexity and Googling

13 0.058245812 347 hunch net-2009-03-26-Machine Learning is too easy

14 0.057953507 388 hunch net-2010-01-24-Specializations of the Master Problem

15 0.057904698 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem

16 0.057685617 204 hunch net-2006-08-28-Learning Theory standards for NIPS 2006

17 0.055749085 317 hunch net-2008-09-12-How do we get weak action dependence for learning with partial observations?

18 0.054813236 466 hunch net-2012-06-05-ICML acceptance statistics

19 0.053998992 158 hunch net-2006-02-24-A Fundamentalist Organization of Machine Learning

20 0.051313564 358 hunch net-2009-06-01-Multitask Poisoning


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.124), (1, 0.042), (2, -0.003), (3, 0.028), (4, -0.006), (5, -0.041), (6, 0.059), (7, 0.027), (8, 0.0), (9, -0.003), (10, 0.02), (11, -0.005), (12, 0.043), (13, -0.029), (14, 0.004), (15, -0.021), (16, -0.028), (17, 0.014), (18, 0.015), (19, 0.054), (20, 0.016), (21, 0.072), (22, -0.094), (23, 0.017), (24, -0.032), (25, -0.006), (26, 0.001), (27, 0.005), (28, -0.084), (29, -0.0), (30, -0.093), (31, 0.08), (32, 0.02), (33, 0.035), (34, -0.115), (35, -0.014), (36, -0.03), (37, 0.059), (38, 0.014), (39, -0.029), (40, -0.055), (41, -0.038), (42, -0.118), (43, 0.053), (44, -0.074), (45, 0.005), (46, 0.026), (47, 0.043), (48, -0.037), (49, -0.102)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9506889 55 hunch net-2005-04-10-Is the Goal Understanding or Prediction?


2 0.70254725 70 hunch net-2005-05-12-Math on the Web


3 0.69188213 302 hunch net-2008-05-25-Inappropriate Mathematics for Machine Learning


4 0.64918572 392 hunch net-2010-03-26-A Variance only Deviation Bound

Introduction: At the PAC-Bayes workshop earlier this week, Olivier Catoni described a result that I hadn’t believed was possible: a deviation bound depending only on the variance of a random variable. For people not familiar with deviation bounds, this may be hard to appreciate. Deviation bounds are one of the core components of the foundations of machine learning theory, so developments here have the potential to alter our understanding of how to learn and what is learnable. My understanding is that the basic proof techniques started with Bernstein and have evolved into several variants specialized for various applications. All of the variants I knew had a dependence on the range, with some also having a dependence on the variance of an IID or martingale random variable. This one is the first I know of with a dependence on only the variance. The basic idea is to use a biased estimator of the mean which is not influenced much by outliers. Then, a deviation bound can be proved by
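For intuition only (my addition; this is not Catoni's estimator, which uses a soft truncation via an influence function): median-of-means is a simpler estimator in the same spirit whose deviations scale with the standard deviation rather than the range, illustrating how damping outliers buys variance-only control.

import numpy as np

def median_of_means(samples, n_blocks):
    # Average within blocks, then take the median of the block means;
    # outliers can corrupt a few blocks but not the median of all of them.
    blocks = np.array_split(samples, n_blocks)
    return float(np.median([np.mean(b) for b in blocks]))

rng = np.random.default_rng(1)
# Heavy-tailed samples: true mean 0, finite variance, occasional huge outliers.
x = rng.standard_t(df=2.5, size=100_000)

print("plain mean:     ", np.mean(x))
print("median of means:", median_of_means(x, n_blocks=30))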

5 0.64908284 244 hunch net-2007-05-09-The Missing Bound


6 0.61902964 90 hunch net-2005-07-07-The Limits of Learning Theory

7 0.54685968 10 hunch net-2005-02-02-Kolmogorov Complexity and Googling

8 0.52607864 202 hunch net-2006-08-10-Precision is not accuracy

9 0.50056404 162 hunch net-2006-03-09-Use of Notation

10 0.48971811 413 hunch net-2010-10-08-An easy proof of the Chernoff-Hoeffding bound

11 0.48774603 314 hunch net-2008-08-24-Mass Customized Medicine in the Future?

12 0.4755559 170 hunch net-2006-04-06-Bounds greater than 1

13 0.47301817 115 hunch net-2005-09-26-Prediction Bounds as the Mathematics of Science

14 0.44749543 218 hunch net-2006-11-20-Context and the calculation misperception

15 0.43240005 348 hunch net-2009-04-02-Asymmophobia

16 0.41965151 256 hunch net-2007-07-20-Motivation should be the Responsibility of the Reviewer

17 0.41808203 104 hunch net-2005-08-22-Do you believe in induction?

18 0.41685835 62 hunch net-2005-04-26-To calibrate or not?

19 0.41230863 328 hunch net-2008-11-26-Efficient Reinforcement Learning in MDPs

20 0.41145325 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(3, 0.04), (10, 0.403), (27, 0.153), (38, 0.016), (53, 0.088), (55, 0.12), (94, 0.045), (95, 0.01)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.98105353 474 hunch net-2012-10-18-7th Annual Machine Learning Symposium

Introduction: A reminder that the New York Academy of Sciences will be hosting the 7th Annual Machine Learning Symposium tomorrow from 9:30am. The main program will feature invited talks from Peter Bartlett, William Freeman, and Vladimir Vapnik, along with numerous spotlight talks and a poster session. Following the main program, hackNY and Microsoft Research are sponsoring a networking hour with talks from machine learning practitioners at NYC startups (specifically bit.ly, Buzzfeed, Chartbeat, Sense Networks, and Visual Revenue). This should be of great interest to everyone considering working in machine learning.

2 0.95648235 38 hunch net-2005-03-09-Bad Reviewing

Introduction: This is a difficult subject to talk about for many reasons, but a discussion may be helpful. Bad reviewing is a problem in academia. The first step in understanding this is admitting to the problem, so here is a short list of examples of bad reviewing. Reviewer disbelieves a theorem’s proof (ICML), or disbelieves a theorem, offering a trivially false counterexample. (COLT) Reviewer internally swaps quantifiers in a theorem, concludes it has been done before and is trivial. (NIPS) Reviewer believes a technique will not work despite experimental validation. (COLT) Reviewers fail to notice a flaw in a theorem statement (CRYPTO). Reviewer erroneously claims that it has been done before (NIPS, SODA, JMLR)—(complete with references!) Reviewer inverts the message of a paper and concludes it says nothing important. (NIPS*2) Reviewer fails to distinguish between a DAG and a tree (SODA). Reviewer is enthusiastic about the paper but clearly does not understand it (ICML). Reviewer erroneously

same-blog 3 0.91195858 55 hunch net-2005-04-10-Is the Goal Understanding or Prediction?


4 0.8976441 199 hunch net-2006-07-26-Two more UAI papers of interest

Introduction: In addition to Ed Snelson’s paper, there were (at least) two other papers that caught my eye at UAI. One was this paper by Sanjoy Dasgupta, Daniel Hsu, and Nakul Verma at UCSD, which shows in a surprisingly general and strong way that almost all linear projections of any jointly distributed vector random variable with finite first and second moments look spherical and unimodal (in fact, look like a scale mixture of Gaussians). Great result, as you’d expect from Sanjoy. The other paper which I found intriguing but which I just haven’t grokked yet is this beast by Manfred and Dima Kuzmin. You can check out the (beautiful) slides if that helps. I feel like there is something deep here, but my brain is too small to understand it. The COLT and last NIPS papers/slides are also on Manfred’s page. Hopefully someone here can illuminate.
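A crude numerical check (my addition, with arbitrary choices of dimension and coordinate distribution) of the projection phenomenon described above: coordinates drawn from a strongly skewed distribution, yet a random one-dimensional projection looks nearly Gaussian by skewness/kurtosis.

import numpy as np

rng = np.random.default_rng(2)
d, n = 500, 20_000

# Strongly non-Gaussian coordinates: centered exponentials (skewness 2).
X = rng.exponential(scale=1.0, size=(n, d)) - 1.0
u = rng.normal(size=d)
u /= np.linalg.norm(u)             # random unit direction

z = X @ u
z = (z - z.mean()) / z.std()
print("skewness:", np.mean(z**3))  # near 0 for a Gaussian
print("kurtosis:", np.mean(z**4))  # near 3 for a Gaussian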

5 0.88946271 434 hunch net-2011-05-09-CI Fellows, again

Introduction: Lev and Hal point out the CI Fellows program is on again for this year. Lev visited me for a year under this program, and I quite enjoyed it. Due May 31.

6 0.85412276 182 hunch net-2006-06-05-Server Shift, Site Tweaks, Suggestions?

7 0.82617527 240 hunch net-2007-04-21-Videolectures.net

8 0.68872428 454 hunch net-2012-01-30-ICML Posters and Scope

9 0.66793787 332 hunch net-2008-12-23-Use of Learning Theory

10 0.57041562 207 hunch net-2006-09-12-Incentive Compatible Reviewing

11 0.55027795 437 hunch net-2011-07-10-ICML 2011 and the future

12 0.54369175 40 hunch net-2005-03-13-Avoiding Bad Reviewing

13 0.54252893 343 hunch net-2009-02-18-Decision by Vetocracy

14 0.54114604 484 hunch net-2013-06-16-Representative Reviewing

15 0.53358442 363 hunch net-2009-07-09-The Machine Learning Forum

16 0.52615631 354 hunch net-2009-05-17-Server Update

17 0.52143532 320 hunch net-2008-10-14-Who is Responsible for a Bad Review?

18 0.51886529 78 hunch net-2005-06-06-Exact Online Learning for Classification

19 0.51884615 5 hunch net-2005-01-26-Watchword: Probability

20 0.51854545 461 hunch net-2012-04-09-ICML author feedback is open