hunch_net hunch_net-2005 hunch_net-2005-55 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Steve Smale and I have a debate about goals of learning theory. Steve likes theorems with a dependence on unobservable quantities. For example, if D is a distribution over a space X × [0,1], you can state a theorem about the error rate dependent on the variance, E_{(x,y)~D}(y - E_{y'~D|x}[y'])^2. I dislike this, because I want to use the theorems to produce code solving learning problems. Since I don’t know (and can’t measure) the variance, a theorem depending on the variance does not help me—I would not know what variance to plug into the learning algorithm. Recast more broadly, this is a debate between “declarative” and “operative” mathematics. A strong example of “declarative” mathematics is “A New Kind of Science”. Roughly speaking, the goal of this kind of approach seems to be finding a way to explain the observations we make. Examples include “some things are unpredictable”, “a phase transition exists”, etc… “Operative” mathematics helps you make predictions about the world.
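A minimal sketch of the unobservability point, in Python; the toy distribution, names, and constants below are hypothetical illustrations of mine rather than anything from the post. The quantity in the theorem, the conditional-noise variance E_{(x,y)~D}(y - E_{y'~D|x}[y'])^2, can be computed here only because we wrote the simulator for D; a learner sees only a finite sample.

    import random

    def sample_D():
        # Toy distribution over X x [0,1]: x ~ Uniform[0,1], y = f(x) + uniform noise.
        x = random.random()
        f_x = 0.25 + 0.5 * x                      # E[y | x], known only because we wrote D
        y = f_x + random.uniform(-0.1, 0.1)       # stays inside [0,1] by construction
        return x, y

    def true_noise_variance(n=100000):
        # The "declarative" quantity E_{(x,y)~D}(y - E_{y'~D|x}[y'])^2, estimated by
        # Monte Carlo. Possible only because we know E[y|x], i.e. we know D|x.
        total = 0.0
        for _ in range(n):
            x, y = sample_D()
            total += (y - (0.25 + 0.5 * x)) ** 2
        return total / n

    def empirical_variance(sample):
        # An "operative" proxy a learner could actually compute: the empirical
        # variance of the observed y values (which mixes signal and noise).
        mean_y = sum(y for _, y in sample) / len(sample)
        return sum((y - mean_y) ** 2 for _, y in sample) / len(sample)

    sample = [sample_D() for _ in range(1000)]
    print("true conditional-noise variance (requires knowing D):", true_noise_variance())
    print("empirical variance of y (computable from the sample):", empirical_variance(sample))

Only the second number is something a learning algorithm could be handed; the first depends on knowledge of D that the learner, by assumption, does not have.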
sentIndex sentText sentNum sentScore
1 Steve Smale and I have a debate about goals of learning theory. [sent-1, score-0.227]
2 Steve likes theorems with a dependence on unobservable quantities. [sent-2, score-0.292]
3 For example, if D is a distribution over a space X × [0,1], you can state a theorem about the error rate dependent on the variance, E_{(x,y)~D}(y - E_{y'~D|x}[y'])^2. [sent-3, score-0.173]
4 I dislike this, because I want to use the theorems to produce code solving learning problems. [sent-4, score-0.401]
5 Since I don’t know (and can’t measure) the variance, a theorem depending on the variance does not help me—I would not know what variance to plug into the learning algorithm. [sent-5, score-1.082]
6 Recast more broadly, this is a debate between “declarative” and “operative” mathematics. [sent-6, score-0.16]
7 A strong example of “declarative” mathematics is “A New Kind of Science”. [sent-7, score-0.573]
8 Roughly speaking, the goal of this kind of approach seems to be finding a way to explain the observations we make. [sent-8, score-0.272]
9 Examples include “some things are unpredictable”, “a phase transition exists”, etc… “Operative” mathematics helps you make predictions about the world. [sent-9, score-0.693]
10 A strong example of operative mathematics is Newtonian mechanics in physics: it’s a great tool to help you predict what is going to happen in the world. [sent-10, score-1.379]
11 In addition to the “I want to do things” motivation for operative mathematics, I find it less arbitrary. [sent-11, score-0.812]
12 In particular, two reasonable people can each be convinced they understand a topic in ways so different that they do not understand each other’s viewpoint. [sent-12, score-0.296]
13 If these understandings are operative, the rest of us on the sidelines can better appreciate which understanding is “best”. [sent-13, score-0.236]
wordName wordTfidf (topN-words)
[('operative', 0.62), ('variance', 0.297), ('mathematics', 0.291), ('declarative', 0.248), ('steve', 0.17), ('debate', 0.16), ('kind', 0.132), ('theorems', 0.126), ('newtonian', 0.11), ('plug', 0.11), ('smale', 0.11), ('understandings', 0.11), ('theorem', 0.106), ('dislike', 0.096), ('likes', 0.096), ('mechanics', 0.096), ('convinced', 0.092), ('help', 0.089), ('transition', 0.088), ('phase', 0.088), ('strong', 0.087), ('explain', 0.078), ('physics', 0.076), ('tool', 0.076), ('broadly', 0.074), ('understand', 0.074), ('motivation', 0.071), ('dependence', 0.07), ('goals', 0.067), ('speaking', 0.067), ('dependent', 0.067), ('appreciate', 0.066), ('depending', 0.065), ('want', 0.064), ('roughly', 0.064), ('example', 0.063), ('observations', 0.062), ('produce', 0.061), ('rest', 0.06), ('helps', 0.06), ('know', 0.059), ('things', 0.058), ('addition', 0.057), ('measure', 0.057), ('happen', 0.057), ('topic', 0.056), ('predictions', 0.055), ('code', 0.054), ('include', 0.053), ('exists', 0.051)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 55 hunch net-2005-04-10-Is the Goal Understanding or Prediction?
2 0.1768716 302 hunch net-2008-05-25-Inappropriate Mathematics for Machine Learning
Introduction: Reviewers and students are sometimes greatly concerned by the distinction between: An open set and a closed set. A Supremum and a Maximum. An event which happens with probability 1 and an event that always happens. I don’t appreciate this distinction in machine learning & learning theory. All machine learning takes place (by definition) on a machine where every parameter has finite precision. Consequently, every set is closed, a maximal element always exists, and probability 1 events always happen. The fundamental issue here is that substantial parts of mathematics don’t appear well-matched to computation in the physical world, because the mathematics has concerns which are unphysical. This mismatched mathematics makes irrelevant distinctions. We can ask “what mathematics is appropriate to computation?” Andrej has convinced me that a pretty good answer to this question is constructive mathematics. So, here’s a basic challenge: Can anyone name a situati
3 0.16690059 244 hunch net-2007-05-09-The Missing Bound
Introduction: Sham Kakade points out that we are missing a bound. Suppose we have m samples x drawn IID from some distribution D. Through the magic of the exponential moment method we know that: If the range of x is bounded by an interval of size I, a Chernoff/Hoeffding style bound gives us a bound on the deviations like O(I/m^0.5) (at least in crude form). A proof is on page 9 here. If the range of x is bounded, and the variance (or a bound on the variance) is known, then Bennett’s bound can give tighter results (*). This can be a huge improvement when the true variance is small. What’s missing here is a bound that depends on the observed variance rather than a bound on the variance. This means that many people attempt to use Bennett’s bound (incorrectly) by plugging the observed variance in as the true variance, invalidating the bound application. Most of the time, they get away with it, but this is a dangerous move when doing machine learning. In machine learning,
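For reference, here are standard textbook forms of the two bounds being contrasted (my statements, with commonly used constants, not formulas quoted from the post). For m IID samples with mean \mu, empirical mean \hat{\mu}, range bounded by I, and true variance \sigma^2:

    \Pr(|\hat{\mu} - \mu| \ge t) \le 2 \exp(-2 m t^2 / I^2)                       (Hoeffding)
    \Pr(|\hat{\mu} - \mu| \ge t) \le 2 \exp(-m t^2 / (2\sigma^2 + (2/3) I t))     (Bernstein; Bennett is a slightly tighter variant)

The missing bound would replace \sigma^2 with the observed (empirical) variance while remaining a valid deviation bound.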
4 0.11198721 90 hunch net-2005-07-07-The Limits of Learning Theory
Introduction: Suppose we had an infinitely powerful mathematician sitting in a room and proving theorems about learning. Could he solve machine learning? The answer is “no”. This answer is both obvious and sometimes underappreciated. There are several ways to conclude that some bias is necessary in order to successfully learn. For example, suppose we are trying to solve classification. At prediction time, we observe some features X and want to make a prediction of either 0 or 1. Bias is what makes us prefer one answer over the other based on past experience. In order to learn we must: Have a bias. Always predicting “0 is as likely as 1” is useless. Have the “right” bias. Predicting 1 when the answer is 0 is also not helpful. The implication of “have a bias” is that we can not design effective learning algorithms with “a uniform prior over all possibilities”. The implication of “have the ‘right’ bias” is that our mathematician fails since “right” is defined wi
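One standard way to make the “have a bias” point precise is a no-free-lunch calculation; the following is my own sketch in LaTeX notation, not a formula from the post. If the label of every test point x outside the training sample S is drawn uniformly and independently from {0,1}, then for any learning algorithm A,

    \mathbb{E}_{f}\,\mathbb{E}_{S}\,[\Pr_{x \notin S}(A(S)(x) \ne f(x))] = 1/2,

since f(x) on unseen x is independent of everything the algorithm observed. Without some restriction (a bias) on the possible targets f, no algorithm does better than coin flipping.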
5 0.092084922 70 hunch net-2005-05-12-Math on the Web
Introduction: Andrej Bauer has set up a Mathematics and Computation Blog. As a first step he has tried to address the persistent and annoying problem of math on the web. As a basic tool for precisely stating and transferring understanding of technical subjects, mathematics is very necessary. Despite this necessity, every mechanism for expressing mathematics on the web seems unnaturally clumsy. Here are some of the methods and their drawbacks: MathML This was supposed to be the answer, but it has two severe drawbacks: “Internet Explorer” doesn’t read it and the language is an example of push-XML-to-the-limit which no one would ever consider writing in. (In contrast, html is easy to write in.) It’s also very annoying that math fonts must be installed independently of the browser, even for mozilla-based browsers. Create inline images. This has several big drawbacks: font size is fixed for all viewers, you can’t cut & paste inside the images, and you can’t hyperlink from (say) symbol to de
6 0.089464471 133 hunch net-2005-11-28-A question of quantification
7 0.080843166 392 hunch net-2010-03-26-A Variance only Deviation Bound
8 0.077891313 256 hunch net-2007-07-20-Motivation should be the Responsibility of the Reviewer
9 0.071979769 22 hunch net-2005-02-18-What it means to do research.
10 0.067544818 329 hunch net-2008-11-28-A Bumper Crop of Machine Learning Graduates
11 0.065705039 169 hunch net-2006-04-05-What is state?
12 0.064715788 10 hunch net-2005-02-02-Kolmogorov Complexity and Googling
13 0.058245812 347 hunch net-2009-03-26-Machine Learning is too easy
14 0.057953507 388 hunch net-2010-01-24-Specializations of the Master Problem
15 0.057904698 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem
16 0.057685617 204 hunch net-2006-08-28-Learning Theory standards for NIPS 2006
17 0.055749085 317 hunch net-2008-09-12-How do we get weak action dependence for learning with partial observations?
18 0.054813236 466 hunch net-2012-06-05-ICML acceptance statistics
19 0.053998992 158 hunch net-2006-02-24-A Fundamentalist Organization of Machine Learning
20 0.051313564 358 hunch net-2009-06-01-Multitask Poisoning
topicId topicWeight
[(0, 0.124), (1, 0.042), (2, -0.003), (3, 0.028), (4, -0.006), (5, -0.041), (6, 0.059), (7, 0.027), (8, 0.0), (9, -0.003), (10, 0.02), (11, -0.005), (12, 0.043), (13, -0.029), (14, 0.004), (15, -0.021), (16, -0.028), (17, 0.014), (18, 0.015), (19, 0.054), (20, 0.016), (21, 0.072), (22, -0.094), (23, 0.017), (24, -0.032), (25, -0.006), (26, 0.001), (27, 0.005), (28, -0.084), (29, -0.0), (30, -0.093), (31, 0.08), (32, 0.02), (33, 0.035), (34, -0.115), (35, -0.014), (36, -0.03), (37, 0.059), (38, 0.014), (39, -0.029), (40, -0.055), (41, -0.038), (42, -0.118), (43, 0.053), (44, -0.074), (45, 0.005), (46, 0.026), (47, 0.043), (48, -0.037), (49, -0.102)]
simIndex simValue blogId blogTitle
same-blog 1 0.9506889 55 hunch net-2005-04-10-Is the Goal Understanding or Prediction?
2 0.70254725 70 hunch net-2005-05-12-Math on the Web
3 0.69188213 302 hunch net-2008-05-25-Inappropriate Mathematics for Machine Learning
4 0.64918572 392 hunch net-2010-03-26-A Variance only Deviation Bound
Introduction: At the PAC-Bayes workshop earlier this week, Olivier Catoni described a result that I hadn’t believed was possible: a deviation bound depending only on the variance of a random variable. For people not familiar with deviation bounds, this may be hard to appreciate. Deviation bounds are one of the core components of the foundations of machine learning theory, so developments here have the potential to alter our understanding of how to learn and what is learnable. My understanding is that the basic proof techniques started with Bernstein and have evolved into several variants specialized for various applications. All of the variants I knew had a dependence on the range, with some also having a dependence on the variance of an IID or martingale random variable. This one is the first I know of with a dependence on only the variance. The basic idea is to use a biased estimator of the mean which is not influenced much by outliers. Then, a deviation bound can be proved by
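As an illustration of a mean estimator that is “not influenced much by outliers”, here is a minimal Python sketch of median-of-means, a standard robust estimator with variance-only deviation guarantees; it is offered as an analogy only and is not necessarily the estimator Catoni described.

    import statistics

    def median_of_means(xs, k):
        # Split the sample into k equal-size blocks, average each block, and return
        # the median of the block averages. A single extreme outlier corrupts at most
        # one block average, so it barely moves the median, unlike the plain mean.
        block = len(xs) // k
        block_means = [sum(xs[i * block:(i + 1) * block]) / block for i in range(k)]
        return statistics.median(block_means)

    # Example: one outlier drags the plain mean but not the robust estimate.
    data = [0.9, 1.1, 1.0, 0.8, 1.2, 1.0, 0.9, 1.1, 1.0, 50.0]
    print(sum(data) / len(data))        # plain mean, pulled up by the outlier
    print(median_of_means(data, k=5))   # robust estimate, near 1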
5 0.64908284 244 hunch net-2007-05-09-The Missing Bound
6 0.61902964 90 hunch net-2005-07-07-The Limits of Learning Theory
7 0.54685968 10 hunch net-2005-02-02-Kolmogorov Complexity and Googling
8 0.52607864 202 hunch net-2006-08-10-Precision is not accuracy
9 0.50056404 162 hunch net-2006-03-09-Use of Notation
10 0.48971811 413 hunch net-2010-10-08-An easy proof of the Chernoff-Hoeffding bound
11 0.48774603 314 hunch net-2008-08-24-Mass Customized Medicine in the Future?
12 0.4755559 170 hunch net-2006-04-06-Bounds greater than 1
13 0.47301817 115 hunch net-2005-09-26-Prediction Bounds as the Mathematics of Science
14 0.44749543 218 hunch net-2006-11-20-Context and the calculation misperception
15 0.43240005 348 hunch net-2009-04-02-Asymmophobia
16 0.41965151 256 hunch net-2007-07-20-Motivation should be the Responsibility of the Reviewer
17 0.41808203 104 hunch net-2005-08-22-Do you believe in induction?
18 0.41685835 62 hunch net-2005-04-26-To calibrate or not?
19 0.41230863 328 hunch net-2008-11-26-Efficient Reinforcement Learning in MDPs
20 0.41145325 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models
topicId topicWeight
[(3, 0.04), (10, 0.403), (27, 0.153), (38, 0.016), (53, 0.088), (55, 0.12), (94, 0.045), (95, 0.01)]
simIndex simValue blogId blogTitle
1 0.98105353 474 hunch net-2012-10-18-7th Annual Machine Learning Symposium
Introduction: A reminder that the New York Academy of Sciences will be hosting the 7th Annual Machine Learning Symposium tomorrow from 9:30am. The main program will feature invited talks from Peter Bartlett, William Freeman, and Vladimir Vapnik, along with numerous spotlight talks and a poster session. Following the main program, hackNY and Microsoft Research are sponsoring a networking hour with talks from machine learning practitioners at NYC startups (specifically bit.ly, Buzzfeed, Chartbeat, Sense Networks, and Visual Revenue). This should be of great interest to everyone considering working in machine learning.
2 0.95648235 38 hunch net-2005-03-09-Bad Reviewing
Introduction: This is a difficult subject to talk about for many reasons, but a discussion may be helpful. Bad reviewing is a problem in academia. The first step in understanding this is admitting to the problem, so here is a short list of examples of bad reviewing. Reviewer disbelieves theorem proof (ICML), or disbelieves a theorem based on a trivially false counterexample. (COLT) Reviewer internally swaps quantifiers in a theorem, concludes it has been done before and is trivial. (NIPS) Reviewer believes a technique will not work despite experimental validation. (COLT) Reviewers fail to notice a flaw in a theorem statement (CRYPTO). Reviewer erroneously claims that it has been done before (NIPS, SODA, JMLR)—(complete with references!) Reviewer inverts the message of a paper and concludes it says nothing important. (NIPS*2) Reviewer fails to distinguish between a DAG and a tree (SODA). Reviewer is enthusiastic about paper but clearly does not understand (ICML). Reviewer erroneously
same-blog 3 0.91195858 55 hunch net-2005-04-10-Is the Goal Understanding or Prediction?
4 0.8976441 199 hunch net-2006-07-26-Two more UAI papers of interest
Introduction: In addition to Ed Snelson’s paper, there were (at least) two other papers that caught my eye at UAI. One was this paper by Sanjoy Dasgupta, Daniel Hsu and Nakul Verma at UCSD which shows in a surprisingly general and strong way that almost all linear projections of any jointly distributed vector random variable with finite first and second moments look spherical and unimodal (in fact look like a scale mixture of Gaussians). Great result, as you’d expect from Sanjoy. The other paper which I found intriguing but which I just haven’t grokked yet is this beast by Manfred and Dima Kuzmin. You can check out the (beautiful) slides if that helps. I feel like there is something deep here, but my brain is too small to understand it. The COLT and last NIPS papers/slides are also on Manfred’s page. Hopefully someone here can illuminate.
5 0.88946271 434 hunch net-2011-05-09-CI Fellows, again
Introduction: Lev and Hal point out the CI Fellows program is on again for this year. Lev visited me for a year under this program, and I quite enjoyed it. Due May 31.
6 0.85412276 182 hunch net-2006-06-05-Server Shift, Site Tweaks, Suggestions?
7 0.82617527 240 hunch net-2007-04-21-Videolectures.net
8 0.68872428 454 hunch net-2012-01-30-ICML Posters and Scope
9 0.66793787 332 hunch net-2008-12-23-Use of Learning Theory
10 0.57041562 207 hunch net-2006-09-12-Incentive Compatible Reviewing
11 0.55027795 437 hunch net-2011-07-10-ICML 2011 and the future
12 0.54369175 40 hunch net-2005-03-13-Avoiding Bad Reviewing
13 0.54252893 343 hunch net-2009-02-18-Decision by Vetocracy
14 0.54114604 484 hunch net-2013-06-16-Representative Reviewing
15 0.53358442 363 hunch net-2009-07-09-The Machine Learning Forum
16 0.52615631 354 hunch net-2009-05-17-Server Update
17 0.52143532 320 hunch net-2008-10-14-Who is Responsible for a Bad Review?
18 0.51886529 78 hunch net-2005-06-06-Exact Online Learning for Classification
19 0.51884615 5 hunch net-2005-01-26-Watchword: Probability
20 0.51854545 461 hunch net-2012-04-09-ICML author feedback is open