302 hunch net-2008-05-25-Inappropriate Mathematics for Machine Learning


Meta information for this blog post

Source: html

Introduction: Reviewers and students are sometimes greatly concerned by the distinction between: an open set and a closed set; a supremum and a maximum; an event which happens with probability 1 and an event that always happens. I don’t appreciate this distinction in machine learning & learning theory. All machine learning takes place (by definition) on a machine where every parameter has finite precision. Consequently, every set is closed, a maximal element always exists, and probability 1 events always happen. The fundamental issue here is that substantial parts of mathematics don’t appear well-matched to computation in the physical world, because the mathematics has concerns which are unphysical. This mismatched mathematics makes irrelevant distinctions. We can ask “what mathematics is appropriate to computation?” Andrej has convinced me that a pretty good answer to this question is constructive mathematics. So, here’s a basic challenge: Can anyone name a situati


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Reviewers and students are sometimes greatly concerned by the distinction between: An open set and a closed set. [sent-1, score-1.135]

2 An event which happens with probability 1 and an event that always happens. [sent-3, score-0.747]

3 I don’t appreciate this distinction in machine learning & learning theory. [sent-4, score-0.456]

4 All machine learning takes place (by definition) on a machine where every parameter has finite precision. [sent-5, score-0.66]

5 Consequently, every set is closed, a maximal element always exists, and probability 1 events always happen. [sent-6, score-1.065]

6 The fundamental issue here is that substantial parts of mathematics don’t appear well-matched to computation in the physical world, because the mathematics has concerns which are unphysical. [sent-7, score-1.771]

7 We can ask “what mathematics is appropriate to computation?” [sent-9, score-0.698]

8 Andrej has convinced me that a pretty good answer to this question is constructive mathematics. [sent-10, score-0.905]

9 So, here’s a basic challenge: Can anyone name a situation where any of the distinctions above (or similar distinctions) matter in machine learning? [sent-11, score-0.76]
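
The finite-precision claim in the sentences above can be illustrated with a small sketch. It assumes IEEE-754 double precision (which Python's float uses on common platforms) and a toy 3-bit grid standing in for a finite-precision parameter; the names prev_float, probability, and always_happens are hypothetical helpers, not anything from the post.

```python
import struct
from fractions import Fraction


def prev_float(x: float) -> float:
    """Largest double strictly below x (assumes x is positive and finite)."""
    bits = struct.unpack("<q", struct.pack("<d", x))[0]
    return struct.unpack("<d", struct.pack("<q", bits - 1))[0]


# Over the reals, [0, 1) has supremum 1 but no maximum.
# Over doubles, the same interval does have a largest element.
largest_below_one = prev_float(1.0)

# The rounded midpoint is one of the two endpoints, so no double
# lies strictly between largest_below_one and 1.0.
m = (largest_below_one + 1.0) / 2.0

# Toy finite-precision parameter: a uniform draw from a 3-bit grid in [0, 1).
dist = {Fraction(k, 8): Fraction(1, 8) for k in range(8)}


def probability(event) -> Fraction:
    """Exact probability of an event under the finite distribution."""
    return sum((p for x, p in dist.items() if event(x)), Fraction(0))


def always_happens(event) -> bool:
    """True when every outcome with positive probability is in the event."""
    return all(event(x) for x, p in dist.items() if p > 0)


def not_half(x):
    # Probability 1 under a continuous uniform law, but not on this grid.
    return x != Fraction(1, 2)


def below_one(x):
    return x < 1


print(largest_below_one, m)
print(probability(not_half), always_happens(not_half))
print(probability(below_one), always_happens(below_one))
```

On the grid, an event has probability exactly 1 only when it contains every outcome that can occur, which is the sense in which the two notions coincide under finite precision. Exact fractions are used so that the comparison with 1 is not itself subject to rounding.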


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('mathematics', 0.528), ('distinction', 0.267), ('distinctions', 0.267), ('closed', 0.256), ('event', 0.183), ('always', 0.169), ('computation', 0.157), ('andrej', 0.148), ('maximal', 0.14), ('mismatched', 0.14), ('probability', 0.138), ('convinced', 0.133), ('concerned', 0.133), ('set', 0.126), ('physical', 0.124), ('every', 0.117), ('constructive', 0.113), ('parts', 0.11), ('element', 0.108), ('concerns', 0.103), ('parameter', 0.101), ('maximum', 0.101), ('events', 0.098), ('name', 0.098), ('appreciate', 0.096), ('consequently', 0.095), ('finite', 0.093), ('machine', 0.093), ('situation', 0.092), ('appropriate', 0.089), ('takes', 0.084), ('matter', 0.084), ('challenge', 0.083), ('students', 0.082), ('definition', 0.081), ('ask', 0.081), ('issue', 0.079), ('place', 0.079), ('greatly', 0.077), ('exists', 0.074), ('happens', 0.074), ('appear', 0.074), ('reviewers', 0.074), ('pretty', 0.073), ('fundamental', 0.068), ('open', 0.068), ('anyone', 0.066), ('world', 0.06), ('similar', 0.06), ('answer', 0.058)]
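
The page does not say how the simValue scores in the list that follows are computed from weights like these. One common choice is cosine similarity between sparse tfidf vectors, sketched here as an assumption; the function name cosine_similarity and the second vector are hypothetical, and only the first three weights are taken from the list above.

```python
import math


def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine similarity of two sparse vectors stored as {term: weight}."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for an empty vector
    return dot / (norm_a * norm_b)


# First three weights copied from the tfidf list for this post.
post = {"mathematics": 0.528, "distinction": 0.267, "closed": 0.256}

# Hypothetical second post, invented purely for illustration.
other = {"mathematics": 0.4, "probability": 0.3}

print(cosine_similarity(post, post))
print(cosine_similarity(post, other))
```

If the scores were computed this way in single precision, a self-similarity that prints as 1.0000001 rather than exactly 1 would be consistent with rounding error, though that is a guess about the tool rather than something the page states.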

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 302 hunch net-2008-05-25-Inappropriate Mathematics for Machine Learning

2 0.1768716 55 hunch net-2005-04-10-Is the Goal Understanding or Prediction?

Introduction: Steve Smale and I have a debate about goals of learning theory. Steve likes theorems with a dependence on unobservable quantities. For example, if $D$ is a distribution over a space $X \times [0,1]$, you can state a theorem about the error rate dependent on the variance, $E_{(x,y)\sim D}\left(y - E_{y'\sim D|x}[y']\right)^2$. I dislike this, because I want to use the theorems to produce code solving learning problems. Since I don’t know (and can’t measure) the variance, a theorem depending on the variance does not help me: I would not know what variance to plug into the learning algorithm. Recast more broadly, this is a debate between “declarative” and “operative” mathematics. A strong example of “declarative” mathematics is “a new kind of science”. Roughly speaking, the goal of this kind of approach seems to be finding a way to explain the observations we make. Examples include “some things are unpredictable”, “a phase transition exists”, etc… “Operative” mathematics helps you make predictions a

3 0.17422426 70 hunch net-2005-05-12-Math on the Web

Introduction: Andrej Bauer has set up a Mathematics and Computation Blog. As a first step he has tried to address the persistent and annoying problem of math on the web. As a basic tool for precisely stating and transferring understanding of technical subjects, mathematics is very necessary. Despite this necessity, every mechanism for expressing mathematics on the web seems unnaturally clumsy. Here are some of the methods and their drawbacks: MathML: This was supposed to be the answer, but it has two severe drawbacks: “Internet Explorer” doesn’t read it and the language is an example of push-XML-to-the-limit which no one would ever consider writing in. (In contrast, html is easy to write in.) It’s also very annoying that math fonts must be installed independent of the browser, even for Mozilla-based browsers. Create inline images: This has several big drawbacks: font size is fixed for all viewers, you can’t cut & paste inside the images, and you can’t hyperlink from (say) symbol to de

4 0.15683612 90 hunch net-2005-07-07-The Limits of Learning Theory

Introduction: Suppose we had an infinitely powerful mathematician sitting in a room and proving theorems about learning. Could he solve machine learning? The answer is “no”. This answer is both obvious and sometimes underappreciated. There are several ways to conclude that some bias is necessary in order to successfully learn. For example, suppose we are trying to solve classification. At prediction time, we observe some features X and want to make a prediction of either 0 or 1. Bias is what makes us prefer one answer over the other based on past experience. In order to learn we must: (1) Have a bias. Always predicting that 0 is as likely as 1 is useless. (2) Have the “right” bias. Predicting 1 when the answer is 0 is also not helpful. The implication of “have a bias” is that we cannot design effective learning algorithms with “a uniform prior over all possibilities”. The implication of “have the ‘right’ bias” is that our mathematician fails since “right” is defined wi

5 0.1039708 5 hunch net-2005-01-26-Watchword: Probability

Introduction: Probability is one of the most confusingly used words in machine learning. There are at least 3 distinct ways the word is used. Bayesian: The Bayesian notion of probability is a ‘degree of belief’. The degree of belief that some event (e.g. “stock goes up” or “stock goes down”) occurs can be measured by asking a sequence of questions of the form “Would you bet the stock goes up or down at Y to 1 odds?” A consistent bettor will switch from ‘for’ to ‘against’ at some single value of Y. The probability is then Y/(Y+1). Bayesian probabilities express lack of knowledge rather than randomization. They are useful in learning because we often lack knowledge and expressing that lack flexibly makes the learning algorithms work better. Bayesian Learning uses ‘probability’ in this way exclusively. Frequentist: The Frequentist notion of probability is a rate of occurrence. A rate of occurrence can be measured by doing an experiment many times. If an event occurs k times in

6 0.091955602 218 hunch net-2006-11-20-Context and the calculation misperception

7 0.090348244 358 hunch net-2009-06-01-Multitask Poisoning

8 0.08355955 290 hunch net-2008-02-27-The Stats Handicap

9 0.083050951 10 hunch net-2005-02-02-Kolmogorov Complexity and Googling

10 0.080315091 373 hunch net-2009-10-03-Static vs. Dynamic multiclass prediction

11 0.078457803 222 hunch net-2006-12-05-Recruitment Conferences

12 0.071377739 43 hunch net-2005-03-18-Binomial Weighting

13 0.066222683 110 hunch net-2005-09-10-“Failure” is an option

14 0.063952617 286 hunch net-2008-01-25-Turing’s Club for Machine Learning

15 0.062624745 410 hunch net-2010-09-17-New York Area Machine Learning Events

16 0.061458956 169 hunch net-2006-04-05-What is state?

17 0.059605669 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem

18 0.059129514 347 hunch net-2009-03-26-Machine Learning is too easy

19 0.056993075 300 hunch net-2008-04-30-Concerns about the Large Scale Learning Challenge

20 0.056387432 330 hunch net-2008-12-07-A NIPS paper


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.13), (1, 0.016), (2, -0.016), (3, 0.039), (4, -0.045), (5, -0.018), (6, 0.012), (7, 0.047), (8, 0.016), (9, -0.054), (10, -0.024), (11, -0.005), (12, 0.063), (13, -0.012), (14, 0.01), (15, -0.0), (16, -0.039), (17, 0.003), (18, 0.115), (19, 0.125), (20, -0.046), (21, 0.097), (22, 0.021), (23, 0.081), (24, 0.038), (25, 0.008), (26, 0.031), (27, -0.022), (28, -0.058), (29, 0.092), (30, 0.02), (31, 0.18), (32, 0.004), (33, 0.074), (34, -0.131), (35, -0.002), (36, -0.064), (37, 0.105), (38, 0.118), (39, -0.002), (40, -0.064), (41, -0.067), (42, -0.116), (43, 0.077), (44, -0.092), (45, -0.023), (46, 0.052), (47, -0.04), (48, -0.082), (49, -0.018)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96083176 302 hunch net-2008-05-25-Inappropriate Mathematics for Machine Learning

2 0.68237704 70 hunch net-2005-05-12-Math on the Web

3 0.64956015 55 hunch net-2005-04-10-Is the Goal Understanding or Prediction?

4 0.621108 62 hunch net-2005-04-26-To calibrate or not?

Introduction: A calibrated predictor is one which predicts the probability of a binary event with the property: For all predictions p, the proportion of the time that 1 is observed is p. Since there are infinitely many p, this definition must be “softened” to make sense for any finite number of samples. The standard method for “softening” is to consider all predictions in a small neighborhood about each possible p. A great deal of effort has been devoted to strategies for achieving calibrated prediction (such as here), with statements like: (under minimal conditions) you can always make calibrated predictions. Given the strength of these statements, we might conclude we are done, but that would be a “confusion of ends”. A confusion of ends arises in the following way: We want good probabilistic predictions. Good probabilistic predictions are calibrated. Therefore, we want calibrated predictions. The “Therefore” step misses the fact that calibration is a necessary b

5 0.59064907 90 hunch net-2005-07-07-The Limits of Learning Theory

6 0.5858258 5 hunch net-2005-01-26-Watchword: Probability

7 0.5224905 413 hunch net-2010-10-08-An easy proof of the Chernoff-Hoeffding bound

8 0.51215321 218 hunch net-2006-11-20-Context and the calculation misperception

9 0.50085682 10 hunch net-2005-02-02-Kolmogorov Complexity and Googling

10 0.45150891 118 hunch net-2005-10-07-On-line learning of regular decision rules

11 0.44623962 123 hunch net-2005-10-16-Complexity: It’s all in your head

12 0.44511923 330 hunch net-2008-12-07-A NIPS paper

13 0.43315607 290 hunch net-2008-02-27-The Stats Handicap

14 0.39394328 373 hunch net-2009-10-03-Static vs. Dynamic multiclass prediction

15 0.38498721 81 hunch net-2005-06-13-Wikis for Summer Schools and Workshops

16 0.3807714 440 hunch net-2011-08-06-Interesting thing at UAI 2011

17 0.37427446 39 hunch net-2005-03-10-Breaking Abstractions

18 0.37114301 33 hunch net-2005-02-28-Regularization

19 0.36930546 257 hunch net-2007-07-28-Asking questions

20 0.36318958 112 hunch net-2005-09-14-The Predictionist Viewpoint


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(27, 0.132), (55, 0.701), (94, 0.022), (95, 0.013)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.99747753 472 hunch net-2012-08-27-NYAS ML 2012 and ICML 2013

Introduction: The New York Machine Learning Symposium is October 19 with a 2 page abstract deadline due September 13 via email with subject “Machine Learning Poster Submission” sent to physicalscience@nyas.org. Everyone is welcome to submit. Last year’s attendance was 246 and I expect more this year. The primary experiment for ICML 2013 is multiple paper submission deadlines with rolling review cycles. The key dates are October 1, December 15, and February 15. This is an attempt to shift ICML further towards a journal style review process and reduce peak load. The “not for proceedings” experiment from this year’s ICML is not continuing. Edit: Fixed second ICML deadline.

2 0.99478483 446 hunch net-2011-10-03-Monday announcements

Introduction: Various people want to use hunch.net to announce things. I’ve generally resisted this because I feared hunch becoming a pure announcement zone while I am much more interested in contentful posts and discussion personally. Nevertheless there is clearly some value and announcements are easy, so I’m planning to summarize announcements on Mondays. D. Sculley points out an interesting Semisupervised feature learning competition, with a deadline of October 17. Lihong Li points out the webscope user interaction dataset which is the first high quality exploration dataset I’m aware of that is publicly available. Seth Rogers points out CrossValidated which looks similar in conception to metaoptimize, but directly using the stackoverflow interface and with a bit more of a statistics twist.

3 0.99400276 271 hunch net-2007-11-05-CMU wins DARPA Urban Challenge

Introduction: The results have been posted, with CMU first, Stanford second, and Virginia Tech third. Considering that this was an open event (at least for people in the US), this was a very strong showing for research at universities (instead of defense contractors, for example). Some details should become public at the NIPS workshops. Slashdot has a post with many comments.

same-blog 4 0.98953962 302 hunch net-2008-05-25-Inappropriate Mathematics for Machine Learning

5 0.98704875 20 hunch net-2005-02-15-ESPgame and image labeling

Introduction: Luis von Ahn has been running the espgame for a while now. The espgame provides a picture to two randomly paired people across the web, and asks them to agree on a label. It hasn’t managed to label the web yet, but it has produced a large dataset of (image, label) pairs. I organized the dataset so you could explore the implied bipartite graph (requires much bandwidth). Relative to other image datasets, this one is quite large: 67,000 images, 358,000 labels (average of 5/image with variation from 1 to 19), and 22,000 unique labels (one every 3 images). The dataset is also very ‘natural’, consisting of images spidered from the internet. The multiple label characteristic is intriguing because ‘learning to learn’ and metalearning techniques may be applicable. The ‘natural’ quality means that this dataset varies greatly in difficulty from easy (predicting “red”) to hard (predicting “funny”) and is potentially more rewarding to tackle. The open problem here is, of course, to make

6 0.98591465 448 hunch net-2011-10-24-2011 ML symposium and the bears

7 0.98217386 326 hunch net-2008-11-11-COLT CFP

8 0.98217386 465 hunch net-2012-05-12-ICML accepted papers and early registration

9 0.95741212 90 hunch net-2005-07-07-The Limits of Learning Theory

10 0.93449557 331 hunch net-2008-12-12-Summer Conferences

11 0.91058165 270 hunch net-2007-11-02-The Machine Learning Award goes to …

12 0.89603591 387 hunch net-2010-01-19-Deadline Season, 2010

13 0.89034742 395 hunch net-2010-04-26-Compassionate Reviewing

14 0.88012409 453 hunch net-2012-01-28-Why COLT?

15 0.8475765 65 hunch net-2005-05-02-Reviewing techniques for conferences

16 0.82349151 356 hunch net-2009-05-24-2009 ICML discussion site

17 0.82061225 457 hunch net-2012-02-29-Key Scientific Challenges and the Franklin Symposium

18 0.81045133 216 hunch net-2006-11-02-2006 NIPS workshops

19 0.79569721 46 hunch net-2005-03-24-The Role of Workshops

20 0.78752017 452 hunch net-2012-01-04-Why ICML? and the summer conferences