hunch_net hunch_net-2006 hunch_net-2006-176 knowledge-graph by maker-knowledge-mining

176 hunch net-2006-05-01-A conversation between Theo and Pat


meta info for this blog

Source: html

Introduction: Pat (the practitioner) I need to do multiclass classification and I only have a decision tree. Theo (the theoretician) Use an error correcting output code. Pat Oh, that’s cool. But the created binary problems seem unintuitive. I’m not sure the decision tree can solve them. Theo Oh? Is your problem a decision list? Pat No, I don’t think so. Theo Hmm. Are the classes well separated by axis aligned splits? Pat Err, maybe. I’m not sure. Theo Well, if they are, under the IID assumption I can tell you how many samples you need. Pat IID? The data is definitely not IID. Theo Oh dear. Pat Can we get back to the choice of ECOC? I suspect we need to build it dynamically in response to which subsets of the labels are empirically separable from each other. Theo Ok. What do you know about your problem? Pat Not much. My friend just gave me the dataset. Theo Then, no one can help you. Pat (What a fuzzy thinker. Theo keeps jumping t
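The error correcting output code reduction Theo recommends is simple enough to sketch. Below is a minimal Python version, assuming a scikit-learn-style binary learner factory; the names ecoc_train, ecoc_predict, and make_learner are illustrative, not from the original post. Each column of the random code matrix defines one induced binary problem, which is exactly where Pat's worry about unintuitive subproblems arises.

    import numpy as np

    def ecoc_train(X, y, n_classes, n_bits, make_learner, seed=0):
        """Train one binary classifier per column of a random code matrix.
        y is assumed to hold integer class indices 0..n_classes-1."""
        rng = np.random.default_rng(seed)
        # code[c, b] is the target bit for class c in induced binary problem b
        code = rng.integers(0, 2, size=(n_classes, n_bits))
        learners = []
        for b in range(n_bits):
            bits = code[y, b]        # relabel each example by its class's bit
            clf = make_learner()
            clf.fit(X, bits)         # this induced problem may be hard for a decision tree
            learners.append(clf)
        return code, learners

    def ecoc_predict(X, code, learners):
        """Predict the class whose codeword is nearest in Hamming distance."""
        bits = np.column_stack([clf.predict(X) for clf in learners])
        dists = (bits[:, None, :] != code[None, :, :]).sum(axis=2)
        return dists.argmin(axis=1)

Pat's dynamic-ECOC suggestion amounts to replacing the random code matrix with one built from the label subsets the base learner can actually separate.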


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Pat (the practitioner) I need to do multiclass classification and I only have a decision tree. [sent-1, score-0.134]

2 Theo (the theoretician) Use an error correcting output code. [sent-2, score-0.072]

3 Are the classes well separated by axis aligned splits? [sent-10, score-0.142]

4 Theo Well, if they are, under the IID assumption I can tell you how many samples you need. [sent-13, score-0.068]

5 I suspect we need to build it dynamically in response to which subsets of the labels are empirically separable from each other. [sent-18, score-0.318]

6 Theo keeps jumping to assumptions that just aren’t true. [sent-25, score-0.109]

7 Pat’s problem is unsolvable without making extra assumptions. [sent-27, score-0.084]

8 ) I’ve heard variants of this conversation several times. [sent-28, score-0.126]

9 The fundamental difference in viewpoint is the following: Theo lives in a world where he chooses the problem to solve based upon the learning model (and assumptions) used. [sent-29, score-0.327]

10 Pat lives in a world where the problem is imposed on him. [sent-30, score-0.223]

11 I’d love for these confusions to go away, but there is no magic wand. [sent-31, score-0.107]

12 The best advice seems to be: listen carefully and avoid assuming too much in what you hear. [sent-32, score-0.132]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('pat', 0.646), ('theo', 0.646), ('oh', 0.194), ('fuzzy', 0.129), ('lives', 0.089), ('iid', 0.062), ('assumptions', 0.061), ('decision', 0.059), ('listen', 0.057), ('magic', 0.057), ('axis', 0.053), ('aligned', 0.053), ('dynamically', 0.053), ('practitioner', 0.053), ('problem', 0.052), ('love', 0.05), ('ecoc', 0.05), ('subsets', 0.05), ('err', 0.048), ('keeps', 0.048), ('friend', 0.044), ('heard', 0.044), ('suspect', 0.043), ('conversation', 0.043), ('world', 0.043), ('need', 0.043), ('correcting', 0.041), ('chooses', 0.041), ('hear', 0.041), ('solve', 0.04), ('assuming', 0.04), ('definitely', 0.039), ('imposed', 0.039), ('variants', 0.039), ('tell', 0.037), ('classes', 0.036), ('gave', 0.036), ('advice', 0.035), ('away', 0.035), ('response', 0.034), ('extra', 0.032), ('labels', 0.032), ('multiclass', 0.032), ('viewpoint', 0.032), ('empirically', 0.032), ('assumption', 0.031), ('created', 0.031), ('output', 0.031), ('build', 0.031), ('upon', 0.03)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 176 hunch net-2006-05-01-A conversation between Theo and Pat


2 0.047789659 181 hunch net-2006-05-23-What is the best regret transform reduction from multiclass to binary?

Introduction: This post is about an open problem in learning reductions. Background A reduction might transform a multiclass prediction problem where there are k possible labels into a binary learning problem where there are only 2 possible labels. On this induced binary problem we might learn a binary classifier with some error rate e. After subtracting the minimum possible (Bayes) error rate b, we get a regret r = e – b. The PECOC (Probabilistic Error Correcting Output Code) reduction has the property that binary regret r implies multiclass regret at most 4r^0.5. The problem This is not the “rightest” answer. Consider the k=2 case, where we reduce binary to binary. There exists a reduction (the identity) with the property that regret r implies regret r. This is substantially superior to the transform given by the PECOC reduction (the two guarantees are compared numerically in the sketch after this list), which suggests that a better reduction may exist for general k. For example, we can not rule out the possibility that a reduction

3 0.047321562 72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions

Introduction: This post is about a reductions-related problem that I find mysterious. There are two kinds of reductions analysis currently under consideration. Error limiting reductions. Here, the goal is to bound the error rate of the created classifier in terms of the error rate of the binary classifiers that you reduce to. A very simple example of this is error correcting output codes, where it is possible to prove that for certain codes, the multiclass error rate is at most 4 * the binary classifier error rate. Regret minimizing reductions. Here, the goal is to bound the regret of the created classifier in terms of the regret of the binary classifiers reduced to. The regret is the error rate minus the minimum error rate. When the learning problem is noisy the minimum error rate may not be 0. An analogous result for regret is that for a probabilistic error correcting output code, multiclass regret is at most 4 * (binary regret)^0.5 (the two styles of bound are contrasted in the sketch after this list). The use of “regret” is more desi

4 0.046823461 7 hunch net-2005-01-31-Watchword: Assumption

Introduction: “Assumption” is another word to be careful with in machine learning because it is used in several ways. Assumption = Bias There are several ways to see that some form of ‘bias’ (= preferring of one solution over another) is necessary. This is obvious in an adversarial setting. A good bit of work has been expended explaining this in other settings with “no free lunch” theorems. This is a usage specialized to learning which is particularly common when talking about priors for Bayesian Learning. Assumption = “if” of a theorem The assumptions are the ‘if’ part of the ‘if-then’ in a theorem. This is a fairly common usage. Assumption = Axiom The assumptions are the things that we assume are true, but which we cannot verify. Examples are “the IID assumption” or “my problem is a DNF on a small number of bits”. This is the usage which I prefer. One difficulty with any use of the word “assumption” is that you often encounter “if assumption then conclusion so if no

5 0.043601964 235 hunch net-2007-03-03-All Models of Learning have Flaws

Introduction: Attempts to abstract and study machine learning are within some given framework or mathematical model. It turns out that all of these models are significantly flawed for the purpose of studying machine learning. I’ve created a table (below) outlining the major flaws in some common models of machine learning. The point here is not simply “woe unto us”. There are several implications which seem important. The multitude of models is a point of continuing confusion. It is common for people to learn about machine learning within one framework which often becomes their “home framework” through which they attempt to filter all machine learning. (Have you met people who can only think in terms of kernels? Only via Bayes Law? Only via PAC Learning?) Explicitly understanding the existence of these other frameworks can help resolve the confusion. This is particularly important when reviewing and particularly important for students. Algorithms which conform to multiple approaches c

6 0.04351129 12 hunch net-2005-02-03-Learning Theory, by assumption

7 0.042897366 373 hunch net-2009-10-03-Static vs. Dynamic multiclass prediction

8 0.036528982 243 hunch net-2007-05-08-Conditional Tournaments for Multiclass to Binary

9 0.036474016 327 hunch net-2008-11-16-Observations on Linearity for Reductions to Regression

10 0.035763405 127 hunch net-2005-11-02-Progress in Active Learning

11 0.035276838 57 hunch net-2005-04-16-Which Assumptions are Reasonable?

12 0.034849558 304 hunch net-2008-06-27-Reviewing Horror Stories

13 0.031468723 183 hunch net-2006-06-14-Explorations of Exploration

14 0.030123109 14 hunch net-2005-02-07-The State of the Reduction

15 0.029959016 347 hunch net-2009-03-26-Machine Learning is too easy

16 0.02911748 118 hunch net-2005-10-07-On-line learning of regular decision rules

17 0.028773673 67 hunch net-2005-05-06-Don’t mix the solution into the problem

18 0.027542043 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem

19 0.026746484 351 hunch net-2009-05-02-Wielding a New Abstraction

20 0.0266222 196 hunch net-2006-07-13-Regression vs. Classification as a Primitive
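As promised in entries 2 and 3 above, here is a small numeric sketch of the bounds quoted there. The constants come from the excerpts; the error and regret values plugged in are hypothetical, chosen only to show how the guarantees diverge on noisy problems.

    import math

    # Entry 2: PECOC gives multiclass regret <= 4 * r^0.5, while for k = 2
    # the identity reduction gives regret r directly.
    r = 0.01
    print(4 * math.sqrt(r))   # 0.4  -- the PECOC guarantee
    print(r)                  # 0.01 -- the identity guarantee, far tighter

    # Entry 3: error limiting vs regret minimizing bounds on a noisy problem.
    binary_error, bayes_error = 0.26, 0.25   # hypothetical: minimum error is not 0
    binary_regret = binary_error - bayes_error
    print(4 * binary_error)                  # 1.04 -- vacuous, since error rates <= 1
    print(4 * binary_regret ** 0.5)          # 0.4  -- still a meaningful guarantee

The last two lines show why regret bounds are attractive: they stay informative even when noise makes the raw error bound vacuous.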


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.062), (1, 0.035), (2, 0.012), (3, -0.0), (4, 0.015), (5, -0.009), (6, 0.044), (7, 0.001), (8, -0.023), (9, -0.002), (10, -0.018), (11, 0.002), (12, 0.023), (13, -0.007), (14, -0.0), (15, -0.02), (16, -0.028), (17, 0.011), (18, 0.001), (19, 0.013), (20, 0.012), (21, 0.015), (22, 0.021), (23, 0.005), (24, -0.007), (25, -0.014), (26, -0.003), (27, -0.037), (28, 0.009), (29, -0.01), (30, 0.024), (31, 0.013), (32, -0.026), (33, 0.034), (34, -0.012), (35, 0.029), (36, 0.02), (37, 0.012), (38, 0.014), (39, -0.034), (40, -0.015), (41, 0.057), (42, -0.058), (43, -0.044), (44, 0.035), (45, -0.001), (46, -0.005), (47, -0.041), (48, 0.028), (49, 0.044)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.87782562 176 hunch net-2006-05-01-A conversation between Theo and Pat


2 0.49846864 35 hunch net-2005-03-04-The Big O and Constants in Learning

Introduction: The notation g(n) = O(f(n)) means that in the limit as n approaches infinity there exists a constant C such that g(n) is less than Cf(n). In learning theory, there are many statements about learning algorithms of the form “under assumptions x, y, and z, the classifier learned has an error rate of at most O(f(m))”. There is one very good reason to use O(): it helps you understand the big picture and neglect the minor details which are not important in the big picture. However, there are some important reasons not to do this as well. Unspeedup In algorithm analysis, the use of O() for time complexity is pervasive and well-justified. Determining the exact value of C is inherently computer architecture dependent. (The “C” for x86 processors might differ from the “C” on PowerPC processors.) Since many learning theorists come from a CS theory background, the O() notation is applied to generalization error (the effect of the hidden constant is illustrated in the sketch after this list). The O() abstraction breaks here—you can not genera

3 0.47280401 218 hunch net-2006-11-20-Context and the calculation misperception

Introduction: This post is really for people not in machine learning (or related fields). It is about a common misperception which affects people who have not thought about the process of trying to predict something. Hopefully, by precisely stating it, we can remove it. Suppose we have a set of events, each described by a vector of features.

0 1 0 1 1
1 0 1 0 1
1 1 0 1 0
0 0 1 1 1
1 1 0 0 1
1 0 0 0 1
0 1 1 1 0

Suppose we want to predict the value of the first feature given the others. One approach is to bin the data by one feature. For the above example, we might partition the data according to feature 2, then observe that when feature 2 is 0 the label (feature 1) is mostly 1. On the other hand, when feature 2 is 1, the label (feature 1) is mostly 0. Using this simple rule we get an observed error rate of 3/7 (recomputed in the sketch after this list). There are two issues here. The first is that this is really a training

4 0.45745373 72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions

Introduction: This post is about a reductions-related problem that I find mysterious. There are two kinds of reductions analysis currently under consideration. Error limiting reductions. Here, the goal is to bound the error rate of the created classifier in terms of the error rate of the binary classifiers that you reduce to. A very simple example of this is error correcting output codes, where it is possible to prove that for certain codes, the multiclass error rate is at most 4 * the binary classifier error rate. Regret minimizing reductions. Here, the goal is to bound the regret of the created classifier in terms of the regret of the binary classifiers reduced to. The regret is the error rate minus the minimum error rate. When the learning problem is noisy the minimum error rate may not be 0. An analogous result for regret is that for a probabilistic error correcting output code, multiclass regret is at most 4 * (binary regret)^0.5. The use of “regret” is more desi

5 0.4413484 351 hunch net-2009-05-02-Wielding a New Abstraction

Introduction: This post is partly meant as an advertisement for the reductions tutorial Alina, Bianca, and I are planning to do at ICML. Please come, if you are interested. Many research programs can be thought of as finding and building new useful abstractions. The running example I’ll use is learning reductions where I have experience. The basic abstraction here is that we can build a learning algorithm capable of solving classification problems up to a small expected regret. This is used repeatedly to solve more complex problems. In working on a new abstraction, I think you typically run into many substantial problems of understanding, which make publishing particularly difficult. It is difficult to seriously discuss the reason behind or mechanism for abstraction in a conference paper with small page limits. People rarely see such discussions and hence have little basis on which to think about new abstractions. Another difficulty is that when building an abstraction, yo

6 0.43383777 243 hunch net-2007-05-08-Conditional Tournaments for Multiclass to Binary

7 0.43021646 14 hunch net-2005-02-07-The State of the Reduction

8 0.42731455 43 hunch net-2005-03-18-Binomial Weighting

9 0.4166452 327 hunch net-2008-11-16-Observations on Linearity for Reductions to Regression

10 0.41640061 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem

11 0.41382757 28 hunch net-2005-02-25-Problem: Online Learning

12 0.41190434 181 hunch net-2006-05-23-What is the best regret transform reduction from multiclass to binary?

13 0.40268308 487 hunch net-2013-07-24-ICML 2012 videos lost

14 0.40188646 289 hunch net-2008-02-17-The Meaning of Confidence

15 0.39762953 78 hunch net-2005-06-06-Exact Online Learning for Classification

16 0.39143372 213 hunch net-2006-10-08-Incompatibilities between classical confidence intervals and learning.

17 0.39045861 476 hunch net-2012-12-29-Simons Institute Big Data Program

18 0.3875699 162 hunch net-2006-03-09-Use of Notation

19 0.38231534 332 hunch net-2008-12-23-Use of Learning Theory

20 0.38187465 206 hunch net-2006-09-09-How to solve an NP hard problem in quadratic time
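As noted in entry 2 above, the hidden constant matters when O() is applied to generalization error. A tiny sketch, using a generic deviation-style bound form of my own choosing (not from the excerpt), with two hypothetical values of the constant C:

    import math

    # "error <= O(sqrt(log(1/delta) / m))" hides the constant C.
    m, delta = 1000, 0.05
    base = math.sqrt(math.log(1 / delta) / m)
    print(1 * base)     # ~0.055 -- a usable error guarantee
    print(100 * base)   # ~5.5   -- vacuous, since error rates never exceed 1

Unlike running time, generalization error lives in [0, 1], so a constant of 100 does not merely slow things down; it erases the guarantee entirely.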
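And for entry 3 above: recomputing the excerpt's observed error rate for the rule "predict 1 when feature 2 is 0, else predict 0", under the 7-events-by-5-features row layout shown in that entry (the layout itself is an editorial reconstruction of the flattened table):

    # Each tuple is one event: (label = feature 1, feature 2, feature 3, ...)
    events = [(0, 1, 0, 1, 1), (1, 0, 1, 0, 1), (1, 1, 0, 1, 0), (0, 0, 1, 1, 1),
              (1, 1, 0, 0, 1), (1, 0, 0, 0, 1), (0, 1, 1, 1, 0)]
    errors = sum((1 if f2 == 0 else 0) != label for label, f2, *_ in events)
    print(errors, "/", len(events))   # 3 / 7, matching the excerpt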


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(16, 0.403), (27, 0.144), (38, 0.063), (48, 0.017), (53, 0.027), (55, 0.044), (77, 0.017), (94, 0.039), (95, 0.061)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.94466805 197 hunch net-2006-07-17-A Winner

Introduction: Ed Snelson won the Predictive Uncertainty in Environmental Modelling Competition in the temp(erature) category using this algorithm. Some characteristics of the algorithm are: Gradient descent … on about 600 parameters … with local minima … to solve regression. This bears a strong resemblance to a neural network. The two main differences seem to be: The system has a probabilistic interpretation (which may aid design). There are (perhaps) fewer parameters than a typical neural network might have for the same problem (aiding speed). (A generic sketch of this gradient-descent recipe appears after this list.)

2 0.91070753 59 hunch net-2005-04-22-New Blog: [Lowerbounds,Upperbounds]

Introduction: Maverick Woo and the Aladdin group at CMU have started a CS theory-related blog here.

3 0.89446521 69 hunch net-2005-05-11-Visa Casualties

Introduction: For the Chicago 2005 machine learning summer school we are organizing, at least 5 international students can not come due to visa issues. There seem to be two aspects to visa issues: Inefficiency . The system rejected the student simply by being incapable of even starting to evaluate their visa in less than 1 month of time. Politics . Border controls became much tighter after the September 11 attack. Losing a big chunk of downtown of the largest city in a country will do that. What I (and the students) learned is that (1) is a much larger problem than (2). Only 1 prospective student seems to have achieved an explicit visa rejection. Fixing problem (1) should be a no-brainer, because the lag time almost surely indicates overload, and overload on border controls should worry even people concerned with (2). The obvious fixes to overload are “spend more money” and “make the system more efficient”. With respect to (2), (which is a more minor issue by the numbers) it i

same-blog 4 0.85724157 176 hunch net-2006-05-01-A conversation between Theo and Pat


5 0.71834689 414 hunch net-2010-10-17-Partha Niyogi has died

Introduction: from brain cancer. I asked Misha who worked with him to write about it. Partha Niyogi, Louis Block Professor in Computer Science and Statistics at the University of Chicago passed away on October 1, 2010, aged 43. I first met Partha Niyogi almost exactly ten years ago when I was a graduate student in math and he had just started as a faculty in Computer Science and Statistics at the University of Chicago. Strangely, we first talked at length due to a somewhat convoluted mathematical argument in a paper on pattern recognition. I asked him some questions about the paper, and, even though the topic was new to him, he had put serious thought into it and we started regular meetings. We made significant progress and developed a line of research stemming initially just from trying to understand that one paper and to simplify one derivation. I think this was typical of Partha, showing both his intellectual curiosity and his intuition for the serendipitous; having a sense and focus fo

6 0.64913392 447 hunch net-2011-10-10-ML Symposium and ICML details

7 0.57231534 19 hunch net-2005-02-14-Clever Methods of Overfitting

8 0.46584898 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

9 0.41387331 343 hunch net-2009-02-18-Decision by Vetocracy

10 0.404024 158 hunch net-2006-02-24-A Fundamentalist Organization of Machine Learning

11 0.40396503 371 hunch net-2009-09-21-Netflix finishes (and starts)

12 0.40277344 380 hunch net-2009-11-29-AI Safety

13 0.4002015 286 hunch net-2008-01-25-Turing’s Club for Machine Learning

14 0.39416236 12 hunch net-2005-02-03-Learning Theory, by assumption

15 0.39188427 360 hunch net-2009-06-15-In Active Learning, the question changes

16 0.39067066 235 hunch net-2007-03-03-All Models of Learning have Flaws

17 0.3892979 36 hunch net-2005-03-05-Funding Research

18 0.38853869 194 hunch net-2006-07-11-New Models

19 0.38753694 259 hunch net-2007-08-19-Choice of Metrics

20 0.38712469 406 hunch net-2010-08-22-KDD 2010
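Finally, the sketch promised in entry 1 of this list. The excerpt describes the winning method only as gradient descent on about 600 parameters, with local minima, solving regression; the toy one-hidden-layer regressor below is a generic stand-in for that recipe, not Snelson's actual model.

    import numpy as np

    def fit_small_net(X, y, hidden=20, steps=5000, lr=0.01, seed=0):
        """Gradient descent on a tiny one-hidden-layer tanh regressor.
        The loss is non-convex, so different seeds can land in different
        local minima -- the property the excerpt calls out."""
        rng = np.random.default_rng(seed)
        d = X.shape[1]
        W1 = rng.normal(scale=0.1, size=(d, hidden))
        W2 = rng.normal(scale=0.1, size=hidden)
        for _ in range(steps):
            H = np.tanh(X @ W1)              # hidden activations
            r = H @ W2 - y                   # residuals
            # gradients of mean squared error (constant factors folded into lr)
            gW2 = H.T @ r / len(y)
            gW1 = X.T @ ((r[:, None] * W2) * (1 - H ** 2)) / len(y)
            W1 -= lr * gW1
            W2 -= lr * gW2
        return W1, W2

With d input features, this model has d * hidden + hidden parameters, so hidden sizes in the dozens land in the same rough parameter count the excerpt mentions.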