Antilearning: When proximity goes bad (hunch net, 2005-02-27)
Joel Predd mentioned “Antilearning” by Adam Kowalczyk, which is interesting from a foundational-intuitions viewpoint. There is a pervasive intuition that “nearby things tend to have the same label”. This intuition is instantiated in SVMs, nearest neighbor classifiers, decision trees, and neural networks. It turns out there are natural problems where this intuition is the opposite of the truth. One natural situation where this occurs is competition. For example, when Intel fails to meet its earnings estimate, is this evidence that AMD is doing badly as well? Or evidence that AMD is doing well? This violation of the proximity intuition means that when examples are few, negating a classifier which attempts to exploit proximity can provide predictive power (hence the term “antilearning”).
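To make the idea concrete, here is a minimal toy sketch (my own construction, not Kowalczyk's) of a dataset where the proximity intuition fails completely: points on a line with strictly alternating labels, so every point's nearest neighbor belongs to the opposite class. Under leave-one-out evaluation, 1-nearest-neighbor gets every example wrong, and the negated classifier gets every example right.

```python
# Hypothetical antilearning illustration: alternating labels on a line,
# so each point's nearest neighbor always has the opposite label.

def nn_predict(train, x):
    """1-NN: return the label of the training point nearest to x."""
    _, label = min(train, key=lambda p: abs(p[0] - x))
    return label

points = [(0, +1), (1, -1), (2, +1), (3, -1), (4, +1), (5, -1)]

# Leave-one-out evaluation of 1-NN and of its negation.
nn_correct = 0
anti_correct = 0
for i, (x, y) in enumerate(points):
    train = points[:i] + points[i + 1:]
    pred = nn_predict(train, x)
    nn_correct += (pred == y)
    anti_correct += (-pred == y)

print(nn_correct / len(points))    # 1-NN accuracy: 0.0
print(anti_correct / len(points))  # negated 1-NN accuracy: 1.0
```

Of course, real antilearning examples arise in less contrived ways (as in the competition scenario above), but the mechanism is the same: proximity carries information, just with the sign flipped.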