Antilearning: When proximity goes bad (hunch net, 2005-02-27)
Joel Predd mentioned “Antilearning” by Adam Kowalczyk, which is interesting from a foundational-intuitions viewpoint. There is a pervasive intuition that “nearby things tend to have the same label”. This intuition is instantiated in SVMs, nearest neighbor classifiers, decision trees, and neural networks. It turns out there are natural problems where this intuition is the opposite of the truth. One natural situation where this occurs is competition. For example, when Intel fails to meet its earnings estimate, is this evidence that AMD is doing badly as well? Or evidence that AMD is doing well? This violation of the proximity intuition means that when examples are few, negating a classifier which attempts to exploit proximity can provide predictive power (hence the term “antilearning”).
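To make the idea concrete, here is a minimal toy sketch (my own construction, not Kowalczyk's) of a dataset where the proximity intuition fails completely: points on a line with strictly alternating labels, so every point's nearest neighbor belongs to the opposite class. Under leave-one-out evaluation, 1-nearest-neighbor gets every example wrong, and the negated classifier gets every example right.

```python
# Hypothetical antilearning illustration: alternating labels on a line,
# so each point's nearest neighbor always has the opposite label.

def nn_predict(train, x):
    """1-NN: return the label of the training point nearest to x."""
    _, label = min(train, key=lambda p: abs(p[0] - x))
    return label

points = [(0, +1), (1, -1), (2, +1), (3, -1), (4, +1), (5, -1)]

# Leave-one-out evaluation of 1-NN and of its negation.
nn_correct = 0
anti_correct = 0
for i, (x, y) in enumerate(points):
    train = points[:i] + points[i + 1:]
    pred = nn_predict(train, x)
    nn_correct += (pred == y)
    anti_correct += (-pred == y)

print(nn_correct / len(points))    # 1-NN accuracy: 0.0
print(anti_correct / len(points))  # negated 1-NN accuracy: 1.0
```

Of course, real antilearning examples arise in less contrived ways (as in the competition scenario above), but the mechanism is the same: proximity carries information, just with the sign flipped.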