hunch_net hunch_net-2005 hunch_net-2005-6 knowledge-graph by maker-knowledge-mining

6 hunch net-2005-01-27-Learning Complete Problems


meta info for this blog

Source: html

Introduction: Let’s define a learning problem as making predictions given past data. There are several ways to attack the learning problem which seem to be equivalent to solving the learning problem. Find the Invariant This viewpoint says that learning is all about learning (or incorporating) transformations of objects that do not change the correct prediction. The best possible invariant is the one which says “all things of the same class are the same”. Finding this is equivalent to learning. This viewpoint is particularly common when working with image features. Feature Selection This viewpoint says that the way to learn is by finding the right features to input to a learning algorithm. The best feature is the one which is the class to predict. Finding this is equivalent to learning for all reasonable learning algorithms. This viewpoint is common in several applications of machine learning. See Gilad’s and Bianca’s comments . Find the Representation This is almost the same a


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 There are several ways to attack the learning problem which seem to be equivalent to solving the learning problem. [sent-2, score-0.466]

2 Find the Invariant This viewpoint says that learning is all about learning (or incorporating) transformations of objects that do not change the correct prediction. [sent-3, score-1.004]

3 The best possible invariant is the one which says “all things of the same class are the same”. [sent-4, score-0.852]

4 This viewpoint is particularly common when working with image features. [sent-6, score-0.534]

5 Feature Selection This viewpoint says that the way to learn is by finding the right features to input to a learning algorithm. [sent-7, score-1.39]

6 The best feature is the one which is the class to predict. [sent-8, score-0.614]

7 Finding this is equivalent to learning for all reasonable learning algorithms. [sent-9, score-0.392]

8 This viewpoint is common in several applications of machine learning. [sent-10, score-0.465]

9 Find the Representation This is almost the same as feature selection, except internal to the learning algorithm rather than external. [sent-12, score-0.428]

10 The key to learning is viewed as finding the best way to process the features in order to make predictions. [sent-13, score-1.199]

11 The best representation is the one which processes the features to produce the correct prediction. [sent-14, score-0.793]

12 This viewpoint is common for learning algorithm designers. [sent-15, score-0.687]

13 Find the Right Kernel The key to learning is finding the “right” kernel. [sent-16, score-0.551]

14 The optimal kernel is the one for which K(x, z)=1 when x and z have the same class and 0 otherwise. [sent-17, score-0.474]

15 With the right kernel, an SVM (or SVM-like optimization process) can solve any learning problem. [sent-18, score-0.391]

16 This viewpoint is common for people who work with SVMs. [sent-19, score-0.465]

17 Find the Right Metric The key to learning is finding the right metric. [sent-20, score-0.764]

18 The best metric is one which states that features with the same class label have distance 0 while features with different class labels have distance 1. [sent-21, score-1.794]

19 With the best metric, the nearest neighbor algorithm can solve any problem. [sent-22, score-0.499]

20 One consequence of this observation is that “wrapper methods” which try to automatically find a subset of features to feed into a learning algorithm in order to improve learning performance are simply trying to repair weaknesses in the learning algorithm. [sent-25, score-1.457]
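
To make the "learning complete" claim in sentences 13-19 concrete, here is a minimal sketch (illustrative function names, not from the original post): the ideal kernel and the ideal metric are defined directly from the true class labels, so being able to evaluate either one on a test point already requires knowing its label, which is the sense in which finding that kernel or metric is equivalent to learning.

```python
# Minimal sketch (assumed names, not from the original post): the "ideal"
# kernel and metric are defined from the true labels, so computing them on a
# new point is the same as already knowing its class.

def ideal_kernel(label_x, label_z):
    # K(x, z) = 1 when x and z have the same class, 0 otherwise.
    return 1.0 if label_x == label_z else 0.0

def ideal_distance(label_x, label_z):
    # d(x, z) = 0 for same-class pairs, 1 for different-class pairs.
    return 0.0 if label_x == label_z else 1.0

def nearest_neighbor_predict(train_labels, test_label):
    # 1-nearest-neighbor under the ideal metric: any training point of the
    # same class sits at distance 0, so the prediction is correct whenever the
    # test point's class appears in the training set.
    return min(train_labels, key=lambda y: ideal_distance(y, test_label))

if __name__ == "__main__":
    train_labels = [0, 1, 1, 0, 2]
    for true_label in (0, 1, 2):
        assert nearest_neighbor_predict(train_labels, true_label) == true_label
```

The deliberate circularity is the point of the post: evaluating the ideal kernel or metric (or producing the ideal feature, representation, or invariant) requires the very labels we are trying to predict, so each "find the right X" problem is as hard as the learning problem itself.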


similar blogs computed by the tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('viewpoint', 0.343), ('finding', 0.285), ('features', 0.265), ('class', 0.231), ('right', 0.213), ('metric', 0.202), ('invariant', 0.193), ('find', 0.186), ('equivalent', 0.182), ('says', 0.179), ('best', 0.178), ('kernel', 0.172), ('key', 0.161), ('distance', 0.144), ('selection', 0.141), ('feature', 0.134), ('common', 0.122), ('algorithm', 0.117), ('correct', 0.109), ('representation', 0.107), ('learning', 0.105), ('bianca', 0.097), ('feed', 0.097), ('repair', 0.097), ('utility', 0.097), ('weaknesses', 0.091), ('transformations', 0.091), ('versions', 0.087), ('viewpoints', 0.083), ('order', 0.082), ('incorporating', 0.081), ('consequence', 0.078), ('attack', 0.074), ('solve', 0.073), ('internal', 0.072), ('svm', 0.072), ('objects', 0.072), ('one', 0.071), ('image', 0.069), ('neighbor', 0.067), ('realize', 0.067), ('subset', 0.066), ('nearest', 0.064), ('seems', 0.063), ('automatically', 0.063), ('processes', 0.063), ('states', 0.063), ('viewed', 0.063), ('define', 0.062), ('process', 0.06)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 6 hunch net-2005-01-27-Learning Complete Problems

Introduction: Let’s define a learning problem as making predictions given past data. There are several ways to attack the learning problem which seem to be equivalent to solving the learning problem. Find the Invariant This viewpoint says that learning is all about learning (or incorporating) transformations of objects that do not change the correct prediction. The best possible invariant is the one which says “all things of the same class are the same”. Finding this is equivalent to learning. This viewpoint is particularly common when working with image features. Feature Selection This viewpoint says that the way to learn is by finding the right features to input to a learning algorithm. The best feature is the one which is the class to predict. Finding this is equivalent to learning for all reasonable learning algorithms. This viewpoint is common in several applications of machine learning. See Gilad’s and Bianca’s comments . Find the Representation This is almost the same a

2 0.14755423 478 hunch net-2013-01-07-NYU Large Scale Machine Learning Class

Introduction: Yann LeCun and I are coteaching a class on Large Scale Machine Learning starting late January at NYU . This class will cover many tricks to get machine learning working well on datasets with many features, examples, and classes, along with several elements of deep learning and support systems enabling the previous. This is not a beginning class—you really need to have taken a basic machine learning class previously to follow along. Students will be able to run and experiment with large scale learning algorithms since Yahoo! has donated servers which are being configured into a small scale Hadoop cluster. We are planning to cover the frontier of research in scalable learning algorithms, so good class projects could easily lead to papers. For me, this is a chance to teach on many topics of past research. In general, it seems like researchers should engage in at least occasional teaching of research, both as a proof of teachability and to see their own research through th

3 0.14751108 259 hunch net-2007-08-19-Choice of Metrics

Introduction: How do we judge success in Machine Learning? As Aaron notes , the best way is to use the loss imposed on you by the world. This turns out to be infeasible sometimes for various reasons. The ones I’ve seen are: The learned prediction is used in some complicated process that does not give the feedback necessary to understand the prediction’s impact on the loss. The prediction is used by some other system which expects some semantics to the predicted value. This is similar to the previous example, except that the issue is design modularity rather than engineering modularity. The correct loss function is simply unknown (and perhaps unknowable, except by experimentation). In these situations, it’s unclear what metric for evaluation should be chosen. This post has some design advice for this murkier case. I’m using the word “metric” here to distinguish the fact that we are considering methods for evaluating predictive systems rather than a loss imposed by the real wor

4 0.13587786 333 hunch net-2008-12-27-Adversarial Academia

Introduction: One viewpoint on academia is that it is inherently adversarial: there are finite research dollars, positions, and students to work with, implying a zero-sum game between different participants. This is not a viewpoint that I want to promote, as I consider it flawed. However, I know several people believe strongly in this viewpoint, and I have found it to have substantial explanatory power. For example: It explains why your paper was rejected based on poor logic. The reviewer wasn’t concerned with research quality, but rather with rejecting a competitor. It explains why professors rarely work together. The goal of a non-tenured professor (at least) is to get tenure, and a case for tenure comes from a portfolio of work that is indisputably yours. It explains why new research programs are not quickly adopted. Adopting a competitor’s program is impossible, if your career is based on the competitor being wrong. Different academic groups subscribe to the adversarial viewp

5 0.1246467 235 hunch net-2007-03-03-All Models of Learning have Flaws

Introduction: Attempts to abstract and study machine learning are within some given framework or mathematical model. It turns out that all of these models are significantly flawed for the purpose of studying machine learning. I’ve created a table (below) outlining the major flaws in some common models of machine learning. The point here is not simply “woe unto us”. There are several implications which seem important. The multitude of models is a point of continuing confusion. It is common for people to learn about machine learning within one framework which often becomes their “home framework” through which they attempt to filter all machine learning. (Have you met people who can only think in terms of kernels? Only via Bayes Law? Only via PAC Learning?) Explicitly understanding the existence of these other frameworks can help resolve the confusion. This is particularly important when reviewing and particularly important for students. Algorithms which conform to multiple approaches c

6 0.12146842 347 hunch net-2009-03-26-Machine Learning is too easy

7 0.1152714 258 hunch net-2007-08-12-Exponentiated Gradient

8 0.11037365 152 hunch net-2006-01-30-Should the Input Representation be a Vector?

9 0.10662193 12 hunch net-2005-02-03-Learning Theory, by assumption

10 0.10638329 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem

11 0.1059165 115 hunch net-2005-09-26-Prediction Bounds as the Mathematics of Science

12 0.10565121 31 hunch net-2005-02-26-Problem: Reductions and Relative Ranking Metrics

13 0.1041621 332 hunch net-2008-12-23-Use of Learning Theory

14 0.10368757 230 hunch net-2007-02-02-Thoughts regarding “Is machine learning different from statistics?”

15 0.10350062 351 hunch net-2009-05-02-Wielding a New Abstraction

16 0.10111725 309 hunch net-2008-07-10-Interesting papers, ICML 2008

17 0.098164521 139 hunch net-2005-12-11-More NIPS Papers

18 0.09674041 39 hunch net-2005-03-10-Breaking Abstractions

19 0.096297741 158 hunch net-2006-02-24-A Fundamentalist Organization of Machine Learning

20 0.094826274 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms


similar blogs computed by the lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.22), (1, 0.107), (2, -0.004), (3, 0.016), (4, 0.037), (5, -0.03), (6, -0.029), (7, 0.045), (8, 0.049), (9, -0.023), (10, -0.069), (11, -0.022), (12, 0.034), (13, -0.031), (14, 0.051), (15, 0.047), (16, -0.02), (17, -0.033), (18, -0.059), (19, 0.057), (20, 0.048), (21, 0.009), (22, -0.005), (23, -0.079), (24, 0.07), (25, -0.036), (26, -0.006), (27, -0.012), (28, -0.098), (29, 0.012), (30, -0.065), (31, -0.095), (32, -0.006), (33, -0.02), (34, -0.127), (35, 0.043), (36, 0.078), (37, 0.101), (38, -0.026), (39, 0.026), (40, -0.038), (41, 0.01), (42, 0.093), (43, -0.056), (44, -0.005), (45, -0.051), (46, -0.056), (47, -0.074), (48, 0.078), (49, 0.016)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.93311232 6 hunch net-2005-01-27-Learning Complete Problems

Introduction: Let’s define a learning problem as making predictions given past data. There are several ways to attack the learning problem which seem to be equivalent to solving the learning problem. Find the Invariant This viewpoint says that learning is all about learning (or incorporating) transformations of objects that do not change the correct prediction. The best possible invariant is the one which says “all things of the same class are the same”. Finding this is equivalent to learning. This viewpoint is particularly common when working with image features. Feature Selection This viewpoint says that the way to learn is by finding the right features to input to a learning algorithm. The best feature is the one which is the class to predict. Finding this is equivalent to learning for all reasonable learning algorithms. This viewpoint is common in several applications of machine learning. See Gilad’s and Bianca’s comments . Find the Representation This is almost the same a

2 0.70137233 253 hunch net-2007-07-06-Idempotent-capable Predictors

Introduction: One way to distinguish different learning algorithms is by their ability or inability to easily use an input variable as the predicted output. This is desirable for at least two reasons: Modularity If we want to build complex learning systems via reuse of a subsystem, it’s important to have compatible I/O. “Prior” knowledge Machine learning is often applied in situations where we do have some knowledge of what the right solution is, often in the form of an existing system. In such situations, it’s good to start with a learning algorithm that can be at least as good as any existing system. When doing classification, most learning algorithms can do this. For example, a decision tree can split on a feature, and then classify. The real differences come up when we attempt regression. Many of the algorithms we know and commonly use are not idempotent predictors. Logistic regressors can not be idempotent, because all input features are mapped through a nonlinearity.

3 0.67053747 152 hunch net-2006-01-30-Should the Input Representation be a Vector?

Introduction: Let’s suppose that we are trying to create a general purpose machine learning box. The box is fed many examples of the function it is supposed to learn and (hopefully) succeeds. To date, most such attempts to produce a box of this form take a vector as input. The elements of the vector might be bits, real numbers, or ‘categorical’ data (a discrete set of values). On the other hand, there are a number of successful applications of machine learning which do not seem to use a vector representation as input. For example, in vision, convolutional neural networks have been used to solve several vision problems. The input to the convolutional neural network is essentially the raw camera image as a matrix. In learning for natural languages, several people have had success on problems like parts-of-speech tagging using predictors restricted to a window surrounding the word to be predicted. A vector window and a matrix both imply a notion of locality which is being actively and

4 0.66378587 276 hunch net-2007-12-10-Learning Track of International Planning Competition

Introduction: The International Planning Competition (IPC) is a biennial event organized in the context of the International Conference on Automated Planning and Scheduling (ICAPS). This year, for the first time, there will be a learning track of the competition. For more information you can go to the competition web-site. The competitions are typically organized around a number of planning domains that can vary from year to year, where a planning domain is simply a class of problems that share a common action schema—e.g. Blocksworld is a well-known planning domain that contains a problem instance for each possible initial tower configuration and goal configuration. Some other domains have included Logistics, Airport, Freecell, PipesWorld, and many others. For each domain the competition includes a number of problems (say 40-50) and the planners are run on each problem with a time limit for each problem (around 30 minutes). The problems are hard enough that many problems are not solved within th

5 0.64706516 478 hunch net-2013-01-07-NYU Large Scale Machine Learning Class

Introduction: Yann LeCun and I are coteaching a class on Large Scale Machine Learning starting late January at NYU . This class will cover many tricks to get machine learning working well on datasets with many features, examples, and classes, along with several elements of deep learning and support systems enabling the previous. This is not a beginning class—you really need to have taken a basic machine learning class previously to follow along. Students will be able to run and experiment with large scale learning algorithms since Yahoo! has donated servers which are being configured into a small scale Hadoop cluster. We are planning to cover the frontier of research in scalable learning algorithms, so good class projects could easily lead to papers. For me, this is a chance to teach on many topics of past research. In general, it seems like researchers should engage in at least occasional teaching of research, both as a proof of teachability and to see their own research through th

6 0.62353265 348 hunch net-2009-04-02-Asymmophobia

7 0.60507727 230 hunch net-2007-02-02-Thoughts regarding “Is machine learning different from statistics?”

8 0.59603447 33 hunch net-2005-02-28-Regularization

9 0.58977002 149 hunch net-2006-01-18-Is Multitask Learning Black-Boxable?

10 0.58828413 138 hunch net-2005-12-09-Some NIPS papers

11 0.58484131 347 hunch net-2009-03-26-Machine Learning is too easy

12 0.58107173 351 hunch net-2009-05-02-Wielding a New Abstraction

13 0.57989585 185 hunch net-2006-06-16-Regularization = Robustness

14 0.56154418 104 hunch net-2005-08-22-Do you believe in induction?

15 0.56116557 31 hunch net-2005-02-26-Problem: Reductions and Relative Ranking Metrics

16 0.55973798 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem

17 0.55884951 308 hunch net-2008-07-06-To Dual or Not

18 0.55519331 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

19 0.54513651 158 hunch net-2006-02-24-A Fundamentalist Organization of Machine Learning

20 0.54279739 337 hunch net-2009-01-21-Nearly all natural problems require nonlinearity


similar blogs computed by the lda model

lda for this blog:

topicId topicWeight

[(27, 0.216), (53, 0.61), (55, 0.033), (94, 0.023)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.99896675 16 hunch net-2005-02-09-Intuitions from applied learning

Introduction: Since learning is far from an exact science, it’s good to pay attention to basic intuitions of applied learning. Here are a few I’ve collected. Integration In Bayesian learning, the posterior is computed by an integral, and the optimal thing to do is to predict according to this integral. This phenomenon seems to be far more general. Bagging, Boosting, SVMs, and Neural Networks all take advantage of this idea to some extent. The phenomenon is more general: you can average over many different classification predictors to improve performance. Sources: Zoubin , Caruana Differentiation Different pieces of an average should differentiate to achieve good performance by different methods. This is known as the ‘symmetry breaking’ problem for neural networks, and it’s why weights are initialized randomly. Boosting explicitly attempts to achieve good differentiation by creating new, different, learning problems. Sources: Yann LeCun , Phil Long Deep Representation Ha

2 0.99633396 56 hunch net-2005-04-14-Families of Learning Theory Statements

Introduction: The diagram above shows a very broad viewpoint of learning theory.

arrow          | Typical statement                                                                                      | Examples
Past->Past     | Some prediction algorithm A does almost as well as any of a set of algorithms.                        | Weighted Majority
Past->Future   | Assuming independent samples, past performance predicts future performance.                           | PAC analysis, ERM analysis
Future->Future | Future prediction performance on subproblems implies future prediction performance using algorithm A. | ECOC, Probing

A basic question is: Are there other varieties of statements of this type? Avrim noted that there are also “arrows between arrows”: generic methods for transforming between Past->Past statements and Past->Future statements. Are there others?

3 0.99296039 107 hunch net-2005-09-05-Site Update

Introduction: I tweaked the site in a number of ways today, including: Updating to WordPress 1.5. Installing and heavily tweaking the Geekniche theme. Update: I switched back to a tweaked version of the old theme. Adding the Customizable Post Listings plugin. Installing the StatTraq plugin. Updating some of the links. I particularly recommend looking at the computer research policy blog. Adding threaded comments . This doesn’t thread old comments obviously, but the extra structure may be helpful for new ones. Overall, I think this is an improvement, and it addresses a few of my earlier problems . If you have any difficulties or anything seems “not quite right”, please speak up. A few other tweaks to the site may happen in the near future.

4 0.98900872 2 hunch net-2005-01-24-Holy grails of machine learning?

Introduction: Let me kick things off by posing this question to ML researchers: What do you think are some important holy grails of machine learning? For example: – “A classifier with SVM-level performance but much more scalable” – “Practical confidence bounds (or learning bounds) for classification” – “A reinforcement learning algorithm that can handle the ___ problem” – “Understanding theoretically why ___ works so well in practice” etc. I pose this question because I believe that when goals are stated explicitly and well (thus providing clarity as well as opening up the problems to more people), rather than left implicit, they are likely to be achieved much more quickly. I would also like to know more about the internal goals of the various machine learning sub-areas (theory, kernel methods, graphical models, reinforcement learning, etc) as stated by people in these respective areas. This could help people cross sub-areas.

5 0.98804748 91 hunch net-2005-07-10-Thinking the Unthought

Introduction: One thing common to much research is that the researcher must be the first person ever to have some thought. How do you think of something that has never been thought of? There seems to be no methodical manner of doing this, but there are some tricks. The easiest method is to just have some connection come to you. There is a trick here however: you should write it down and fill out the idea immediately because it can just as easily go away. A harder method is to set aside a block of time and simply think about an idea. Distraction elimination is essential here because thinking about the unthought is hard work which your mind will avoid. Another common method is in conversation. Sometimes the process of verbalizing implies new ideas come up and sometimes whoever you are talking to replies just the right way. This method is dangerous though—you must speak to someone who helps you think rather than someone who occupies your thoughts. Try to rephrase the problem so the a

6 0.97703701 367 hunch net-2009-08-16-Centmail comments

same-blog 7 0.94792593 6 hunch net-2005-01-27-Learning Complete Problems

8 0.91419923 145 hunch net-2005-12-29-Deadline Season

9 0.80214524 21 hunch net-2005-02-17-Learning Research Programs

10 0.72446162 201 hunch net-2006-08-07-The Call of the Deep

11 0.7074852 60 hunch net-2005-04-23-Advantages and Disadvantages of Bayesian Learning

12 0.6998986 151 hunch net-2006-01-25-1 year

13 0.69439036 191 hunch net-2006-07-08-MaxEnt contradicts Bayes Rule?

14 0.68334216 141 hunch net-2005-12-17-Workshops as Franchise Conferences

15 0.67390227 152 hunch net-2006-01-30-Should the Input Representation be a Vector?

16 0.65444291 407 hunch net-2010-08-23-Boosted Decision Trees for Deep Learning

17 0.64720803 265 hunch net-2007-10-14-NIPS workshp: Learning Problem Design

18 0.63677979 25 hunch net-2005-02-20-At One Month

19 0.63127917 478 hunch net-2013-01-07-NYU Large Scale Machine Learning Class

20 0.62886274 416 hunch net-2010-10-29-To Vidoelecture or not