hunch_net-2006-161: "Structural" Learning (hunch net, 2006-03-05)
Source: html
Introduction: Fernando Pereira pointed out Ando and Zhang's paper on "structural" learning. Structural learning is multitask learning on subproblems created from unlabeled data. The basic idea is to use the unlabeled data to create many auxiliary supervised problems. On text data, which they test on, these subproblems might be of the form "given the surrounding words, predict the middle word". The hope is that predicting well on these subproblems is relevant to predicting well on your core problem. In the long run, the precise mechanism used (essentially, linear predictors with parameters tied by a common matrix) and the precise problems formed may not be critical. What seems critical is that the hope is realized: the technique provides a significant edge in practice. Some basic questions about this approach are: Are there effective automated mechanisms for creating the subproblems? Is it necessary to use a shared representation?
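To make the construction concrete, here is a minimal sketch (in Python, not the authors' implementation) of how auxiliary subproblems of the "given surrounding words, predict the middle word" form might be generated from unlabeled text. The function name, window size, and frequency cutoff are illustrative assumptions; the shared-matrix coupling of the predictors is only described in the comments.

```python
# A minimal sketch, assuming tokenized unlabeled sentences as input. This is
# not the Ando & Zhang implementation, only an illustration of how auxiliary
# "predict the middle word from its context" subproblems can be generated.
from collections import Counter
from typing import List, Tuple


def make_subproblems(sentences: List[List[str]],
                     window: int = 2,
                     top_k: int = 100) -> List[Tuple[List[str], str]]:
    """Create (context, target) pairs from unlabeled sentences.

    Each pair is a training example for an auxiliary supervised problem:
    predict the middle word from up to `window` words on either side.
    Restricting targets to the top_k most frequent words keeps the number
    of auxiliary problems manageable (a practical choice, not a requirement
    of the method).
    """
    counts = Counter(w for s in sentences for w in s)
    frequent = {w for w, _ in counts.most_common(top_k)}
    examples = []
    for s in sentences:
        for i, target in enumerate(s):
            if target not in frequent:
                continue
            context = s[max(0, i - window):i] + s[i + 1:i + 1 + window]
            examples.append((context, target))
    return examples


# Illustrative usage: each frequent target word defines one auxiliary
# prediction problem. A structural-learning method would fit linear
# predictors for all of these problems jointly, with their parameters
# coupled through a shared low-dimensional matrix, and reuse that shared
# structure when learning the core (labeled) task.
unlabeled = [["the", "cat", "sat", "on", "the", "mat"],
             ["a", "dog", "sat", "on", "the", "rug"]]
print(make_subproblems(unlabeled, window=2, top_k=5)[:3])
```

Whether these particular subproblems, or the shared matrix that ties the predictors together, are actually essential is exactly the open question raised above.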