hunch_net hunch_net-2006 hunch_net-2006-161 knowledge-graph by maker-knowledge-mining

161 hunch net-2006-03-05-“Structural” Learning


meta info for this blog

Source: html

Introduction: Fernando Pereira pointed out Ando and Zhang's paper on “structural” learning. Structural learning is multitask learning on subproblems created from unlabeled data. The basic idea is to take a look at the unlabeled data and create many supervised problems. On text data, which they test on, these subproblems might be of the form “Given surrounding words predict the middle word”. The hope here is that successfully predicting on these subproblems is relevant to the prediction of your core problem. In the long run, the precise mechanism used (essentially, linear predictors with parameters tied by a common matrix) and the precise problems formed may not be critical. What seems critical is that the hope is realized: the technique provides a significant edge in practice. Some basic questions about this approach are: Are there effective automated mechanisms for creating the subproblems? Is it necessary to use a shared representation?
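To make the subproblem-creation step concrete, here is a minimal Python sketch of turning unlabeled sentences into “predict the middle word from its surrounding words” auxiliary problems. The function name, window size, and toy sentences are illustrative assumptions; this is only a sketch of the idea, not the Ando-Zhang algorithm itself.

from collections import Counter

def make_subproblems(sentences, window=2, num_target_words=100):
    # Hypothetical helper, not from the paper: pick frequent words as the
    # targets of the auxiliary binary problems.
    counts = Counter(w for s in sentences for w in s)
    targets = [w for w, _ in counts.most_common(num_target_words)]

    # Collect (surrounding-words, middle-word) pairs from the unlabeled text.
    examples = []
    for s in sentences:
        for i in range(window, len(s) - window):
            context = s[i - window:i] + s[i + 1:i + window + 1]
            examples.append((context, s[i]))

    # One binary subproblem per target word: given the surrounding words,
    # predict whether the middle word equals that target.
    return {
        w: [(ctx, int(mid == w)) for ctx, mid in examples] for w in targets
    }

unlabeled = [["the", "cat", "sat", "on", "the", "mat"],
             ["a", "dog", "sat", "on", "a", "rug"]]
subproblems = make_subproblems(unlabeled, window=2, num_target_words=5)

Each such supervised subproblem can then be attacked with a linear predictor, with the predictors' parameters tied through a shared matrix as in the paper.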


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Fernando Pereira pointed out Ando and Zhang ‘s paper on “structural” learning. [sent-1, score-0.133]

2 Structural learning is multitask learning on subproblems created from unlabeled data. [sent-2, score-0.931]

3 The basic idea is to take a look at the unlabeled data and create many supervised problems. [sent-3, score-0.618]

4 On text data, which they test on, these subproblems might be of the form “Given surrounding words predict the middle word”. [sent-4, score-1.158]

5 The hope here is that successfully predicting on these subproblems is relevant to the prediction of your core problem. [sent-5, score-1.008]

6 In the long run, the precise mechanism used (essentially, linear predictors with parameters tied by a common matrix) and the precise problems formed may not be critical. [sent-6, score-1.051]

7 What seems critical is that the hope is realized: the technique provides a significant edge in practice. [sent-7, score-0.49]

8 Some basic questions about this approach are: Are there effective automated mechanisms for creating the subproblems? [sent-8, score-0.528]

9 Is it necessary to use a shared representation? [sent-9, score-0.213]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('subproblems', 0.499), ('structural', 0.3), ('unlabeled', 0.21), ('precise', 0.206), ('ando', 0.172), ('pereira', 0.159), ('middle', 0.159), ('formed', 0.15), ('tied', 0.15), ('surrounding', 0.15), ('fernando', 0.143), ('successfully', 0.143), ('realized', 0.137), ('shared', 0.137), ('pointed', 0.133), ('multitask', 0.129), ('zhang', 0.125), ('hope', 0.124), ('matrix', 0.119), ('text', 0.116), ('edge', 0.111), ('word', 0.103), ('automated', 0.102), ('mechanisms', 0.1), ('words', 0.098), ('parameters', 0.097), ('predictors', 0.096), ('created', 0.093), ('technique', 0.092), ('basic', 0.09), ('critical', 0.089), ('representation', 0.088), ('data', 0.088), ('supervised', 0.086), ('predicting', 0.085), ('creating', 0.083), ('core', 0.082), ('effective', 0.078), ('look', 0.077), ('necessary', 0.076), ('questions', 0.075), ('relevant', 0.075), ('provides', 0.074), ('linear', 0.073), ('mechanism', 0.073), ('run', 0.072), ('test', 0.072), ('create', 0.067), ('essentially', 0.065), ('predict', 0.064)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 161 hunch net-2006-03-05-“Structural” Learning

Introduction: Fernando Pereira pointed out Ando and Zhang ‘s paper on “structural” learning. Structural learning is multitask learning on subproblems created from unlabeled data. The basic idea is to take a look at the unlabeled data and create many supervised problems. On text data, which they test on, these subproblems might be of the form “Given surrounding words predict the middle word”. The hope here is that successfully predicting on these subproblems is relevant to the prediction of your core problem. In the long run, the precise mechanism used (essentially, linear predictors with parameters tied by a common matrix) and the precise problems formed may not be critical. What seems critical is that the hope is realized: the technique provides a significant edge in practice. Some basic questions about this approach are: Are there effective automated mechanisms for creating the subproblems? Is it necessary to use a shared representation?

2 0.17674285 143 hunch net-2005-12-27-Automated Labeling

Introduction: One of the common trends in machine learning has been an emphasis on the use of unlabeled data. The argument goes something like “there aren’t many labeled web pages out there, but there are a huge number of web pages, so we must find a way to take advantage of them.” There are several standard approaches for doing this: Unsupervised Learning . You use only unlabeled data. In a typical application, you cluster the data and hope that the clusters somehow correspond to what you care about. Semisupervised Learning. You use both unlabeled and labeled data to build a predictor. The unlabeled data influences the learned predictor in some way. Active Learning . You have unlabeled data and access to a labeling oracle. You interactively choose which examples to label so as to optimize prediction accuracy. It seems there is a fourth approach worth serious investigation—automated labeling. The approach goes as follows: Identify some subset of observed values to predict

3 0.15078394 149 hunch net-2006-01-18-Is Multitask Learning Black-Boxable?

Introduction: Multitask learning is learning to predict multiple outputs given the same input. Mathematically, we might think of this as trying to learn a function f: X -> {0,1}^n. Structured learning is similar at this level of abstraction. Many people have worked on solving multitask learning (for example Rich Caruana) using methods which share an internal representation. In other words, the computation and learning of the i-th prediction is shared with the computation and learning of the j-th prediction. Another way to ask this question is: can we avoid sharing the internal representation? For example, it might be feasible to solve multitask learning by some process feeding the i-th prediction f(x)_i into the j-th predictor f(x, f(x)_i)_j. If the answer is “no”, then it implies we can not take binary classification as a basic primitive in the process of solving prediction problems. If the answer is “yes”, then we can reuse binary classification algorithms to
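The chained-predictor alternative described in the excerpt above (feeding the i-th prediction into the j-th predictor) can be sketched in a few lines of Python; the scikit-learn classifiers and the toy tasks below are illustrative assumptions, not the post's own construction.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y1 = (X[:, 0] + X[:, 1] > 0).astype(int)      # first task
y2 = ((X[:, 2] > 0) ^ (y1 == 1)).astype(int)  # second task depends on task 1

# First-task predictor trained on x alone.
f1 = LogisticRegression().fit(X, y1)

# Second-task predictor trained on (x, f(x)_1) rather than on a shared
# internal representation.
X_aug = np.hstack([X, f1.predict(X).reshape(-1, 1)])
f2 = LogisticRegression().fit(X_aug, y2)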

4 0.14000134 68 hunch net-2005-05-10-Learning Reductions are Reductionist

Introduction: This is about a fundamental motivation for the investigation of reductions in learning. It applies to many pieces of work other than my own. The reductionist approach to problem solving is characterized by taking a problem, decomposing it into as-small-as-possible subproblems, discovering how to solve the subproblems, and then discovering how to use the solutions to the subproblems to solve larger problems. The reductionist approach to solving problems has often paid off very well. Computer science related examples of the reductionist approach include: Reducing computation to the transistor. All of our CPUs are built from transistors. Reducing rendering of images to rendering a triangle (or other simple polygons). Computers can now render near-realistic scenes in real time. The big breakthrough came from learning how to render many triangles quickly. This approach to problem solving extends well beyond computer science. Many fields of science focus on theories mak

5 0.11693567 164 hunch net-2006-03-17-Multitask learning is Black-Boxable

Introduction: Multitask learning is the problem of jointly predicting multiple labels simultaneously with one system. A basic question is whether or not multitask learning can be decomposed into one (or more) single prediction problems. It seems the answer to this is “yes”, in a fairly straightforward manner. The basic idea is that a controlled input feature is equivalent to an extra output. Suppose we have some process generating examples: (x, y_1, y_2) in S where y_1 and y_2 are labels for two different tasks. Then, we could reprocess the data to the form S_b(S) = {((x,i), y_i) : (x, y_1, y_2) in S, i in {1,2}} and then learn a classifier c: X x {1,2} -> Y. Note that (x,i) is the (composite) input. At testing time, given an input x, we can query c for the predicted values of y_1 and y_2 using (x,1) and (x,2). A strong form of equivalence can be stated between these tasks. In particular, suppose we have a multitask learning algorithm ML which learns a multitask
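The reprocessing step in the excerpt above is simple enough to state directly in code; the following minimal Python sketch (with made-up toy examples) just builds S_b(S) so that a single classifier over composite inputs (x, i) can be trained.

def reprocess(multitask_examples):
    # Turn each multitask example (x, y1, y2) into two single-task
    # examples ((x, 1), y1) and ((x, 2), y2); (x, i) is the composite input.
    single_task = []
    for x, y1, y2 in multitask_examples:
        for i, y in ((1, y1), (2, y2)):
            single_task.append(((x, i), y))
    return single_task

S = [((0.3, 1.2), 0, 1), ((0.9, -0.4), 1, 1)]   # made-up examples
S_b = reprocess(S)
# At test time, query the learned classifier c on (x, 1) and (x, 2)
# to recover predictions for both tasks.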

6 0.10345018 299 hunch net-2008-04-27-Watchword: Supervised Learning

7 0.098960645 370 hunch net-2009-09-18-Necessary and Sufficient Research

8 0.09410885 337 hunch net-2009-01-21-Nearly all natural problems require nonlinearity

9 0.088250399 277 hunch net-2007-12-12-Workshop Summary—Principles of Learning Problem Design

10 0.08761254 224 hunch net-2006-12-12-Interesting Papers at NIPS 2006

11 0.08658874 112 hunch net-2005-09-14-The Predictionist Viewpoint

12 0.085721396 97 hunch net-2005-07-23-Interesting papers at ACL

13 0.081866793 56 hunch net-2005-04-14-Families of Learning Theory Statements

14 0.08007884 229 hunch net-2007-01-26-Parallel Machine Learning Problems

15 0.078289203 330 hunch net-2008-12-07-A NIPS paper

16 0.074566498 127 hunch net-2005-11-02-Progress in Active Learning

17 0.073250353 347 hunch net-2009-03-26-Machine Learning is too easy

18 0.073092744 265 hunch net-2007-10-14-NIPS workshp: Learning Problem Design

19 0.071851619 465 hunch net-2012-05-12-ICML accepted papers and early registration

20 0.068889908 360 hunch net-2009-06-15-In Active Learning, the question changes


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.147), (1, 0.071), (2, -0.024), (3, 0.01), (4, 0.069), (5, -0.022), (6, -0.011), (7, 0.031), (8, 0.035), (9, -0.016), (10, -0.04), (11, -0.021), (12, -0.047), (13, 0.089), (14, -0.095), (15, -0.014), (16, -0.002), (17, 0.027), (18, -0.071), (19, -0.016), (20, 0.03), (21, -0.031), (22, 0.045), (23, 0.019), (24, -0.003), (25, -0.082), (26, 0.071), (27, -0.005), (28, 0.004), (29, -0.067), (30, 0.059), (31, 0.08), (32, 0.047), (33, -0.054), (34, -0.102), (35, 0.004), (36, -0.038), (37, 0.123), (38, -0.106), (39, -0.064), (40, 0.069), (41, -0.033), (42, 0.018), (43, 0.111), (44, 0.059), (45, -0.003), (46, -0.048), (47, -0.033), (48, 0.009), (49, -0.059)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97019762 161 hunch net-2006-03-05-“Structural” Learning

Introduction: Fernando Pereira pointed out Ando and Zhang ‘s paper on “structural” learning. Structural learning is multitask learning on subproblems created from unlabeled data. The basic idea is to take a look at the unlabeled data and create many supervised problems. On text data, which they test on, these subproblems might be of the form “Given surrounding words predict the middle word”. The hope here is that successfully predicting on these subproblems is relevant to the prediction of your core problem. In the long run, the precise mechanism used (essentially, linear predictors with parameters tied by a common matrix) and the precise problems formed may not be critical. What seems critical is that the hope is realized: the technique provides a significant edge in practice. Some basic questions about this approach are: Are there effective automated mechanisms for creating the subproblems? Is it necessary to use a shared representation?

2 0.71588188 143 hunch net-2005-12-27-Automated Labeling

Introduction: One of the common trends in machine learning has been an emphasis on the use of unlabeled data. The argument goes something like “there aren’t many labeled web pages out there, but there are a huge number of web pages, so we must find a way to take advantage of them.” There are several standard approaches for doing this: Unsupervised Learning . You use only unlabeled data. In a typical application, you cluster the data and hope that the clusters somehow correspond to what you care about. Semisupervised Learning. You use both unlabeled and labeled data to build a predictor. The unlabeled data influences the learned predictor in some way. Active Learning . You have unlabeled data and access to a labeling oracle. You interactively choose which examples to label so as to optimize prediction accuracy. It seems there is a fourth approach worth serious investigation—automated labeling. The approach goes as follows: Identify some subset of observed values to predict

3 0.69239807 149 hunch net-2006-01-18-Is Multitask Learning Black-Boxable?

Introduction: Multitask learning is learning to predict multiple outputs given the same input. Mathematically, we might think of this as trying to learn a function f: X -> {0,1}^n. Structured learning is similar at this level of abstraction. Many people have worked on solving multitask learning (for example Rich Caruana) using methods which share an internal representation. In other words, the computation and learning of the i-th prediction is shared with the computation and learning of the j-th prediction. Another way to ask this question is: can we avoid sharing the internal representation? For example, it might be feasible to solve multitask learning by some process feeding the i-th prediction f(x)_i into the j-th predictor f(x, f(x)_i)_j. If the answer is “no”, then it implies we can not take binary classification as a basic primitive in the process of solving prediction problems. If the answer is “yes”, then we can reuse binary classification algorithms to

4 0.61571813 164 hunch net-2006-03-17-Multitask learning is Black-Boxable

Introduction: Multitask learning is the problem of jointly predicting multiple labels simultaneously with one system. A basic question is whether or not multitask learning can be decomposed into one (or more) single prediction problems. It seems the answer to this is “yes”, in a fairly straightforward manner. The basic idea is that a controlled input feature is equivalent to an extra output. Suppose we have some process generating examples: (x, y_1, y_2) in S where y_1 and y_2 are labels for two different tasks. Then, we could reprocess the data to the form S_b(S) = {((x,i), y_i) : (x, y_1, y_2) in S, i in {1,2}} and then learn a classifier c: X x {1,2} -> Y. Note that (x,i) is the (composite) input. At testing time, given an input x, we can query c for the predicted values of y_1 and y_2 using (x,1) and (x,2). A strong form of equivalence can be stated between these tasks. In particular, suppose we have a multitask learning algorithm ML which learns a multitask

5 0.54730785 277 hunch net-2007-12-12-Workshop Summary—Principles of Learning Problem Design

Introduction: This is a summary of the workshop on Learning Problem Design which Alina and I ran at NIPS this year. The first question many people have is “What is learning problem design?” This workshop is about admitting that solving learning problems does not start with labeled data, but rather somewhere before. When humans are hired to produce labels, this is usually not a serious problem because you can tell them precisely what semantics you want the labels to have, and we can fix some set of features in advance. However, when other methods are used this becomes more problematic. This focus is important for Machine Learning because there are very large quantities of data which are not labeled by a hired human. The title of the workshop was a bit ambitious, because a workshop is not long enough to synthesize a diversity of approaches into a coherent set of principles. For me, the posters at the end of the workshop were quite helpful in getting approaches to gel. Here are some an

6 0.53809983 68 hunch net-2005-05-10-Learning Reductions are Reductionist

7 0.53533673 90 hunch net-2005-07-07-The Limits of Learning Theory

8 0.5148195 168 hunch net-2006-04-02-Mad (Neuro)science

9 0.51143187 348 hunch net-2009-04-02-Asymmophobia

10 0.50129104 337 hunch net-2009-01-21-Nearly all natural problems require nonlinearity

11 0.50001079 94 hunch net-2005-07-13-Text Entailment at AAAI

12 0.49634969 360 hunch net-2009-06-15-In Active Learning, the question changes

13 0.49593589 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models

14 0.49585047 299 hunch net-2008-04-27-Watchword: Supervised Learning

15 0.49546573 314 hunch net-2008-08-24-Mass Customized Medicine in the Future?

16 0.49430653 224 hunch net-2006-12-12-Interesting Papers at NIPS 2006

17 0.49107951 265 hunch net-2007-10-14-NIPS workshp: Learning Problem Design

18 0.48364577 432 hunch net-2011-04-20-The End of the Beginning of Active Learning

19 0.48302695 45 hunch net-2005-03-22-Active learning

20 0.4744164 114 hunch net-2005-09-20-Workshop Proposal: Atomic Learning


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(27, 0.181), (53, 0.047), (55, 0.074), (71, 0.462), (94, 0.093), (95, 0.022)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.93254572 417 hunch net-2010-11-18-ICML 2011 – Call for Tutorials

Introduction: I would like to encourage people to consider giving a tutorial at next year’s ICML. The ideal tutorial attracts a wide audience, provides a gentle and easily taught introduction to the chosen research area, and also covers the most important contributions in depth. Submissions are due January 14 (about two weeks before the paper deadline). http://www.icml-2011.org/tutorials.php Regards, Ulf

same-blog 2 0.86253309 161 hunch net-2006-03-05-“Structural” Learning

Introduction: Fernando Pereira pointed out Ando and Zhang ‘s paper on “structural” learning. Structural learning is multitask learning on subproblems created from unlabeled data. The basic idea is to take a look at the unlabeled data and create many supervised problems. On text data, which they test on, these subproblems might be of the form “Given surrounding words predict the middle word”. The hope here is that successfully predicting on these subproblems is relevant to the prediction of your core problem. In the long run, the precise mechanism used (essentially, linear predictors with parameters tied by a common matrix) and the precise problems formed may not be critical. What seems critical is that the hope is realized: the technique provides a significant edge in practice. Some basic questions about this approach are: Are there effective automated mechanisms for creating the subproblems? Is it necessary to use a shared representation?

3 0.74510157 147 hunch net-2006-01-08-Debugging Your Brain

Introduction: One part of doing research is debugging your understanding of reality. This is hard work: How do you even discover where you misunderstand? If you discover a misunderstanding, how do you go about removing it? The process of debugging computer programs is quite analogous to debugging reality misunderstandings. This is natural—a bug in a computer program is a misunderstanding between you and the computer about what you said. Many of the familiar techniques from debugging have exact parallels. Details When programming, there are often signs that some bug exists like: “the graph my program output is shifted a little bit” = maybe you have an indexing error. In debugging yourself, we often have some impression that something is “not right”. These impressions should be addressed directly and immediately. (Some people have the habit of suppressing worries in favor of excess certainty. That’s not healthy for research.) Corner Cases A “corner case” is an input to a program wh

4 0.63141435 450 hunch net-2011-12-02-Hadoop AllReduce and Terascale Learning

Introduction: Suppose you have a dataset with 2 terafeatures (we only count nonzero entries in a datamatrix), and want to learn a good linear predictor in a reasonable amount of time. How do you do it? As a learning theorist, the first thing you do is pray that this is too much data for the number of parameters—but that’s not the case, there are around 16 billion examples, 16 million parameters, and people really care about a high-quality predictor, so subsampling is not a good strategy. Alekh visited us last summer, and we had a breakthrough (see here for details), coming up with the first learning algorithm I’ve seen that is provably faster than any future single-machine learning algorithm. The proof of this is simple: We can output an optimal-up-to-precision linear predictor faster than the data can be streamed through the network interface of any single machine involved in the computation. It is necessary but not sufficient to have an effective communication infrastructure. It is ne
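As a rough illustration of the allreduce pattern this excerpt refers to, here is a minimal Python sketch that sums per-node gradient vectors and hands the same result back to every node; a real deployment aggregates over a spanning tree of machines on a Hadoop cluster, which is not modeled here.

import numpy as np

def allreduce_sum(local_vectors):
    # Reduce: sum the per-node vectors (in a real system this happens
    # up and then down a spanning tree of the participating machines).
    total = np.sum(local_vectors, axis=0)
    # Broadcast: every node receives an identical copy of the result.
    return [total.copy() for _ in local_vectors]

node_gradients = [np.array([0.1, -0.2]),
                  np.array([0.4, 0.0]),
                  np.array([-0.3, 0.5])]
synced = allreduce_sum(node_gradients)
# Every simulated node can now take the same synchronized update step.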

5 0.62527239 152 hunch net-2006-01-30-Should the Input Representation be a Vector?

Introduction: Let’s suppose that we are trying to create a general purpose machine learning box. The box is fed many examples of the function it is supposed to learn and (hopefully) succeeds. To date, most such attempts to produce a box of this form take a vector as input. The elements of the vector might be bits, real numbers, or ‘categorical’ data (a discrete set of values). On the other hand, there are a number of successful applications of machine learning which do not seem to use a vector representation as input. For example, in vision, convolutional neural networks have been used to solve several vision problems. The input to the convolutional neural network is essentially the raw camera image as a matrix. In learning for natural languages, several people have had success on problems like parts-of-speech tagging using predictors restricted to a window surrounding the word to be predicted. A vector window and a matrix both imply a notion of locality which is being actively and
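The window-style representation mentioned above for tagging problems is easy to illustrate; the padding token and window radius in this small Python sketch are arbitrary illustrative choices, not anything prescribed by the post.

def word_windows(sentence, radius=2, pad="<PAD>"):
    # For each word, return the window of surrounding words (plus itself)
    # that a window-restricted predictor would see as its input.
    padded = [pad] * radius + sentence + [pad] * radius
    return [padded[i:i + 2 * radius + 1] for i in range(len(sentence))]

print(word_windows(["time", "flies", "like", "an", "arrow"], radius=2))
# [['<PAD>', '<PAD>', 'time', 'flies', 'like'], ...]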

6 0.48381728 379 hunch net-2009-11-23-ICML 2009 Workshops (and Tutorials)

7 0.47013995 229 hunch net-2007-01-26-Parallel Machine Learning Problems

8 0.45265862 136 hunch net-2005-12-07-Is the Google way the way for machine learning?

9 0.4271687 286 hunch net-2008-01-25-Turing’s Club for Machine Learning

10 0.42239353 95 hunch net-2005-07-14-What Learning Theory might do

11 0.42220521 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

12 0.42014524 132 hunch net-2005-11-26-The Design of an Optimal Research Environment

13 0.42008153 371 hunch net-2009-09-21-Netflix finishes (and starts)

14 0.41977686 351 hunch net-2009-05-02-Wielding a New Abstraction

15 0.41899714 347 hunch net-2009-03-26-Machine Learning is too easy

16 0.41884354 96 hunch net-2005-07-21-Six Months

17 0.41842228 337 hunch net-2009-01-21-Nearly all natural problems require nonlinearity

18 0.41789639 360 hunch net-2009-06-15-In Active Learning, the question changes

19 0.4176169 237 hunch net-2007-04-02-Contextual Scaling

20 0.4175981 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models