hunch_net hunch_net-2006 hunch_net-2006-149 knowledge-graph by maker-knowledge-mining

149 hunch net-2006-01-18-Is Multitask Learning Black-Boxable?


Meta info for this blog

Source: html

Introduction: Multitask learning is the problem of learning to predict multiple outputs given the same input. Mathematically, we might think of this as trying to learn a function f: X -> {0,1}^n. Structured learning is similar at this level of abstraction. Many people have worked on solving multitask learning (for example Rich Caruana) using methods which share an internal representation. In other words, the computation and learning of the i-th prediction is shared with the computation and learning of the j-th prediction. Another way to ask this question is: can we avoid sharing the internal representation? For example, it might be feasible to solve multitask learning by some process feeding the i-th prediction f(x)_i into the j-th predictor f(x, f(x)_i)_j. If the answer is “no”, then it implies we can not take binary classification as a basic primitive in the process of solving prediction problems. If the answer is “yes”, then we can reuse binary classification algorithms to solve multitask learning problems.
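The cascading idea in the introduction can be sketched in a few lines (a minimal illustration, not from the post; the toy predictors below are hypothetical stand-ins for learned binary classifiers):

```python
# Sketch of black-box multitask learning by cascading binary predictors:
# the i-th prediction f(x)_i is fed as an extra feature to the j-th predictor.

def predictor_1(x):
    """First binary task: here, parity of the input bits (a toy choice)."""
    return sum(x) % 2

def predictor_2(x, y1):
    """Second binary task, given x *and* the first prediction as a feature."""
    # Toy rule: flip task 1's answer when the first bit is set.
    return y1 ^ x[0]

def multitask_predict(x):
    """Cascade: compute f(x)_1, then feed it into the predictor for output 2."""
    y1 = predictor_1(x)
    y2 = predictor_2(x, y1)
    return (y1, y2)

print(multitask_predict((1, 0, 1)))  # parity is 0, then 0 XOR 1 = 1 -> (0, 1)
```

The point of the sketch is only the wiring: each output is produced by a separate binary predictor, with earlier predictions available as inputs to later ones, rather than by a jointly trained shared representation.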


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Multitask learning is the problem of learning to predict multiple outputs given the same input. [sent-1, score-0.238]

2 Mathematically, we might think of this as trying to learn a function f: X -> {0,1}^n. [sent-2, score-0.145]

3 Structured learning is similar at this level of abstraction. [sent-3, score-0.157]

4 Many people have worked on solving multitask learning (for example Rich Caruana ) using methods which share an internal representation. [sent-4, score-0.742]

5 In other words, the computation and learning of the i-th prediction is shared with the computation and learning of the j-th prediction. [sent-5, score-1.575]

6 Another way to ask this question is: can we avoid sharing the internal representation? [sent-6, score-0.32]

7 If the answer is “yes”, then we can reuse binary classification algorithms to solve multitask learning problems. [sent-8, score-0.98]

8 Finding a satisfying answer to this question at a theoretical level appears tricky. [sent-9, score-0.438]

9 If you consider the infinite data limit with IID samples for any finite X , the answer is “yes” because any function can be learned. [sent-10, score-0.409]

10 However, this does not take into account two important effects: Using a shared representation alters the bias of the learning process. [sent-11, score-0.986]

11 What this implies is that fewer examples may be required to learn all of the predictors. [sent-12, score-0.208]

12 Of course, changing the input features also alters the bias of the learning process. [sent-13, score-0.449]

13 Comparing these altered biases well enough to distinguish their power seems very tricky. [sent-14, score-0.231]

14 For reference, Jonathon Baxter has done some related analysis (which still doesn’t answer the question). [sent-15, score-0.255]

15 Using a shared representation may be computationally cheaper. [sent-16, score-0.534]

16 One thing which can be said about multitask learning (in either black-box or shared representation form), is that it can make learning radically easier. [sent-17, score-1.194]

17 For example, predicting the first bit output by a cryptographic circuit is (by design) extraordinarily hard. [sent-18, score-0.484]

18 However, predicting the bits of every gate in the circuit (including the first bit output) is easily done based upon a few examples. [sent-19, score-0.373]
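The circuit observation in sentences 17–18 can be made concrete with a small sketch (a hypothetical three-gate circuit, not a real cryptographic one): each gate depends on only two wires, so its behavior is pinned down by a handful of examples, even when the end-to-end input-output map looks arbitrary.

```python
# Sketch: learning every gate of a circuit from examples is easy because each
# gate is a function of only two wires; four distinct wire patterns suffice.
from itertools import product

def run_circuit(x):
    """Three-gate toy circuit; returns the value on every wire."""
    g1 = x[0] ^ x[1]          # XOR gate
    g2 = x[1] & x[2]          # AND gate
    g3 = g1 | g2              # OR gate (the circuit's output)
    return {"x0": x[0], "x1": x[1], "x2": x[2], "g1": g1, "g2": g2, "g3": g3}

# "Learn" gate g3 by memorizing its truth table over its two input wires.
table = {}
for x in product((0, 1), repeat=3):          # a handful of labeled examples
    w = run_circuit(x)
    table[(w["g1"], w["g2"])] = w["g3"]

# The learned table reproduces the OR gate exactly.
print(sorted(table.items()))  # [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
```

With a cryptographic circuit the same per-gate procedure would still work, because locality (not the global function) is what makes each subproblem easy; only predicting the output directly from the input is hard.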


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('th', 0.406), ('multitask', 0.38), ('shared', 0.325), ('representation', 0.209), ('alters', 0.203), ('circuit', 0.203), ('answer', 0.186), ('internal', 0.14), ('bias', 0.11), ('output', 0.11), ('yes', 0.109), ('binary', 0.103), ('feeding', 0.102), ('predicting', 0.101), ('computation', 0.1), ('question', 0.095), ('prediction', 0.094), ('outputs', 0.094), ('caruana', 0.089), ('level', 0.085), ('sharing', 0.085), ('mathematically', 0.085), ('reuse', 0.085), ('classification', 0.083), ('altered', 0.081), ('infinite', 0.081), ('primitive', 0.081), ('biases', 0.078), ('function', 0.078), ('implies', 0.077), ('solving', 0.075), ('using', 0.075), ('feasible', 0.074), ('comparing', 0.074), ('however', 0.073), ('learning', 0.072), ('distinguish', 0.072), ('satisfying', 0.072), ('solve', 0.071), ('extraordinarily', 0.07), ('rich', 0.07), ('said', 0.07), ('done', 0.069), ('take', 0.067), ('learn', 0.067), ('radically', 0.066), ('limit', 0.064), ('fewer', 0.064), ('effects', 0.064), ('changing', 0.064)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 149 hunch net-2006-01-18-Is Multitask Learning Black-Boxable?


2 0.32959321 164 hunch net-2006-03-17-Multitask learning is Black-Boxable

Introduction: Multitask learning is the problem of jointly predicting multiple labels simultaneously with one system. A basic question is whether or not multitask learning can be decomposed into one (or more) single prediction problems. It seems the answer to this is “yes”, in a fairly straightforward manner. The basic idea is that a controlled input feature is equivalent to an extra output. Suppose we have some process generating examples: (x, y_1, y_2) in S where y_1 and y_2 are labels for two different tasks. Then, we could reprocess the data to the form S_b(S) = {((x,i), y_i) : (x, y_1, y_2) in S, i in {1,2}} and then learn a classifier c: X x {1,2} -> Y. Note that (x,i) is the (composite) input. At testing time, given an input x, we can query c for the predicted values of y_1 and y_2 using (x,1) and (x,2). A strong form of equivalence can be stated between these tasks. In particular, suppose we have a multitask learning algorithm ML which learns a multitask
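The reprocessing described in that entry can be sketched directly (a minimal illustration; the memorizing "classifier" below is a hypothetical stand-in for any real binary learner):

```python
# Sketch of the reduction S -> S_b(S): the task index i becomes an extra
# input feature, turning a multitask dataset into a single-task one.

S = [((0, 0), 0, 0), ((0, 1), 0, 1), ((1, 0), 1, 1)]  # (x, y1, y2) triples

# Reprocess: one example per (input, task-index) pair.
S_b = [((x, i), y) for (x, y1, y2) in S for i, y in ((1, y1), (2, y2))]

# "Train" c : X x {1,2} -> Y by memorization (a stand-in for a learner).
c = dict(S_b)

def predict_multitask(x):
    """Query the single classifier once per task index."""
    return c[(x, 1)], c[(x, 2)]

print(predict_multitask((0, 1)))  # recovers (y1, y2) = (0, 1)
```

Swapping the memorizing dictionary for any single-task classifier trained on S_b gives the black-box decomposition the entry argues for.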

3 0.17056786 90 hunch net-2005-07-07-The Limits of Learning Theory

Introduction: Suppose we had an infinitely powerful mathematician sitting in a room and proving theorems about learning. Could he solve machine learning? The answer is “no”. This answer is both obvious and sometimes underappreciated. There are several ways to conclude that some bias is necessary in order to successfully learn. For example, suppose we are trying to solve classification. At prediction time, we observe some features X and want to make a prediction of either 0 or 1. Bias is what makes us prefer one answer over the other based on past experience. In order to learn we must: Have a bias. Always predicting 0 is as likely as 1 is useless. Have the “right” bias. Predicting 1 when the answer is 0 is also not helpful. The implication of “have a bias” is that we can not design effective learning algorithms with “a uniform prior over all possibilities”. The implication of “have the ‘right’ bias” is that our mathematician fails since “right” is defined wi

4 0.15299278 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem

Introduction: “Deep learning” is used to describe learning architectures which have significant depth (as a circuit). One claim is that shallow architectures (one or two layers) can not concisely represent some functions while a circuit with more depth can concisely represent these same functions. Proving lower bounds on the size of a circuit is substantially harder than upper bounds (which are constructive), but some results are known. Luca Trevisan’s class notes detail how XOR is not concisely representable by “AC0” (= constant depth unbounded fan-in AND, OR, NOT gates). This doesn’t quite prove that depth is necessary for the representations commonly used in learning (such as a thresholded weighted sum), but it is strongly suggestive that this is so. Examples like this are a bit disheartening because existing algorithms for deep learning (deep belief nets, gradient descent on deep neural networks, and perhaps decision trees depending on who you ask) can’t learn XOR very easily.
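The thresholded-weighted-sum claim in that entry can be checked mechanically at the smallest scale (a sketch, not from the post; the weight grid is an arbitrary choice, though the impossibility holds for all real weights):

```python
# Sketch: brute-force check that no single thresholded weighted sum computes
# XOR on two bits, while a two-layer combination of thresholds does.
from itertools import product

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def threshold(w1, w2, b, x):
    """A one-layer unit: thresholded weighted sum of the two inputs."""
    return 1 if w1 * x[0] + w2 * x[1] + b > 0 else 0

# Search a coarse grid of weights; none fits all four XOR cases.
grid = [i / 2 for i in range(-8, 9)]
fits = any(
    all(threshold(w1, w2, b, x) == y for x, y in XOR.items())
    for w1, w2, b in product(grid, repeat=3)
)
print(fits)  # False: no single threshold unit on this grid computes XOR

# Two layers suffice: XOR(x) = (x0 OR x1) AND NOT (x0 AND x1).
def two_layer(x):
    h1 = threshold(1, 1, -0.5, x)    # OR
    h2 = threshold(1, 1, -1.5, x)    # AND
    return 1 if h1 - h2 > 0.5 else 0

print(all(two_layer(x) == y for x, y in XOR.items()))  # True
```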

5 0.15078394 161 hunch net-2006-03-05-“Structural” Learning

Introduction: Fernando Pereira pointed out Ando and Zhang ‘s paper on “structural” learning. Structural learning is multitask learning on subproblems created from unlabeled data. The basic idea is to take a look at the unlabeled data and create many supervised problems. On text data, which they test on, these subproblems might be of the form “Given surrounding words predict the middle word”. The hope here is that successfully predicting on these subproblems is relevant to the prediction of your core problem. In the long run, the precise mechanism used (essentially, linear predictors with parameters tied by a common matrix) and the precise problems formed may not be critical. What seems critical is that the hope is realized: the technique provides a significant edge in practice. Some basic questions about this approach are: Are there effective automated mechanisms for creating the subproblems? Is it necessary to use a shared representation?
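The subproblem-creation step described in that entry can be sketched as follows (a minimal illustration; the corpus and window size are hypothetical, and a real system would train predictors on these pairs):

```python
# Sketch of creating supervised subproblems from unlabeled text, as in
# "structural" learning: each window yields a "given surrounding words,
# predict the middle word" example.

corpus = "the quick brown fox jumps over the lazy dog".split()

def middle_word_examples(words, window=1):
    """Return (context, target) pairs: surrounding words -> middle word."""
    examples = []
    for i in range(window, len(words) - window):
        context = tuple(words[i - window:i] + words[i + 1:i + window + 1])
        examples.append((context, words[i]))
    return examples

subproblems = middle_word_examples(corpus)
print(subproblems[0])  # (('the', 'brown'), 'quick')
```

Each such pair is a free supervised example; the hope stated in the entry is that representations which predict well on these auxiliary pairs also help the core task.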

6 0.11045328 74 hunch net-2005-05-21-What is the right form of modularity in structured prediction?

7 0.10199805 332 hunch net-2008-12-23-Use of Learning Theory

8 0.1009552 12 hunch net-2005-02-03-Learning Theory, by assumption

9 0.099688545 358 hunch net-2009-06-01-Multitask Poisoning

10 0.09852016 16 hunch net-2005-02-09-Intuitions from applied learning

11 0.091394432 6 hunch net-2005-01-27-Learning Complete Problems

12 0.08837498 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem

13 0.088321447 14 hunch net-2005-02-07-The State of the Reduction

14 0.088074297 152 hunch net-2006-01-30-Should the Input Representation be a Vector?

15 0.08799725 196 hunch net-2006-07-13-Regression vs. Classification as a Primitive

16 0.08523123 235 hunch net-2007-03-03-All Models of Learning have Flaws

17 0.084837571 103 hunch net-2005-08-18-SVM Adaptability

18 0.084670052 351 hunch net-2009-05-02-Wielding a New Abstraction

19 0.083349988 72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions

20 0.080800161 337 hunch net-2009-01-21-Nearly all natural problems require nonlinearity


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.197), (1, 0.115), (2, 0.008), (3, -0.018), (4, -0.004), (5, -0.053), (6, 0.047), (7, 0.023), (8, 0.025), (9, -0.059), (10, -0.138), (11, -0.063), (12, 0.013), (13, 0.04), (14, -0.055), (15, 0.044), (16, 0.004), (17, -0.037), (18, -0.031), (19, 0.031), (20, 0.006), (21, -0.009), (22, 0.128), (23, 0.041), (24, 0.037), (25, -0.121), (26, 0.065), (27, 0.018), (28, -0.046), (29, -0.009), (30, -0.031), (31, -0.028), (32, 0.0), (33, -0.124), (34, -0.151), (35, 0.065), (36, -0.038), (37, 0.134), (38, -0.124), (39, -0.097), (40, -0.037), (41, 0.006), (42, -0.052), (43, 0.071), (44, 0.051), (45, -0.182), (46, 0.037), (47, -0.087), (48, -0.071), (49, -0.111)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.94666219 149 hunch net-2006-01-18-Is Multitask Learning Black-Boxable?


2 0.90458381 164 hunch net-2006-03-17-Multitask learning is Black-Boxable


3 0.68421638 90 hunch net-2005-07-07-The Limits of Learning Theory


4 0.65417624 161 hunch net-2006-03-05-“Structural” Learning


5 0.62669754 348 hunch net-2009-04-02-Asymmophobia

Introduction: One striking feature of many machine learning algorithms is the gymnastics that designers go through to avoid symmetry breaking. In the most basic form of machine learning, there are labeled examples composed of features. Each of these can be treated symmetrically or asymmetrically by algorithms. feature symmetry Every feature is treated the same. In gradient update rules, the same update is applied whether the feature is first or last. In metric-based predictions, every feature is just as important in computing the distance. example symmetry Every example is treated the same. Batch learning algorithms are great exemplars of this approach. label symmetry Every label is treated the same. This is particularly noticeable in multiclass classification systems which predict according to arg max_l w_l x but it occurs in many other places as well. Empirically, breaking symmetry well seems to yield great algorithms. feature asymmetry For those who like t

6 0.57646435 152 hunch net-2006-01-30-Should the Input Representation be a Vector?

7 0.57050049 168 hunch net-2006-04-02-Mad (Neuro)science

8 0.5656504 253 hunch net-2007-07-06-Idempotent-capable Predictors

9 0.52769339 402 hunch net-2010-07-02-MetaOptimize

10 0.51804459 373 hunch net-2009-10-03-Static vs. Dynamic multiclass prediction

11 0.50350618 6 hunch net-2005-01-27-Learning Complete Problems

12 0.49375939 217 hunch net-2006-11-06-Data Linkage Problems

13 0.49164739 314 hunch net-2008-08-24-Mass Customized Medicine in the Future?

14 0.4869183 257 hunch net-2007-07-28-Asking questions

15 0.47291738 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem

16 0.46975321 16 hunch net-2005-02-09-Intuitions from applied learning

17 0.46241817 68 hunch net-2005-05-10-Learning Reductions are Reductionist

18 0.46202052 337 hunch net-2009-01-21-Nearly all natural problems require nonlinearity

19 0.45531392 74 hunch net-2005-05-21-What is the right form of modularity in structured prediction?

20 0.45152581 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(3, 0.015), (27, 0.227), (38, 0.048), (53, 0.093), (55, 0.136), (58, 0.286), (94, 0.063), (95, 0.025)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.96292728 349 hunch net-2009-04-21-Interesting Presentations at Snowbird

Introduction: Here are a few of the presentations interesting me at the Snowbird learning workshop (which, amusingly, was in Florida with AIStat). Thomas Breuel described machine learning problems within OCR and an open source OCR software/research platform with modular learning components, as well as a 60-million-size dataset derived from Google’s scanned books. Kristen Grauman and Fei-Fei Li discussed using active learning with different cost labels and large datasets for image ontology. Both of them used Mechanical Turk as a labeling system, which looks to become routine, at least for vision problems. Russ Tedrake discussed using machine learning for control, with a basic claim that it was the way to go for problems involving a medium Reynolds number such as in bird flight, where simulation is extremely intense. Yann LeCun presented a poster on an FPGA for convolutional neural networks yielding a factor of 100 speedup in processing. In addition to the graphi

2 0.89889187 470 hunch net-2012-07-17-MUCMD and BayLearn

Introduction: The workshop on the Meaningful Use of Complex Medical Data is happening again, August 9-12 in LA, near UAI on Catalina Island August 15-17. I enjoyed my visit last year, and expect this year to be interesting also. The first Bay Area Machine Learning Symposium is August 30 at Google . Abstracts are due July 30.

same-blog 3 0.86795771 149 hunch net-2006-01-18-Is Multitask Learning Black-Boxable?


4 0.85476577 342 hunch net-2009-02-16-KDNuggets

Introduction: Eric Zaetsch points out KDNuggets which is a well-developed mailing list/news site with a KDD flavor. This might particularly interest people looking for industrial jobs in machine learning, as the mailing list has many such.

5 0.70236886 437 hunch net-2011-07-10-ICML 2011 and the future

Introduction: Unfortunately, I ended up sick for much of this ICML. I did manage to catch one interesting paper: Richard Socher, Cliff Lin, Andrew Y. Ng, and Christopher D. Manning, Parsing Natural Scenes and Natural Language with Recursive Neural Networks. I invited Richard to share his list of interesting papers, so hopefully we’ll hear from him soon. In the meantime, Paul and Hal have posted some lists. the future Joelle and I are program chairs for ICML 2012 in Edinburgh, which I previously enjoyed visiting in 2005. This is a huge responsibility that we hope to accomplish well. A part of this (perhaps the most fun part), is imagining how we can make ICML better. A key and critical constraint is choosing things that can be accomplished. So far we have: Colocation. The first thing we looked into was potential colocations. We quickly discovered that many other conferences precomitted their location. For the future, getting a colocation with ACL or SIGI

6 0.69640678 225 hunch net-2007-01-02-Retrospective

7 0.69123548 44 hunch net-2005-03-21-Research Styles in Machine Learning

8 0.69086868 343 hunch net-2009-02-18-Decision by Vetocracy

9 0.68959337 116 hunch net-2005-09-30-Research in conferences

10 0.68873578 403 hunch net-2010-07-18-ICML & COLT 2010

11 0.68831956 297 hunch net-2008-04-22-Taking the next step

12 0.68787092 134 hunch net-2005-12-01-The Webscience Future

13 0.68782413 194 hunch net-2006-07-11-New Models

14 0.68667334 204 hunch net-2006-08-28-Learning Theory standards for NIPS 2006

15 0.68623269 452 hunch net-2012-01-04-Why ICML? and the summer conferences

16 0.68499959 256 hunch net-2007-07-20-Motivation should be the Responsibility of the Reviewer

17 0.68440473 320 hunch net-2008-10-14-Who is Responsible for a Bad Review?

18 0.68370795 207 hunch net-2006-09-12-Incentive Compatible Reviewing

19 0.68357414 22 hunch net-2005-02-18-What it means to do research.

20 0.68303633 484 hunch net-2013-06-16-Representative Reviewing