hunch_net hunch_net-2007 hunch_net-2007-277 knowledge-graph by maker-knowledge-mining

277 hunch net-2007-12-12-Workshop Summary—Principles of Learning Problem Design


meta info for this blog

Source: html

Introduction: This is a summary of the workshop on Learning Problem Design which Alina and I ran at NIPS this year. The first question many people have is “What is learning problem design?” This workshop is about admitting that solving learning problems does not start with labeled data, but rather somewhere before. When humans are hired to produce labels, this is usually not a serious problem because you can tell them precisely what semantics you want the labels to have, and we can fix some set of features in advance. However, when other methods are used this becomes more problematic. This focus is important for Machine Learning because there are very large quantities of data which are not labeled by a hired human. The title of the workshop was a bit ambitious, because a workshop is not long enough to synthesize a diversity of approaches into a coherent set of principles. For me, the posters at the end of the workshop were quite helpful in getting approaches to gel. Here are some an


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 This is a summary of the workshop on Learning Problem Design which Alina and I ran at NIPS this year. [sent-1, score-0.196]

2 This workshop is about admitting that solving learning problems does not start with labeled data, but rather somewhere before. [sent-3, score-0.37]

3 When humans are hired to produce labels, this is usually not a serious problem because you can tell them precisely what semantics you want the labels to have, and we can fix some set of features in advance. [sent-4, score-0.642]

4 This focus is important for Machine Learning because there are very large quantities of data which are not labeled by a hired human. [sent-6, score-0.396]

5 The title of the workshop was a bit ambitious, because a workshop is not long enough to synthesize a diversity of approaches into a coherent set of principles. [sent-7, score-0.769]

6 For me, the posters at the end of the workshop were quite helpful in getting approaches to gel. [sent-8, score-0.328]

7 Here are some answers to “where do the labels come from?” [sent-9, score-0.477]

8 Simulation: Use a simulator (which need not be that good) to predict the cost of various choices and turn that into label information. [sent-10, score-0.243]
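The simulation answer above can be made concrete with a minimal, hypothetical Python sketch (all function and variable names are invented): a crude simulator estimates the cost of each candidate choice, and averaged rollouts of those estimates become the labels of a supervised dataset.

```python
import random

# Hypothetical sketch: turn a (possibly crude) simulator into a source of
# cost labels, yielding a supervised dataset with no human labeling.

def simulate_cost(state, action):
    # A stand-in simulator: quadratic cost plus noise. A real simulator
    # need not be accurate, only informative about relative costs.
    return (state - action) ** 2 + random.gauss(0, 0.1)

def generate_labeled_data(states, actions, n_rollouts=5):
    data = []
    for s in states:
        for a in actions:
            # Average several noisy rollouts to estimate the cost label.
            cost = sum(simulate_cost(s, a) for _ in range(n_rollouts)) / n_rollouts
            data.append(((s, a), cost))
    return data

dataset = generate_labeled_data(states=[0.0, 1.0, 2.0], actions=[0.0, 1.0])
```

A regressor trained on `dataset` can then generalize the simulator's cost estimates to unseen (state, action) pairs.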

9 Luis often used an agreement mechanism to induce labels with games. [sent-14, score-0.61]

10 Sham discussed the power of agreement to constrain learning algorithms. [sent-15, score-0.462]

11 Huzefa's work on bioprediction can be thought of as partly using agreement with previous structures to simulate the label of a new structure. [sent-16, score-0.562]
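The agreement mechanisms in items 9 through 11 share a simple core: accept a label only when independent sources propose it. A hypothetical sketch in the ESP-game spirit (all names invented):

```python
# Hypothetical sketch of agreement-based label induction (ESP-game style):
# keep a label for an item only when two independent annotators agree on it.

def induce_labels(proposals):
    """proposals: dict mapping item -> list of (annotator, label) pairs."""
    labels = {}
    for item, pairs in proposals.items():
        seen = {}
        for annotator, label in pairs:
            # Accept the first label that a second, different annotator repeats.
            if label in seen and seen[label] != annotator:
                labels[item] = label
                break
            seen[label] = annotator
    return labels

proposals = {
    "img1": [("p1", "dog"), ("p2", "cat"), ("p3", "dog")],
    "img2": [("p1", "car"), ("p2", "tree")],  # no agreement: no label
}
agreed = induce_labels(proposals)  # only "img1" receives a label
```

Requiring agreement acts as a noise filter: a single careless or adversarial annotator cannot inject a label on their own.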

12 Some answers to “where do the data come from” are: Everywhere: The essential idea is to integrate as many data sources as possible. [sent-20, score-0.506]

13 Rakesh had several algorithms which (in combination) allowed him to use a large number of diverse data sources in a text domain. [sent-21, score-0.203]

14 Sparsity: A representation is formed by finding a sparse set of basis functions on otherwise totally unlabeled data. [sent-22, score-0.401]
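Item 14's sparsity idea can be illustrated with a single soft-thresholding step against a fixed random dictionary. This is a deliberately simplified sketch (real sparse coding also learns the dictionary from the data), with all names hypothetical:

```python
import numpy as np

# Hypothetical sketch of a sparsity-based representation: given a dictionary
# D of basis functions, code each unlabeled example by a soft-thresholded
# projection (a single ISTA-style step), producing a sparse feature vector.

def sparse_code(X, D, threshold=0.5):
    # Project the data onto the dictionary atoms...
    proj = X @ D.T
    # ...then soft-threshold: coefficients smaller than the threshold vanish.
    return np.sign(proj) * np.maximum(np.abs(proj) - threshold, 0.0)

rng = np.random.RandomState(0)
D = rng.randn(16, 8)                         # 16 basis functions, 8 raw features
D /= np.linalg.norm(D, axis=1, keepdims=True)  # unit-norm atoms
X = rng.randn(100, 8)                        # unlabeled data

Z = sparse_code(X, D, threshold=1.0)         # the sparse representation
```

Most entries of `Z` are exactly zero; the surviving coefficients serve as features for a downstream supervised learner.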

15 Self-prediction: A representation is formed by learning to self-predict a set of raw features. [sent-24, score-0.323]
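Item 15's self-prediction idea, sketched with ridge regressions that predict each raw feature from the others; the learned weights act as a label-free representation. All names are hypothetical:

```python
import numpy as np

# Hypothetical sketch of self-prediction: learn to predict each raw feature
# from the remaining features. The learned predictors (here, linear weights)
# summarize the data's internal structure without any labels.

def self_predict_weights(X, ridge=1e-3):
    n, d = X.shape
    W = np.zeros((d, d - 1))
    for j in range(d):
        others = np.delete(X, j, axis=1)   # all features except feature j
        target = X[:, j]
        # Ridge-regularized least squares: predict feature j from the rest.
        A = others.T @ others + ridge * np.eye(d - 1)
        W[j] = np.linalg.solve(A, others.T @ target)
    return W

rng = np.random.RandomState(0)
X = rng.randn(200, 5)
X[:, 0] = 0.7 * X[:, 1] + 0.1 * rng.randn(200)  # feature 0 depends on feature 1

W = self_predict_weights(X)  # recovers the ~0.7 dependence in W[0, 0]
```

The same idea underlies modern self-supervised methods: the prediction targets are free because they come from the data itself.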

16 A workshop like this is successful if it informs the questions we ask (and answer) in the future. [sent-26, score-0.384]

17 Some natural questions (some of which were discussed) are: What is a natural, sufficient language for adding prior information into a learning system? [sent-27, score-0.224]

18 Shai described a sense in which kernels are insufficient as a language for prior information. [sent-29, score-0.364]

19 Bayesian analysis emphasizes reasoning about the parameters of the model, but the language of examples or maybe label expectations may be more natural. [sent-30, score-0.321]

20 The other approaches and questions are essentially unexplored territory where some serious thinking may be helpful. [sent-39, score-0.487]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('labels', 0.281), ('agreement', 0.245), ('workshop', 0.196), ('compilation', 0.189), ('hired', 0.189), ('modularize', 0.189), ('insufficient', 0.168), ('label', 0.165), ('formed', 0.147), ('approaches', 0.132), ('lists', 0.13), ('discussed', 0.123), ('prior', 0.114), ('questions', 0.11), ('data', 0.107), ('answers', 0.105), ('labeled', 0.1), ('sources', 0.096), ('power', 0.094), ('come', 0.091), ('set', 0.089), ('representation', 0.087), ('induce', 0.084), ('gregory', 0.084), ('compiling', 0.084), ('signals', 0.084), ('ambitious', 0.084), ('backprop', 0.084), ('backpropagation', 0.084), ('bradley', 0.084), ('territory', 0.084), ('serious', 0.083), ('language', 0.082), ('simulator', 0.078), ('synthesize', 0.078), ('unexplored', 0.078), ('simulate', 0.078), ('diversity', 0.078), ('totally', 0.078), ('informs', 0.078), ('simulation', 0.078), ('emphasizes', 0.074), ('rajat', 0.074), ('somewhere', 0.074), ('structures', 0.074), ('formalize', 0.074), ('talk', 0.073), ('luis', 0.07), ('demos', 0.07), ('everywhere', 0.07)]
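For reference, per-word weights like those above and the blog-to-blog similarity scores in the lists that follow can be produced by standard tf-idf weighting plus cosine similarity. A minimal sketch (this is not necessarily the exact weighting the mining pipeline used):

```python
import math
from collections import Counter

# Hypothetical sketch of tf-idf weighting and cosine similarity, the kind of
# computation behind word-weight and document-similarity scores like these.

def tfidf_vectors(docs):
    n = len(docs)
    # Document frequency: in how many documents does each word appear?
    df = Counter(w for doc in docs for w in set(doc.split()))
    vecs = []
    for doc in docs:
        tf = Counter(doc.split())
        # tf-idf weight: term frequency times log inverse document frequency.
        vecs.append({w: c * math.log(n / df[w]) for w, c in tf.items()})
    return vecs

def cosine(u, v):
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = ["labels agreement workshop", "labels workshop nips", "cooking recipes"]
vecs = tfidf_vectors(docs)
sim = cosine(vecs[0], vecs[1])  # positive: the two posts share vocabulary
```

Documents sharing no vocabulary score exactly zero, while each document scores (numerically) one against itself, matching the near-1.0 same-blog entries in the similarity lists.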

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999976 277 hunch net-2007-12-12-Workshop Summary—Principles of Learning Problem Design

Introduction: This is a summary of the workshop on Learning Problem Design which Alina and I ran at NIPS this year. The first question many people have is “What is learning problem design?” This workshop is about admitting that solving learning problems does not start with labeled data, but rather somewhere before. When humans are hired to produce labels, this is usually not a serious problem because you can tell them precisely what semantics you want the labels to have, and we can fix some set of features in advance. However, when other methods are used this becomes more problematic. This focus is important for Machine Learning because there are very large quantities of data which are not labeled by a hired human. The title of the workshop was a bit ambitious, because a workshop is not long enough to synthesize a diversity of approaches into a coherent set of principles. For me, the posters at the end of the workshop were quite helpful in getting approaches to gel. Here are some an

2 0.17460634 45 hunch net-2005-03-22-Active learning

Introduction: Often, unlabeled data is easy to come by but labels are expensive. For instance, if you’re building a speech recognizer, it’s easy enough to get raw speech samples — just walk around with a microphone — but labeling even one of these samples is a tedious process in which a human must examine the speech signal and carefully segment it into phonemes. In the field of active learning, the goal is as usual to construct an accurate classifier, but the labels of the data points are initially hidden and there is a charge for each label you want revealed. The hope is that by intelligent adaptive querying, you can get away with significantly fewer labels than you would need in a regular supervised learning framework. Here’s an example. Suppose the data lie on the real line, and the classifiers are simple thresholding functions, H = {h_w}: h_w(x) = 1 if x > w, and 0 otherwise. VC theory tells us that if the underlying distribution P can be classified perfectly by some hypothesis in H (

3 0.16469029 426 hunch net-2011-03-19-The Ideal Large Scale Learning Class

Introduction: At NIPS, Andrew Ng asked me what should be in a large scale learning class. After some discussion with him and Nando and mulling it over a bit, these are the topics that I think should be covered. There are many different kinds of scaling. Scaling in examples This is the most basic kind of scaling. Online Gradient Descent This is an old algorithm—I’m not sure if anyone can be credited with it in particular. Perhaps the Perceptron is a good precursor, but substantial improvements come from the notion of a loss function of which squared loss, logistic loss, Hinge Loss, and Quantile Loss are all worth covering. It’s important to cover the semantics of these loss functions as well. Vowpal Wabbit is a reasonably fast codebase implementing these. Second Order Gradient Descent methods For some problems, methods taking into account second derivative information can be more effective. I’ve seen preconditioned conjugate gradient work well, for which Jonath

4 0.13054655 420 hunch net-2010-12-26-NIPS 2010

Introduction: I enjoyed attending NIPS this year, with several things interesting me. For the conference itself: Peter Welinder, Steve Branson, Serge Belongie, and Pietro Perona, The Multidimensional Wisdom of Crowds. This paper is about using mechanical turk to get label information, with results superior to a majority vote approach. David McAllester, Tamir Hazan, and Joseph Keshet, Direct Loss Minimization for Structured Prediction. This is about another technique for directly optimizing the loss in structured prediction, with an application to speech recognition. Mohammad Saberian and Nuno Vasconcelos, Boosting Classifier Cascades. This is about an algorithm for simultaneously optimizing loss and computation in a classifier cascade construction. There were several other papers on cascades which are worth looking at if interested. Alan Fern and Prasad Tadepalli, A Computational Decision Theory for Interactive Assistants. This paper carves out some

5 0.12502326 265 hunch net-2007-10-14-NIPS workshop: Learning Problem Design

Introduction: Alina and I are organizing a workshop on Learning Problem Design at NIPS . What is learning problem design? It’s about being clever in creating learning problems from otherwise unlabeled data. Read the webpage above for examples. I want to participate! Email us before Nov. 1 with a description of what you want to talk about.

6 0.12448379 84 hunch net-2005-06-22-Languages of Learning

7 0.11796457 349 hunch net-2009-04-21-Interesting Presentations at Snowbird

8 0.11640944 165 hunch net-2006-03-23-The Approximation Argument

9 0.11351155 143 hunch net-2005-12-27-Automated Labeling

10 0.11330264 235 hunch net-2007-03-03-All Models of Learning have Flaws

11 0.11232974 234 hunch net-2007-02-22-Create Your Own ICML Workshop

12 0.11018376 444 hunch net-2011-09-07-KDD and MUCMD 2011

13 0.10751364 20 hunch net-2005-02-15-ESPgame and image labeling

14 0.10428824 351 hunch net-2009-05-02-Wielding a New Abstraction

15 0.10154887 237 hunch net-2007-04-02-Contextual Scaling

16 0.098103233 309 hunch net-2008-07-10-Interesting papers, ICML 2008

17 0.097430393 60 hunch net-2005-04-23-Advantages and Disadvantages of Bayesian Learning

18 0.097315885 332 hunch net-2008-12-23-Use of Learning Theory

19 0.097161345 155 hunch net-2006-02-07-Pittsburgh Mind Reading Competition

20 0.096890196 141 hunch net-2005-12-17-Workshops as Franchise Conferences


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.242), (1, 0.064), (2, -0.091), (3, -0.042), (4, 0.098), (5, 0.067), (6, 0.014), (7, 0.022), (8, 0.106), (9, -0.05), (10, -0.057), (11, -0.064), (12, -0.047), (13, 0.101), (14, -0.038), (15, -0.094), (16, -0.077), (17, 0.084), (18, -0.137), (19, -0.017), (20, -0.028), (21, -0.109), (22, 0.066), (23, 0.021), (24, 0.104), (25, 0.019), (26, 0.022), (27, 0.047), (28, 0.047), (29, -0.001), (30, -0.033), (31, 0.0), (32, 0.022), (33, 0.049), (34, -0.074), (35, -0.013), (36, -0.04), (37, 0.03), (38, -0.116), (39, -0.032), (40, 0.089), (41, -0.001), (42, 0.03), (43, -0.172), (44, 0.046), (45, -0.016), (46, 0.001), (47, -0.023), (48, 0.043), (49, -0.004)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95535892 277 hunch net-2007-12-12-Workshop Summary—Principles of Learning Problem Design

Introduction: This is a summary of the workshop on Learning Problem Design which Alina and I ran at NIPS this year. The first question many people have is “What is learning problem design?” This workshop is about admitting that solving learning problems does not start with labeled data, but rather somewhere before. When humans are hired to produce labels, this is usually not a serious problem because you can tell them precisely what semantics you want the labels to have, and we can fix some set of features in advance. However, when other methods are used this becomes more problematic. This focus is important for Machine Learning because there are very large quantities of data which are not labeled by a hired human. The title of the workshop was a bit ambitious, because a workshop is not long enough to synthesize a diversity of approaches into a coherent set of principles. For me, the posters at the end of the workshop were quite helpful in getting approaches to gel. Here are some an

2 0.70793951 444 hunch net-2011-09-07-KDD and MUCMD 2011

Introduction: At KDD I enjoyed Stephen Boyd’s invited talk about optimization quite a bit. However, the most interesting talk for me was David Haussler’s. His talk started out with a formidable load of biological complexity. About half-way through you start wondering, “can this be used to help with cancer?” And at the end he connects it directly to use with a call to arms for the audience: cure cancer. The core thesis here is that cancer is a complex set of diseases which can be disentangled via genetic assays, allowing attacking the specific signature of individual cancers. However, the data quantity and complex dependencies within the data require systematic and relatively automatic prediction and analysis algorithms of the kind that we are best familiar with. Some of the papers which interested me are: Kai-Wei Chang and Dan Roth, Selective Block Minimization for Faster Convergence of Limited Memory Large-Scale Linear Models, which is about effectively using a hard-example

3 0.63095367 114 hunch net-2005-09-20-Workshop Proposal: Atomic Learning

Introduction: This is a proposal for a workshop. It may or may not happen depending on the level of interest. If you are interested, feel free to indicate so (by email or comments). Description: Assume(*) that any system for solving large difficult learning problems must decompose into repeated use of basic elements (i.e. atoms). There are many basic questions which remain: What are the viable basic elements? What makes a basic element viable? What are the viable principles for the composition of these basic elements? What are the viable principles for learning in such systems? What problems can this approach handle? Hal Daume adds: Can composition of atoms be (semi-) automatically constructed[?] When atoms are constructed through reductions, is there some notion of the “naturalness” of the created learning problems? Other than Markov fields/graphical models/Bayes nets, is there a good language for representing atoms and their compositions? The answer to these a

4 0.61711287 265 hunch net-2007-10-14-NIPS workshop: Learning Problem Design

Introduction: Alina and I are organizing a workshop on Learning Problem Design at NIPS . What is learning problem design? It’s about being clever in creating learning problems from otherwise unlabeled data. Read the webpage above for examples. I want to participate! Email us before Nov. 1 with a description of what you want to talk about.

5 0.59121233 143 hunch net-2005-12-27-Automated Labeling

Introduction: One of the common trends in machine learning has been an emphasis on the use of unlabeled data. The argument goes something like “there aren’t many labeled web pages out there, but there are a huge number of web pages, so we must find a way to take advantage of them.” There are several standard approaches for doing this: Unsupervised Learning. You use only unlabeled data. In a typical application, you cluster the data and hope that the clusters somehow correspond to what you care about. Semisupervised Learning. You use both unlabeled and labeled data to build a predictor. The unlabeled data influences the learned predictor in some way. Active Learning. You have unlabeled data and access to a labeling oracle. You interactively choose which examples to label so as to optimize prediction accuracy. It seems there is a fourth approach worth serious investigation—automated labeling. The approach goes as follows: Identify some subset of observed values to predict

6 0.578462 319 hunch net-2008-10-01-NIPS 2008 workshop on ‘Learning over Empirical Hypothesis Spaces’

7 0.5648278 45 hunch net-2005-03-22-Active learning

8 0.55939466 420 hunch net-2010-12-26-NIPS 2010

9 0.55447435 161 hunch net-2006-03-05-“Structural” Learning

10 0.55387062 455 hunch net-2012-02-20-Berkeley Streaming Data Workshop

11 0.55047935 84 hunch net-2005-06-22-Languages of Learning

12 0.53467929 426 hunch net-2011-03-19-The Ideal Large Scale Learning Class

13 0.53404009 349 hunch net-2009-04-21-Interesting Presentations at Snowbird

14 0.51698005 217 hunch net-2006-11-06-Data Linkage Problems

15 0.50851578 234 hunch net-2007-02-22-Create Your Own ICML Workshop

16 0.50200069 210 hunch net-2006-09-28-Programming Languages for Machine Learning Implementations

17 0.49831894 144 hunch net-2005-12-28-Yet more nips thoughts

18 0.49044007 237 hunch net-2007-04-02-Contextual Scaling

19 0.47717127 321 hunch net-2008-10-19-NIPS 2008 workshop on Kernel Learning

20 0.47695896 235 hunch net-2007-03-03-All Models of Learning have Flaws


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(0, 0.02), (27, 0.222), (34, 0.02), (38, 0.082), (53, 0.055), (55, 0.057), (64, 0.335), (77, 0.014), (94, 0.085), (95, 0.036)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.95846474 155 hunch net-2006-02-07-Pittsburgh Mind Reading Competition

Introduction: Francisco Pereira points out a fun Prediction Competition . Francisco says: DARPA is sponsoring a competition to analyze data from an unusual functional Magnetic Resonance Imaging experiment. Subjects watch videos inside the scanner while fMRI data are acquired. Unbeknownst to these subjects, the videos have been seen by a panel of other subjects that labeled each instant with labels in categories such as representation (are there tools, body parts, motion, sound), location, presence of actors, emotional content, etc. The challenge is to predict all of these different labels on an instant-by-instant basis from the fMRI data. A few reasons why this is particularly interesting: This is beyond the current state of the art, but not inconceivably hard. This is a new type of experiment design current analysis methods cannot deal with. This is an opportunity to work with a heavily examined and preprocessed neuroimaging dataset. DARPA is offering prizes!

2 0.930767 442 hunch net-2011-08-20-The Large Scale Learning Survey Tutorial

Introduction: Ron Bekkerman initiated an effort to create an edited book on parallel machine learning that Misha and I have been helping with. The breadth of efforts to parallelize machine learning surprised me: I was only aware of a small fraction initially. This put us in a unique position, with knowledge of a wide array of different efforts, so it is natural to put together a survey tutorial on the subject of parallel learning for KDD, tomorrow. This tutorial is not limited to the book itself however, as several interesting new algorithms have come out since we started inviting chapters. This tutorial should interest anyone trying to use machine learning on significant quantities of data, anyone interested in developing algorithms for such, and of course anyone who has bragging rights to the fastest learning algorithm on planet earth (Also note the Modeling with Hadoop tutorial just before ours which deals with one way of trying to speed up learning algorithms. We have almost no

3 0.88990605 210 hunch net-2006-09-28-Programming Languages for Machine Learning Implementations

Introduction: Machine learning algorithms have a much better chance of being widely adopted if they are implemented in some easy-to-use code. There are several important concerns associated with machine learning which stress programming languages on the ease-of-use vs. speed frontier. Speed The rate at which data sources are growing seems to be outstripping the rate at which computational power is growing, so it is important that we be able to eke out every bit of computational power. Garbage collected languages (java, ocaml, perl, and python) often have several issues here. Garbage collection often implies that floating point numbers are “boxed”: every float is represented by a pointer to a float. Boxing can cause an order of magnitude slowdown because an extra nonlocalized memory reference is made, and accesses to main memory can be many CPU cycles long. Garbage collection often implies that considerably more memory is used than is necessary. This has a variable effect. I

same-blog 4 0.88228905 277 hunch net-2007-12-12-Workshop Summary—Principles of Learning Problem Design

Introduction: This is a summary of the workshop on Learning Problem Design which Alina and I ran at NIPS this year. The first question many people have is “What is learning problem design?” This workshop is about admitting that solving learning problems does not start with labeled data, but rather somewhere before. When humans are hired to produce labels, this is usually not a serious problem because you can tell them precisely what semantics you want the labels to have, and we can fix some set of features in advance. However, when other methods are used this becomes more problematic. This focus is important for Machine Learning because there are very large quantities of data which are not labeled by a hired human. The title of the workshop was a bit ambitious, because a workshop is not long enough to synthesize a diversity of approaches into a coherent set of principles. For me, the posters at the end of the workshop were quite helpful in getting approaches to gel. Here are some an

5 0.8782267 420 hunch net-2010-12-26-NIPS 2010

Introduction: I enjoyed attending NIPS this year, with several things interesting me. For the conference itself: Peter Welinder, Steve Branson, Serge Belongie, and Pietro Perona, The Multidimensional Wisdom of Crowds. This paper is about using mechanical turk to get label information, with results superior to a majority vote approach. David McAllester, Tamir Hazan, and Joseph Keshet, Direct Loss Minimization for Structured Prediction. This is about another technique for directly optimizing the loss in structured prediction, with an application to speech recognition. Mohammad Saberian and Nuno Vasconcelos, Boosting Classifier Cascades. This is about an algorithm for simultaneously optimizing loss and computation in a classifier cascade construction. There were several other papers on cascades which are worth looking at if interested. Alan Fern and Prasad Tadepalli, A Computational Decision Theory for Interactive Assistants. This paper carves out some

6 0.87208903 18 hunch net-2005-02-12-ROC vs. Accuracy vs. AROC

7 0.78108585 291 hunch net-2008-03-07-Spock Challenge Winners

8 0.66713655 343 hunch net-2009-02-18-Decision by Vetocracy

9 0.64932323 426 hunch net-2011-03-19-The Ideal Large Scale Learning Class

10 0.64485478 49 hunch net-2005-03-30-What can Type Theory teach us about Machine Learning?

11 0.64433789 351 hunch net-2009-05-02-Wielding a New Abstraction

12 0.64330053 131 hunch net-2005-11-16-The Everything Ensemble Edge

13 0.63152891 26 hunch net-2005-02-21-Problem: Cross Validation

14 0.63010013 360 hunch net-2009-06-15-In Active Learning, the question changes

15 0.62617564 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

16 0.62587279 424 hunch net-2011-02-17-What does Watson mean?

17 0.62086427 194 hunch net-2006-07-11-New Models

18 0.61904144 136 hunch net-2005-12-07-Is the Google way the way for machine learning?

19 0.61681026 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design

20 0.61608189 432 hunch net-2011-04-20-The End of the Beginning of Active Learning