hunch_net hunch_net-2005 hunch_net-2005-3 knowledge-graph by maker-knowledge-mining

3 hunch net-2005-01-24-The Humanloop Spectrum of Machine Learning


meta info for this blog

Source: html

Introduction: All branches of machine learning seem to be united in the idea of using data to make predictions. However, people disagree to some extent about what this means. One way to categorize these different goals is on an axis, where one extreme is “tools to aid a human in using data to do prediction” and the other extreme is “tools to do prediction with no human intervention”. Here is my estimate of where various elements of machine learning fall on this spectrum, from human necessary through human partially necessary to human unnecessary: Clustering, data visualization; Bayesian Learning, Probabilistic Models, Graphical Models; Kernel Learning (SVM’s, etc.); Decision Trees?; Reinforcement Learning. The exact position of each element is of course debatable. My reasoning is that clustering and data visualization are nearly useless for prediction without a human in the loop. Bayesian/probabilistic models/graphical models generally require a human to sit and think about what is a good prior/structure.


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 All branches of machine learning seem to be united in the idea of using data to make predictions. [sent-1, score-0.373]

2 However, people disagree to some extent about what this means. [sent-2, score-0.15]

3 One way to categorize these different goals is on an axis, where one extreme is “tools to aid a human in using data to do prediction” and the other extreme is “tools to do prediction with no human intervention”. [sent-3, score-1.69]

4 Here is my estimate of where various elements of machine learning fall on this spectrum. [sent-4, score-0.212]

5 Human Necessary → Human partially necessary → Human unnecessary: Clustering, data visualization; Bayesian Learning, Probabilistic Models, Graphical Models; Kernel Learning (SVM’s, etc. [sent-5, score-0.821]

6 Reinforcement Learning The exact position of each element is of course debatable. [sent-8, score-0.321]

7 My reasoning is that clustering and data visualization are nearly useless for prediction without a human in the loop. [sent-9, score-1.363]

8 Bayesian/probabilistic models/graphical models generally require a human to sit and think about what is a good prior/structure. [sent-10, score-0.774]

9 Kernel learning approaches have a few standard kernels which often work on simple problems, although sometimes significant kernel engineering is required. [sent-11, score-0.351]

10 I’ve been impressed of late how ‘black box’ decision trees or boosted decision trees are. [sent-12, score-0.911]

11 The goal of reinforcement learning (rather than perhaps the reality) is designing completely automated agents. [sent-13, score-0.324]

12 The position in this spectrum provides some idea of what the state of progress is. [sent-14, score-0.34]

13 Things at the ‘human necessary’ end have been successfully used by many people to solve many learning problems. [sent-15, score-0.243]

14 At the ‘human unnecessary’ end, the systems are finicky and often just won’t work well. [sent-16, score-0.099]
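
Sentences 9–10 above contrast kernel learning, where a human often has to engineer the kernel, with boosted decision trees that behave nearly ‘black box’. A minimal sketch of that contrast, assuming scikit-learn and a synthetic stand-in dataset (neither of which the post names):

```python
# Hypothetical illustration, not from the post: boosted trees run with defaults,
# while the SVM exposes a kernel and parameters a human typically has to choose.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Human unnecessary"-leaning: boosted decision trees with default settings.
trees = GradientBoostingClassifier().fit(X_train, y_train)

# "Human partially necessary": the kernel and its parameters are a human choice.
svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)

print("boosted trees accuracy:", trees.score(X_test, y_test))
print("rbf-kernel SVM accuracy:", svm.score(X_test, y_test))
```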


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('human', 0.542), ('unnecessary', 0.282), ('visualization', 0.209), ('trees', 0.197), ('kernel', 0.186), ('position', 0.169), ('clustering', 0.156), ('necessary', 0.15), ('end', 0.149), ('models', 0.138), ('tools', 0.136), ('extreme', 0.122), ('decision', 0.116), ('data', 0.115), ('black', 0.113), ('reinforcement', 0.11), ('prediction', 0.105), ('axis', 0.105), ('boosted', 0.105), ('impressed', 0.105), ('finicky', 0.099), ('box', 0.094), ('branches', 0.094), ('sit', 0.094), ('spectrum', 0.094), ('succesfully', 0.094), ('kernels', 0.09), ('reasoning', 0.09), ('united', 0.087), ('disagree', 0.082), ('reality', 0.082), ('fall', 0.08), ('useless', 0.08), ('svm', 0.078), ('idea', 0.077), ('element', 0.076), ('exact', 0.076), ('completely', 0.075), ('engineering', 0.075), ('late', 0.075), ('aid', 0.073), ('designing', 0.072), ('goals', 0.069), ('graphical', 0.069), ('extent', 0.068), ('automated', 0.067), ('estimate', 0.067), ('nearly', 0.066), ('elements', 0.065), ('partially', 0.065)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999988 3 hunch net-2005-01-24-The Humanloop Spectrum of Machine Learning


2 0.14873375 209 hunch net-2006-09-19-Luis von Ahn is awarded a MacArthur fellowship.

Introduction: For his work on the subject of human computation including ESPGame , Peekaboom , and Phetch . The new MacArthur fellows .

3 0.14729436 287 hunch net-2008-01-28-Sufficient Computation

Introduction: Do we have computer hardware sufficient for AI? This question is difficult to answer, but here’s a try: One way to achieve AI is by simulating a human brain. A human brain has about 10^15 synapses which operate at about 10^2 per second implying about 10^17 bit ops per second. A modern computer runs at 10^9 cycles/second and operates on 10^2 bits per cycle implying 10^11 bits processed per second. The gap here is only 6 orders of magnitude, which can be plausibly surpassed via cluster machines. For example, the BlueGene/L operates 10^5 nodes (one order of magnitude short). Its peak recorded performance is about 0.5*10^15 FLOPS which translates to about 10^16 bit ops per second, which is nearly 10^17. There are many criticisms (both positive and negative) for this argument. Simulation of a human brain might require substantially more detail. Perhaps an additional 10^2 is required per neuron. We may not need to simulate a human brain to achieve AI. Ther
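
The arithmetic in the quoted introduction is easy to check; here is a back-of-envelope sketch using only the numbers stated above (the FLOPS-to-bit-ops factor of ~20 is inferred from the quoted 0.5*10^15 FLOPS ≈ 10^16 bit ops figure, not stated explicitly):

```python
# Back-of-envelope check of the order-of-magnitude estimates quoted above.
import math

brain_bit_ops = 1e15 * 1e2   # ~10^15 synapses at ~10^2 ops/second -> ~10^17 bit ops/s
cpu_bit_ops = 1e9 * 1e2      # ~10^9 cycles/s at ~10^2 bits/cycle  -> ~10^11 bit ops/s
gap_orders = math.log10(brain_bit_ops / cpu_bit_ops)   # ~6 orders of magnitude

bluegene_flops = 0.5e15                                 # quoted peak performance
bluegene_bit_ops = bluegene_flops * 20                  # assumed FLOPS -> bit-op factor

print(f"brain:      ~1e{math.log10(brain_bit_ops):.0f} bit ops/s")
print(f"1 machine:  ~1e{math.log10(cpu_bit_ops):.0f} bit ops/s (gap ~{gap_orders:.0f} orders)")
print(f"BlueGene/L: ~1e{math.log10(bluegene_bit_ops):.0f} bit ops/s")
```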

4 0.1455791 237 hunch net-2007-04-02-Contextual Scaling

Introduction: Machine learning has a new kind of “scaling to larger problems” to worry about: scaling with the amount of contextual information. The standard development path for a machine learning application in practice seems to be the following: Marginal. In the beginning, there was “majority vote”. At this stage, it isn’t necessary to understand that you have a prediction problem. People just realize that one answer is right sometimes and another answer other times. In machine learning terms, this corresponds to making a prediction without side information. First context. A clever person realizes that some bit of information x_1 could be helpful. If x_1 is discrete, they condition on it and make a predictor h(x_1), typically by counting. If they are clever, then they also do some smoothing. If x_1 is some real valued parameter, it’s very common to make a threshold cutoff. Often, these tasks are simply done by hand. Second. Another clever person (or perhaps the s
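
The “first context” stage above (condition on a discrete x_1 and predict by counting, with smoothing) is small enough to sketch directly; the code below is a hypothetical illustration, not anything from the post:

```python
# Hypothetical sketch of the "first context" stage: a smoothed, counted
# predictor h(x1) conditioned on one discrete feature.
from collections import Counter, defaultdict

def fit_conditional_counts(examples):
    """examples: iterable of (x1, label) pairs with discrete x1."""
    counts = defaultdict(Counter)
    for x1, label in examples:
        counts[x1][label] += 1
    return counts

def predict(counts, x1, labels=(0, 1), alpha=1.0):
    """Majority vote conditioned on x1, with add-alpha smoothing for rare/unseen x1."""
    c = counts.get(x1, Counter())
    return max(labels, key=lambda label: c[label] + alpha)

data = [("sunny", 1), ("sunny", 1), ("rain", 0), ("rain", 0), ("rain", 1)]
counts = fit_conditional_counts(data)
print(predict(counts, "sunny"), predict(counts, "rain"))  # 1 0
```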

5 0.14144756 424 hunch net-2011-02-17-What does Watson mean?

Introduction: Watson convincingly beat the best champion Jeopardy! players. The apparent significance of this varies hugely, depending on your background knowledge about the related machine learning, NLP, and search technology. For a random person, this might seem evidence of serious machine intelligence, while for people working on the system itself, it probably seems like a reasonably good assemblage of existing technologies with several twists to make the entire system work. Above all, I think we should congratulate the people who managed to put together and execute this project—many years of effort by a diverse set of highly skilled people were needed to make this happen. In academia, it’s pretty difficult for one professor to assemble that quantity of talent, and in industry it’s rarely the case that such a capable group has both a worthwhile project and the support needed to pursue something like this for several years before success. Alina invited me to the Jeopardy watching party

6 0.13260588 378 hunch net-2009-11-15-The Other Online Learning

7 0.12458495 131 hunch net-2005-11-16-The Everything Ensemble Edge

8 0.11921586 235 hunch net-2007-03-03-All Models of Learning have Flaws

9 0.11817691 39 hunch net-2005-03-10-Breaking Abstractions

10 0.11081503 201 hunch net-2006-08-07-The Call of the Deep

11 0.10711751 2 hunch net-2005-01-24-Holy grails of machine learning?

12 0.10637586 112 hunch net-2005-09-14-The Predictionist Viewpoint

13 0.10612175 101 hunch net-2005-08-08-Apprenticeship Reinforcement Learning for Control

14 0.10482782 407 hunch net-2010-08-23-Boosted Decision Trees for Deep Learning

15 0.10060982 194 hunch net-2006-07-11-New Models

16 0.096411906 370 hunch net-2009-09-18-Necessary and Sufficient Research

17 0.095221967 295 hunch net-2008-04-12-It Doesn’t Stop

18 0.093567505 102 hunch net-2005-08-11-Why Manifold-Based Dimension Reduction Techniques?

19 0.091353983 188 hunch net-2006-06-30-ICML papers

20 0.090005971 347 hunch net-2009-03-26-Machine Learning is too easy


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.179), (1, 0.063), (2, -0.041), (3, 0.055), (4, 0.06), (5, -0.044), (6, -0.016), (7, 0.09), (8, 0.124), (9, -0.081), (10, -0.092), (11, -0.033), (12, -0.074), (13, -0.011), (14, -0.027), (15, 0.005), (16, 0.048), (17, -0.013), (18, 0.026), (19, -0.008), (20, 0.055), (21, -0.034), (22, -0.096), (23, 0.057), (24, 0.101), (25, 0.064), (26, -0.031), (27, -0.01), (28, 0.079), (29, -0.015), (30, 0.14), (31, -0.049), (32, -0.046), (33, -0.11), (34, 0.064), (35, 0.068), (36, 0.056), (37, -0.078), (38, 0.069), (39, -0.044), (40, 0.026), (41, -0.024), (42, -0.03), (43, 0.015), (44, 0.034), (45, 0.151), (46, 0.015), (47, 0.023), (48, 0.012), (49, -0.021)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95350838 3 hunch net-2005-01-24-The Humanloop Spectrum of Machine Learning


2 0.62744719 287 hunch net-2008-01-28-Sufficient Computation


3 0.56937635 424 hunch net-2011-02-17-What does Watson mean?


4 0.55980158 101 hunch net-2005-08-08-Apprenticeship Reinforcement Learning for Control

Introduction: Pieter Abbeel presented a paper with Andrew Ng at ICML on Exploration and Apprenticeship Learning in Reinforcement Learning. The basic idea of this algorithm is: (1) Collect data from a human controlling a machine. (2) Build a transition model based upon the experience. (3) Build a policy which optimizes the transition model. (4) Evaluate the policy. If it works well, halt, otherwise add the experience into the pool and go to (2). The paper proves that this technique will converge to some policy with expected performance near human expected performance assuming the world fits certain assumptions (MDP or linear dynamics). This general idea of apprenticeship learning (i.e. incorporating data from an expert) seems very compelling because (a) humans often learn this way and (b) much harder problems can be solved. For (a), the notion of teaching is about transferring knowledge from an expert to novices, often via demonstration. To see (b), note that we can create intricate rei
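
The loop summarized above is short enough to write down; in this sketch every component (data collection, model fitting, planning, evaluation) is a caller-supplied function, since the summary names none of the paper's implementation details:

```python
# Sketch of the apprenticeship-learning loop described above. All helpers are
# hypothetical caller-supplied functions, not components from the paper.
def apprenticeship_learning(collect_human_data, fit_transition_model,
                            plan_policy, evaluate, good_enough, max_iters=20):
    # (1) Collect data from a human controlling the machine.
    experience = list(collect_human_data())
    policy = None
    for _ in range(max_iters):
        # (2) Build a transition model based upon the experience so far.
        model = fit_transition_model(experience)
        # (3) Build a policy which optimizes the transition model.
        policy = plan_policy(model)
        # (4) Evaluate the policy; halt if it works well, otherwise add the
        #     new experience to the pool and go back to (2).
        performance, new_experience = evaluate(policy)
        if good_enough(performance):
            break
        experience.extend(new_experience)
    return policy
```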

5 0.5589717 32 hunch net-2005-02-27-Antilearning: When proximity goes bad

Introduction: Joel Predd mentioned “ Antilearning ” by Adam Kowalczyk , which is interesting from a foundational intuitions viewpoint. There is a pervasive intuition that “nearby things tend to have the same label”. This intuition is instantiated in SVMs, nearest neighbor classifiers, decision trees, and neural networks. It turns out there are natural problems where this intuition is opposite of the truth. One natural situation where this occurs is in competition. For example, when Intel fails to meet its earnings estimate, is this evidence that AMD is doing badly also? Or evidence that AMD is doing well? This violation of the proximity intuition means that when the number of examples is few, negating a classifier which attempts to exploit proximity can provide predictive power (thus, the term “antilearning”).
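
A minimal sketch of the “antilearning” move described above, assuming scikit-learn (the post references no code): fit a proximity-based classifier and negate its binary predictions when proximity is anti-correlated with the label:

```python
# Hypothetical sketch: negate a proximity-based classifier's predictions.
from sklearn.neighbors import KNeighborsClassifier

def negated_proximity_predictions(X_train, y_train, X_test):
    """Fit 1-nearest-neighbour (a proximity learner) and flip its 0/1 predictions."""
    knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
    return 1 - knn.predict(X_test)
```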

6 0.53269047 153 hunch net-2006-02-02-Introspectionism as a Disease

7 0.5249837 353 hunch net-2009-05-08-Computability in Artificial Intelligence

8 0.52178079 352 hunch net-2009-05-06-Machine Learning to AI

9 0.51366103 380 hunch net-2009-11-29-AI Safety

10 0.50569177 61 hunch net-2005-04-25-Embeddings: what are they good for?

11 0.50331581 95 hunch net-2005-07-14-What Learning Theory might do

12 0.49352315 2 hunch net-2005-01-24-Holy grails of machine learning?

13 0.47802147 27 hunch net-2005-02-23-Problem: Reinforcement Learning with Classification

14 0.46672922 112 hunch net-2005-09-14-The Predictionist Viewpoint

15 0.46524858 237 hunch net-2007-04-02-Contextual Scaling

16 0.4574708 209 hunch net-2006-09-19-Luis von Ahn is awarded a MacArthur fellowship.

17 0.45739686 456 hunch net-2012-02-24-ICML+50%

18 0.45582968 120 hunch net-2005-10-10-Predictive Search is Coming

19 0.45519048 295 hunch net-2008-04-12-It Doesn’t Stop

20 0.44524103 131 hunch net-2005-11-16-The Everything Ensemble Edge


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(3, 0.012), (21, 0.336), (27, 0.221), (38, 0.077), (48, 0.015), (53, 0.123), (55, 0.03), (94, 0.063), (95, 0.015)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.9205783 58 hunch net-2005-04-21-Dynamic Programming Generalizations and Their Use

Introduction: David Mcallester gave a talk about this paper (with Pedro Felzenszwalb). I’ll try to give a high level summary of why it’s interesting. Dynamic programming is most familiar as instantiated by Viterbi decoding in a hidden markov model. It is a general paradigm for problem solving where subproblems are solved and used to solve larger problems. In the Viterbi decoding example, the subproblem is “What is the most probable path ending at each state at timestep t?”, and the larger problem is the same except at timestep t+1. There are a few optimizations you can do here: Dynamic Programming -> queued Dynamic Programming. Keep track of the “cost so far” (or “most probable path”) and (carefully) only look at extensions to paths likely to yield the shortest path. “Carefully” here is defined by Dijkstra’s shortest path algorithm. queued Dynamic Programming -> A*. Add a lower bound on the cost to complete a path (or an upper bound on the probability of a completion) for
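
Since Viterbi decoding is the running dynamic programming example above, here is a minimal log-space sketch of it (dense NumPy tables; the HMM parameters are generic inputs, not anything from the talk):

```python
# Minimal Viterbi decoding sketch: subproblem = best path ending in each state
# at timestep t, reused to solve timestep t+1, as described above.
import numpy as np

def viterbi(start_logp, trans_logp, emit_logp, obs):
    """Most probable state path for an HMM; all parameters are log-probability arrays."""
    n_states, T = len(start_logp), len(obs)
    score = np.full((T, n_states), -np.inf)    # score[t, s]: best log-prob ending in s at t
    back = np.zeros((T, n_states), dtype=int)  # back[t, s]: best predecessor of s at t
    score[0] = start_logp + emit_logp[:, obs[0]]
    for t in range(1, T):
        for s in range(n_states):
            cand = score[t - 1] + trans_logp[:, s]
            back[t, s] = int(np.argmax(cand))
            score[t, s] = cand[back[t, s]] + emit_logp[s, obs[t]]
    path = [int(np.argmax(score[-1]))]         # reconstruct by following backpointers
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return list(reversed(path))
```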

2 0.81358445 126 hunch net-2005-10-26-Fallback Analysis is a Secret to Useful Algorithms

Introduction: The ideal of theoretical algorithm analysis is to construct an algorithm with accompanying optimality theorems proving that it is a useful algorithm. This ideal often fails, particularly for learning algorithms and theory. The general form of a theorem is: if preconditions, then postconditions. When we design learning algorithms it is very common to come up with precondition assumptions such as “the data is IID”, “the learning problem is drawn from a known distribution over learning problems”, or “there is a perfect classifier”. All of these example preconditions can be false for real-world problems in ways that are not easily detectable. This means that algorithms derived and justified by these very common forms of analysis may be prone to catastrophic failure in routine (mis)application. We can hope for better. Several different kinds of learning algorithm analysis have been developed, some of which have fewer preconditions. Simply demanding that these forms of analysi

same-blog 3 0.81219643 3 hunch net-2005-01-24-The Humanloop Spectrum of Machine Learning


4 0.81001854 369 hunch net-2009-08-27-New York Area Machine Learning Events

Introduction: Several events are happening in the NY area. Barriers in Computational Learning Theory Workshop, Aug 28. That’s tomorrow near Princeton. I’m looking forward to speaking at this one on “Getting around Barriers in Learning Theory”, but several other talks are of interest, particularly to the CS theory inclined. Claudia Perlich is running the INFORMS Data Mining Contest with a deadline of Sept. 25. This is a contest using real health record data (they partnered with HealthCare Intelligence ) to predict transfers and mortality. In the current US health care reform debate, the case studies of high costs we hear strongly suggest machine learning & statistics can save many billions. The Singularity Summit October 3&4 . This is for the AIists out there. Several of the talks look interesting, although unfortunately I’ll miss it for ALT . Predictive Analytics World, Oct 20-21 . This is stretching the definition of “New York Area” a bit, but the train to DC is reasonable.

5 0.76078027 267 hunch net-2007-10-17-Online as the new adjective

Introduction: Online learning is in vogue, which means we should expect to see in the near future: Online boosting. Online decision trees. Online SVMs. (actually, we’ve already seen) Online deep learning. Online parallel learning. etc… There are three fundamental drivers of this trend. Increasing size of datasets makes online algorithms attractive. Online learning can simply be more efficient than batch learning. Here is a picture from a class on online learning: The point of this picture is that even in 3 dimensions and even with linear constraints, finding the minima of a set in an online fashion can be typically faster than finding the minima in a batch fashion. To see this, note that there is a minimal number of gradient updates (i.e. 2) required in order to reach the minima in the typical case. Given this, it’s best to do these updates as quickly as possible, which implies doing the first update online (i.e. before seeing all the examples) is preferred. Note
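
The efficiency point above is easiest to see concretely. Here is a minimal sketch (a made-up linear least-squares problem, not the class's picture) contrasting one online pass, which updates before seeing all the examples, with a single batch gradient step over the same data:

```python
# Hypothetical illustration of online vs. batch updates on squared loss.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

def online_pass(X, y, lr=0.05):
    w = np.zeros(X.shape[1])
    for x_i, y_i in zip(X, y):            # update per example, before seeing all the data
        w -= lr * (x_i @ w - y_i) * x_i
    return w

def batch_steps(X, y, lr=0.05, steps=1):
    w = np.zeros(X.shape[1])
    for _ in range(steps):                # one update per full pass over the data
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w

print("distance to optimum after one online pass:", np.linalg.norm(online_pass(X, y) - w_true))
print("distance to optimum after one batch step :", np.linalg.norm(batch_steps(X, y) - w_true))
```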

6 0.68746156 378 hunch net-2009-11-15-The Other Online Learning

7 0.62213182 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models

8 0.62028211 131 hunch net-2005-11-16-The Everything Ensemble Edge

9 0.6193949 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem

10 0.6165148 194 hunch net-2006-07-11-New Models

11 0.61309612 102 hunch net-2005-08-11-Why Manifold-Based Dimension Reduction Techniques?

12 0.6096065 14 hunch net-2005-02-07-The State of the Reduction

13 0.60796368 12 hunch net-2005-02-03-Learning Theory, by assumption

14 0.6067788 201 hunch net-2006-08-07-The Call of the Deep

15 0.60534072 19 hunch net-2005-02-14-Clever Methods of Overfitting

16 0.60514581 41 hunch net-2005-03-15-The State of Tight Bounds

17 0.60373235 60 hunch net-2005-04-23-Advantages and Disadvantages of Bayesian Learning

18 0.59939957 370 hunch net-2009-09-18-Necessary and Sufficient Research

19 0.59923321 478 hunch net-2013-01-07-NYU Large Scale Machine Learning Class

20 0.59886384 347 hunch net-2009-03-26-Machine Learning is too easy