hunch_net hunch_net-2006 hunch_net-2006-148 knowledge-graph by maker-knowledge-mining

148 hunch net-2006-01-13-Benchmarks for RL


metadata for this blog

Source: html

Introduction: A couple years ago, Drew Bagnell and I started the RLBench project to set up a suite of reinforcement learning benchmark problems. We haven’t been able to touch it (due to lack of time) for a year, so the project is on hold. Luckily, there are several other projects such as CLSquare and RL-Glue with a similar goal, and we strongly endorse their continued development. I would like to explain why, especially in the context of criticism of other learning benchmarks. For example, sometimes the UCI Machine Learning Repository is criticized. There are two criticisms I know of: Learning algorithms have overfit to the problems in the repository. It is easy to imagine a mechanism for this happening unintentionally. Strong evidence of this would be provided by learning algorithms which perform great on the UCI machine learning repository but very badly (relative to other learning algorithms) on non-UCI learning problems. I have seen little evidence of this but it remains a possibility…


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 A couple years ago, Drew Bagnell and I started the RLBench project to set up a suite of reinforcement learning benchmark problems. [sent-1, score-0.997]

2 There are two criticisms I know of: Learning algorithms have overfit to the problems in the repository. [sent-6, score-0.417]

3 Strong evidence of this would be provided by learning algorithms which perform great on the UCI machine learning repository but very badly (relative to other learning algorithms) on non-UCI learning problems. [sent-8, score-0.825]

4 A well run prediction competition provides some of the best evidence available about which learning algorithms work well. [sent-11, score-0.302]

5 There are still some mechanisms for being misled (too many entries, not enough testing data, unlucky choice of learning problem that your algorithm just has the wrong bias for), but they are more controlled than anywhere else. [sent-12, score-0.525]

6 Typically, there is little reason to believe that simulated data reflects the structure of real-world problems. [sent-18, score-0.477]

7 The repository enforced a certain interface which many natural learning problems did not fit into. [sent-20, score-0.898]

8 Some interface must be enforced in order for a benchmark/repository to be useful. [sent-21, score-0.287]

9 This implies that problems fitting the interface are preferred both in submission of new problems and in choice of algorithms to solve. [sent-22, score-0.658]

10 This is a reasonable complaint, but it’s basically a complaint that the UCI machine learning repository was too successful. [sent-23, score-0.592]

11 It is important to consider the benefits of a benchmark suite as well. [sent-26, score-0.49]

12 The standard for experimental studies of reinforcement learning algorithms has been to show that some new algorithm does something reasonable on one or two reinforcement learning problems. [sent-27, score-1.301]

13 Naturally, what this implies in practice is that the algorithms were hand-tuned to work on these one-or-two problems, and there was relatively little emphasis on building a general-purpose reinforcement learning algorithm. [sent-28, score-0.733]

14 This is strongly at odds with the end goal of the reinforcement learning community! [sent-29, score-0.48]

15 The way to avoid this is by setting up a benchmark suite (the more problems, the better). [sent-30, score-0.49]

16 With a large set of problems that people can easily test on, the natural emphasis in empirical algorithm development shifts towards general-purpose algorithms. [sent-31, score-0.716]

17 When people reimplement simulators for reinforcement learning problems, it turns out these reimplementations typically differ in the details, and these details can radically alter the difficulty of the problem. [sent-33, score-0.474]

18 Again, a large set of problems people can easily test on solves this problem because the precise definition of the problem is shared. [sent-35, score-0.366]

19 One sometimes reasonable view of the world is that if you define a problem publicly, it is as good as solved since there are plenty of smart people out there eager to prove themselves. [sent-36, score-0.325]

20 According to this view of the world, setting up a benchmark is a vital task. [sent-37, score-0.361]
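For concreteness, here is a minimal sketch of how per-sentence tfidf scores like the sentScore values above might be produced. It assumes scikit-learn's TfidfVectorizer, a naive sentence split, and a scoring rule of summed term weights; the corpus and all names are illustrative, since the actual pipeline behind these numbers is not specified.

from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in corpus: each blog post is one document.
posts = [
    "A couple years ago we started the RLBench project. We strongly endorse continued benchmark development.",
    "Exploration is one of the big unsolved problems in machine learning.",
]

# Fit tfidf over whole posts, then score each sentence of the target post
# by the summed tfidf weight of its terms (one plausible reading of sentScore).
vectorizer = TfidfVectorizer()
vectorizer.fit(posts)

sentences = posts[0].split(". ")  # naive sentence segmentation
scores = vectorizer.transform(sentences).sum(axis=1).A1
for score, sent in sorted(zip(scores, sentences), reverse=True):
    print(f"{score:.3f}  {sent}")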


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('repository', 0.28), ('benchmark', 0.28), ('reinforcement', 0.257), ('simulated', 0.255), ('suite', 0.21), ('uci', 0.182), ('misled', 0.17), ('problems', 0.165), ('benchmarks', 0.151), ('interface', 0.147), ('competitions', 0.14), ('enforced', 0.14), ('complaint', 0.132), ('algorithms', 0.12), ('emphasis', 0.107), ('evidence', 0.101), ('reasonable', 0.099), ('purpose', 0.089), ('natural', 0.085), ('project', 0.085), ('setup', 0.084), ('learning', 0.081), ('view', 0.081), ('little', 0.079), ('strongly', 0.079), ('data', 0.077), ('rlbench', 0.076), ('touch', 0.076), ('continued', 0.076), ('eager', 0.076), ('hill', 0.076), ('unlucky', 0.076), ('empirical', 0.073), ('sanity', 0.07), ('webserver', 0.07), ('car', 0.07), ('kept', 0.07), ('reimplement', 0.07), ('problem', 0.069), ('algorithm', 0.068), ('details', 0.066), ('shifts', 0.066), ('announce', 0.066), ('criticisms', 0.066), ('luckily', 0.066), ('overfit', 0.066), ('reflects', 0.066), ('test', 0.063), ('goal', 0.063), ('choice', 0.061)]
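A hedged sketch of where a wordName/wordTfidf list like the one above could come from: take the target post's tfidf vector and report its highest-weighted vocabulary terms. The two-post corpus and the topN cutoff are illustrative assumptions, not the pipeline actually used here.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Illustrative corpus; the real one would be all hunch.net posts.
posts = [
    "repository benchmark suite reinforcement learning problems",
    "exploration bandits and markov decision processes",
]

vectorizer = TfidfVectorizer()
weights = vectorizer.fit_transform(posts)[0].toarray().ravel()  # target post's tfidf vector
vocab = vectorizer.get_feature_names_out()

top = np.argsort(weights)[::-1][:5]  # indices of the 5 largest weights
print([(vocab[i], round(float(weights[i]), 3)) for i in top])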

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999934 148 hunch net-2006-01-13-Benchmarks for RL


2 0.16976093 27 hunch net-2005-02-23-Problem: Reinforcement Learning with Classification

Introduction: At an intuitive level, the question here is “Can reinforcement learning be solved with classification?” Problem Construct a reinforcement learning algorithm with near-optimal expected sum of rewards in the direct experience model given access to a classifier learning algorithm which has a small error rate or regret on all posed classification problems. The definition of “posed” here is slightly murky. I consider a problem “posed” if there is an algorithm for constructing labeled classification examples. Past Work There exists a reduction of reinforcement learning to classification given a generative model. A generative model is an inherently stronger assumption than the direct experience model. Other work on learning reductions may be important. Several algorithms for solving reinforcement learning in the direct experience model exist. Most, such as E^3, Factored-E^3, metric-E^3, and Rmax, require that the observation be the state. Recent work…

3 0.13297714 183 hunch net-2006-06-14-Explorations of Exploration

Introduction: Exploration is one of the big unsolved problems in machine learning. This isn’t for lack of trying: there are many models of exploration which have been analyzed in many different ways by many different groups of people. At some point, it is worthwhile to sit back and see what has been done across these many models. Reinforcement Learning (1). Reinforcement learning has traditionally focused on Markov Decision Processes where the next state s’ is given by a conditional distribution P(s’|s,a) given the current state s and action a. The typical result here is that certain specific algorithms controlling an agent can behave within e of optimal for horizon T except for poly(1/e,T,S,A) “wasted” experiences (with high probability). This started with E^3 by Satinder Singh and Michael Kearns. Sham Kakade’s thesis has significant discussion. Extensions have typically been of the form “under extra assumptions, we can prove more”, for example Factored-E^3 and…

4 0.13082638 332 hunch net-2008-12-23-Use of Learning Theory

Introduction: I’ve had serious conversations with several people who believe that the theory in machine learning is “only useful for getting papers published”. That’s a compelling statement, as I’ve seen many papers where the algorithm clearly came first, and the theoretical justification for it came second, purely as a perceived means to improve the chance of publication. Naturally, I disagree and believe that learning theory has much more substantial applications. Even in core learning algorithm design, I’ve found learning theory to be useful, although its application is more subtle than many realize. The most straightforward applications can fail, because (as expectation suggests) worst-case bounds tend to be loose in practice (*). In my experience, considering learning theory when designing an algorithm has two important effects in practice: It can help make your algorithm behave right at a crude level of analysis, leaving finer details to tuning or common sense. The best example…

5 0.12282889 158 hunch net-2006-02-24-A Fundamentalist Organization of Machine Learning

Introduction: There are several different flavors of Machine Learning classes. Many classes are of the ‘zoo’ sort: many different learning algorithms are presented. Others avoid the zoo by not covering the full scope of machine learning. This is my view of what makes a good machine learning class, along with why. I’d like to specifically invite comment on whether things are missing, misemphasized, or misplaced. Phase Subject Why? Introduction What is a machine learning problem? A good understanding of the characteristics of machine learning problems seems essential. Characteristics include: a data source, some hope the data is predictive, and a need for generalization. This is probably best taught in a case study manner: lay out the specifics of some problem and then ask “Is this a machine learning problem?” Introduction Machine Learning Problem Identification Identification and recognition of the type of learning problems is (obviously) a very important step i…

6 0.12130676 454 hunch net-2012-01-30-ICML Posters and Scope

7 0.11782699 388 hunch net-2010-01-24-Specializations of the Master Problem

8 0.11706179 347 hunch net-2009-03-26-Machine Learning is too easy

9 0.11321969 19 hunch net-2005-02-14-Clever Methods of Overfitting

10 0.11269147 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

11 0.11265998 268 hunch net-2007-10-19-Second Annual Reinforcement Learning Competition

12 0.11014488 286 hunch net-2008-01-25-Turing’s Club for Machine Learning

13 0.10977628 400 hunch net-2010-06-13-The Good News on Exploration and Learning

14 0.10799025 160 hunch net-2006-03-02-Why do people count for learning?

15 0.10669062 235 hunch net-2007-03-03-All Models of Learning have Flaws

16 0.10563168 22 hunch net-2005-02-18-What it means to do research.

17 0.10424498 2 hunch net-2005-01-24-Holy grails of machine learning?

18 0.096863501 100 hunch net-2005-08-04-Why Reinforcement Learning is Important

19 0.096578084 276 hunch net-2007-12-10-Learning Track of International Planning Competition

20 0.092985086 120 hunch net-2005-10-10-Predictive Search is Coming
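The simValue column in these lists is presumably a cosine similarity between document vectors. Under that assumption, a minimal sketch of ranking posts against a target post, with an illustrative three-post corpus:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

posts = [
    "benchmarks for reinforcement learning",
    "reinforcement learning with classification",
    "explorations of exploration",
]

tfidf = TfidfVectorizer().fit_transform(posts)
sims = cosine_similarity(tfidf[0], tfidf).ravel()  # target post vs. all posts
for i in sims.argsort()[::-1]:  # most similar first; index 0 is the post itself
    print(f"{sims[i]:.3f}  post {i}")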


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.254), (1, 0.073), (2, -0.046), (3, 0.034), (4, 0.028), (5, -0.048), (6, 0.014), (7, 0.039), (8, 0.014), (9, 0.003), (10, -0.041), (11, 0.026), (12, 0.018), (13, 0.075), (14, -0.017), (15, 0.035), (16, 0.055), (17, -0.115), (18, -0.048), (19, 0.034), (20, 0.005), (21, -0.024), (22, -0.039), (23, -0.064), (24, -0.067), (25, -0.001), (26, -0.024), (27, -0.037), (28, -0.028), (29, -0.004), (30, -0.04), (31, -0.015), (32, -0.006), (33, -0.018), (34, 0.009), (35, -0.024), (36, 0.105), (37, 0.027), (38, 0.114), (39, -0.038), (40, 0.019), (41, 0.029), (42, 0.003), (43, 0.063), (44, 0.021), (45, 0.096), (46, 0.028), (47, 0.022), (48, 0.016), (49, 0.001)]
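Latent semantic indexing is typically a truncated SVD of the tfidf matrix, so the (topicId, topicWeight) pairs above would be this post's coordinates in the latent space. A minimal sketch under that assumption, with an illustrative corpus and component count:

from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

posts = [
    "benchmarks for reinforcement learning suites",
    "learning track of the international planning competition",
    "use of learning theory in algorithm design",
    "fallback analysis for useful learning algorithms",
]

tfidf = TfidfVectorizer().fit_transform(posts)
lsi = TruncatedSVD(n_components=3, random_state=0)  # 3 latent topics, for illustration
doc_topics = lsi.fit_transform(tfidf)

# (topicId, topicWeight) pairs for the target post, like the list above.
print([(t, round(float(w), 3)) for t, w in enumerate(doc_topics[0])])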

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.94003916 148 hunch net-2006-01-13-Benchmarks for RL


2 0.75121826 276 hunch net-2007-12-10-Learning Track of International Planning Competition

Introduction: The International Planning Competition (IPC) is a biennial event organized in the context of the International Conference on Automated Planning and Scheduling (ICAPS). This year, for the first time, there will be a learning track of the competition. For more information you can go to the competition web-site. The competitions are typically organized around a number of planning domains that can vary from year to year, where a planning domain is simply a class of problems that share a common action schema, e.g. Blocksworld is a well-known planning domain that contains a problem instance for each possible initial tower configuration and goal configuration. Some other domains have included Logistics, Airport, Freecell, PipesWorld, and many others. For each domain the competition includes a number of problems (say 40-50) and the planners are run on each problem with a time limit for each problem (around 30 minutes). The problems are hard enough that many problems are not solved within th…

3 0.70881706 332 hunch net-2008-12-23-Use of Learning Theory

Introduction: I’ve had serious conversations with several people who believe that the theory in machine learning is “only useful for getting papers published”. That’s a compelling statement, as I’ve seen many papers where the algorithm clearly came first, and the theoretical justification for it came second, purely as a perceived means to improve the chance of publication. Naturally, I disagree and believe that learning theory has much more substantial applications. Even in core learning algorithm design, I’ve found learning theory to be useful, although its application is more subtle than many realize. The most straightforward applications can fail, because (as expectation suggests) worst-case bounds tend to be loose in practice (*). In my experience, considering learning theory when designing an algorithm has two important effects in practice: It can help make your algorithm behave right at a crude level of analysis, leaving finer details to tuning or common sense. The best example…

4 0.69816279 126 hunch net-2005-10-26-Fallback Analysis is a Secret to Useful Algorithms

Introduction: The ideal of theoretical algorithm analysis is to construct an algorithm with accompanying optimality theorems proving that it is a useful algorithm. This ideal often fails, particularly for learning algorithms and theory. The general form of a theorem is: if preconditions, then postconditions. When we design learning algorithms it is very common to come up with precondition assumptions such as “the data is IID”, “the learning problem is drawn from a known distribution over learning problems”, or “there is a perfect classifier”. All of these example preconditions can be false for real-world problems in ways that are not easily detectable. This means that algorithms derived and justified by these very common forms of analysis may be prone to catastrophic failure in routine (mis)application. We can hope for better. Several different kinds of learning algorithm analysis have been developed, some of which have fewer preconditions. Simply demanding that these forms of analysi…

5 0.69557065 351 hunch net-2009-05-02-Wielding a New Abstraction

Introduction: This post is partly meant as an advertisement for the reductions tutorial Alina, Bianca, and I are planning to do at ICML. Please come, if you are interested. Many research programs can be thought of as finding and building new useful abstractions. The running example I’ll use is learning reductions, where I have experience. The basic abstraction here is that we can build a learning algorithm capable of solving classification problems up to a small expected regret. This is used repeatedly to solve more complex problems. In working on a new abstraction, I think you typically run into many substantial problems of understanding, which make publishing particularly difficult. It is difficult to seriously discuss the reason behind or mechanism for abstraction in a conference paper with small page limits. People rarely see such discussions and hence have little basis on which to think about new abstractions. Another difficulty is that when building an abstraction, yo…

6 0.69140202 347 hunch net-2009-03-26-Machine Learning is too easy

7 0.6830157 183 hunch net-2006-06-14-Explorations of Exploration

8 0.68183345 286 hunch net-2008-01-25-Turing’s Club for Machine Learning

9 0.68165594 31 hunch net-2005-02-26-Problem: Reductions and Relative Ranking Metrics

10 0.67974442 27 hunch net-2005-02-23-Problem: Reinforcement Learning with Classification

11 0.67762512 104 hunch net-2005-08-22-Do you believe in induction?

12 0.6766749 158 hunch net-2006-02-24-A Fundamentalist Organization of Machine Learning

13 0.66715729 18 hunch net-2005-02-12-ROC vs. Accuracy vs. AROC

14 0.66623676 101 hunch net-2005-08-08-Apprenticeship Reinforcement Learning for Control

15 0.65703398 388 hunch net-2010-01-24-Specializations of the Master Problem

16 0.65581769 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

17 0.64995104 454 hunch net-2012-01-30-ICML Posters and Scope

18 0.64329839 28 hunch net-2005-02-25-Problem: Online Learning

19 0.64018488 100 hunch net-2005-08-04-Why Reinforcement Learning is Important

20 0.63660014 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(3, 0.015), (10, 0.03), (25, 0.315), (27, 0.198), (30, 0.019), (38, 0.024), (49, 0.028), (53, 0.056), (55, 0.124), (68, 0.015), (77, 0.013), (94, 0.089)]
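The sparse (topicId, topicWeight) pairs above resemble per-document output from an LDA topic model (gensim's LdaModel reports exactly this format). A hedged scikit-learn sketch of the same idea, with an illustrative corpus and topic count:

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

posts = [
    "benchmarks for reinforcement learning suites and repositories",
    "big machine learning for advertisement prediction",
    "online learning with experts and master algorithms",
]

counts = CountVectorizer().fit_transform(posts)  # LDA is fit on raw term counts
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)

# Keep only the non-trivial weights, matching the sparse list format above.
print([(t, round(float(w), 3)) for t, w in enumerate(doc_topics[0]) if w > 0.05])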

similar blogs list:

simIndex simValue blogId blogTitle

1 0.88016254 178 hunch net-2006-05-08-Big machine learning

Introduction: According to the New York Times, Yahoo is releasing Project Panama shortly. Project Panama is about better predicting which advertisements are relevant to a search, implying a higher click-through rate, implying larger income for Yahoo. There are two things that seem interesting here: A significant portion of that improved accuracy is almost certainly machine learning at work. The quantitative effect is huge: the estimate in the article is $600*10^6. Google already has such improvements and Microsoft Search is surely working on them, which suggests this is (perhaps) a $10^9 per year machine learning problem. The exact methodology under use is unlikely to be publicly discussed in the near future because of the competitive environment. Hopefully we’ll have some public “war stories” at some point in the future when this information becomes less sensitive. For now, it’s reassuring to simply note that machine learning is having a big impact.

same-blog 2 0.82894993 148 hunch net-2006-01-13-Benchmarks for RL


3 0.80314118 163 hunch net-2006-03-12-Online learning or online preservation of learning?

Introduction: In the online learning with experts setting, you observe a set of predictions, make a decision, and then observe the truth. This process repeats indefinitely. In this setting, it is possible to prove theorems of the sort: master algorithm error count <= k * (best predictor error count) + c * log(number of predictors) Is this a statement about learning or about preservation of learning? We did some experiments to analyze the new Binning algorithm which works in this setting. For several UCI datasets, we reprocessed them so that features could be used as predictors and then applied several master algorithms. The first graph confirms that Binning is indeed a better algorithm according to the tightness of the upper bound. Here, “Best” is the performance of the best expert. “V. Bound” is the bound for Vovk’s algorithm (the previous best). “Bound” is the bound for the Binning algorithm. “Binning” is the performance of the Binning algorithm. The Binning algorithm clearly h…

4 0.76369196 134 hunch net-2005-12-01-The Webscience Future

Introduction: The internet has significantly affected the way we do research, but its capabilities have not yet been fully realized. First, let’s acknowledge some known effects. Self-publishing By default, all researchers in machine learning (and more generally computer science and physics) place their papers online for anyone to download. The exact mechanism differs: physicists tend to use a central repository (Arxiv) while computer scientists tend to place the papers on their webpage. Arxiv has been slowly growing in subject breadth so it is now sometimes used by computer scientists. Collaboration Email has enabled working remotely with coauthors. This has allowed collaborations which would not otherwise have been possible and generally speeds research. Now, let’s look at attempts to go further. Blogs (like this one) allow public discussion about topics which are not easily categorized as “a new idea in machine learning” (like this topic). Organization of some subfield…

5 0.61804891 437 hunch net-2011-07-10-ICML 2011 and the future

Introduction: Unfortunately, I ended up sick for much of this ICML. I did manage to catch one interesting paper: Richard Socher, Cliff Lin, Andrew Y. Ng, and Christopher D. Manning, Parsing Natural Scenes and Natural Language with Recursive Neural Networks. I invited Richard to share his list of interesting papers, so hopefully we’ll hear from him soon. In the meantime, Paul and Hal have posted some lists. the future Joelle and I are program chairs for ICML 2012 in Edinburgh, which I previously enjoyed visiting in 2005. This is a huge responsibility that we hope to accomplish well. A part of this (perhaps the most fun part) is imagining how we can make ICML better. A key and critical constraint is choosing things that can be accomplished. So far we have: Colocation. The first thing we looked into was potential colocations. We quickly discovered that many other conferences precommitted their location. For the future, getting a colocation with ACL or SIGI…

6 0.61319578 40 hunch net-2005-03-13-Avoiding Bad Reviewing

7 0.61126435 343 hunch net-2009-02-18-Decision by Vetocracy

8 0.61019874 454 hunch net-2012-01-30-ICML Posters and Scope

9 0.60976434 320 hunch net-2008-10-14-Who is Responsible for a Bad Review?

10 0.60913467 96 hunch net-2005-07-21-Six Months

11 0.60823429 44 hunch net-2005-03-21-Research Styles in Machine Learning

12 0.60771322 95 hunch net-2005-07-14-What Learning Theory might do

13 0.60707235 297 hunch net-2008-04-22-Taking the next step

14 0.60695642 207 hunch net-2006-09-12-Incentive Compatible Reviewing

15 0.60675555 379 hunch net-2009-11-23-ICML 2009 Workshops (and Tutorials)

16 0.60612303 51 hunch net-2005-04-01-The Producer-Consumer Model of Research

17 0.60580623 132 hunch net-2005-11-26-The Design of an Optimal Research Environment

18 0.60538375 116 hunch net-2005-09-30-Research in conferences

19 0.60508204 225 hunch net-2007-01-02-Retrospective

20 0.60392636 204 hunch net-2006-08-28-Learning Theory standards for NIPS 2006