hunch_net hunch_net-2006 hunch_net-2006-169 knowledge-graph by maker-knowledge-mining

169 hunch net-2006-04-05-What is state?


meta info for this blog

Source: html

Introduction: In reinforcement learning (and sometimes other settings), there is a notion of “state”. Based upon the state various predictions are made such as “Which action should be taken next?” or “How much cumulative reward do I expect if I take some action from this state?” Given the importance of state, it is important to examine the meaning. There are actually several distinct options and it turns out the definition variation is very important in motivating different pieces of work. Newtonian State. State is the physical pose of the world. Under this definition, there are very many states, often too many for explicit representation. This is also the definition typically used in games. Abstracted State. State is an abstracted physical state of the world. “Is the door open or closed?” “Are you in room A or not?” The number of states is much smaller here. A basic issue here is: “How do you compute the state from observations?” Mathematical State. State is a sufficient stati
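The abstracted-state definition raises the question “How do you compute the state from observations?” A minimal, purely hypothetical Python sketch of that idea (the observation fields, threshold, and actions below are invented for illustration and are not from the post):

# Hypothetical sketch: compress a raw observation into a small abstracted state
# ("Is the door open?", "Are you in room A?") and act on that state alone.
def abstract_state(observation):
    door_open = observation["door_angle"] > 0.1   # assumed sensor field
    in_room_a = observation["room"] == "A"        # assumed sensor field
    return (door_open, in_room_a)

def policy(state):
    door_open, in_room_a = state
    if in_room_a and not door_open:
        return "open_door"
    return "move_forward"

obs = {"door_angle": 0.0, "room": "A", "joint_positions": [0.2, 1.3, -0.7]}
print(policy(abstract_state(obs)))  # -> open_door

The full physical pose (here, joint_positions) is ignored: the abstracted state is far smaller than the Newtonian state, matching the post's point that the number of abstracted states is much smaller.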


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 In reinforcement learning (and sometimes other settings), there is a notion of “state”. [sent-1, score-0.1]

2 Based upon the state various predictions are made such as “Which action should be taken next? [sent-2, score-0.993]

3 ” or “How much cumulative reward do I expect if I take some action from this state? [sent-3, score-0.335]

4 ” Given the importance of state, it is important to examine the meaning. [sent-4, score-0.185]

5 There are actually several distinct options and it turns out the definition variation is very important in motivating different pieces of work. [sent-5, score-0.693]

6 Under this definition, there are very many states, often too many for explicit representation. [sent-8, score-0.057]

7 This is also the definition typically used in games. [sent-9, score-0.148]

8 State is an abstracted physical state of the world. [sent-11, score-1.068]

9 A basic issue here is: “How do you compute the state from observations? [sent-15, score-0.83]

10 State is a sufficient statistic of observations for making necessary predictions. [sent-17, score-0.407]

11 State is the internal belief/understanding/etc… which changes an agent’s actions in different circumstances. [sent-19, score-0.391]

12 ” This is like the mathematical version of state, except that portions of the statistic which can not be learned are irrelevant. [sent-21, score-0.434]

13 There are only observations (one of which might be a reward) and actions. [sent-23, score-0.165]

14 This is more reasonable than it might sound because state is a derived quantity and the necessity of that derivation is unclear. [sent-24, score-0.994]

15 The different questions these notions of state motivate can have large practical consequences on the algorithms used. [sent-26, score-1.037]

16 It is not clear yet what the “right” notion of state is—we just don’t know what works well. [sent-27, score-0.824]
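The sentence scores above are consistent with weighting each sentence by the tfidf values of its words. A rough sketch of that idea using scikit-learn as a stand-in (this is an assumption about the method, not the actual pipeline behind this page, and a real run would fit tfidf over the whole blog corpus rather than one post):

from sklearn.feature_extraction.text import TfidfVectorizer

# a few sentences from the post above, used as a toy corpus
sentences = [
    "In reinforcement learning (and sometimes other settings), there is a notion of state.",
    "State is a sufficient statistic of observations for making necessary predictions.",
    "This is also the definition typically used in games.",
]

vec = TfidfVectorizer()
X = vec.fit_transform(sentences)   # one tfidf vector per sentence
scores = X.sum(axis=1).A1          # sentence score = total tfidf mass of its words

for score, sent in sorted(zip(scores, sentences), reverse=True):
    print(round(float(score), 3), sent)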


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('state', 0.724), ('internal', 0.201), ('abstracted', 0.194), ('statistic', 0.194), ('observations', 0.165), ('physical', 0.15), ('definition', 0.148), ('reward', 0.121), ('states', 0.117), ('action', 0.117), ('notion', 0.1), ('mathematical', 0.099), ('newtonian', 0.097), ('cumulative', 0.097), ('door', 0.097), ('portions', 0.09), ('notions', 0.085), ('motivating', 0.085), ('pose', 0.081), ('variation', 0.081), ('consequences', 0.078), ('closed', 0.078), ('motivate', 0.075), ('examine', 0.075), ('options', 0.075), ('derived', 0.075), ('different', 0.075), ('necessity', 0.073), ('agent', 0.069), ('distinct', 0.067), ('quantity', 0.064), ('actions', 0.063), ('moment', 0.061), ('settings', 0.059), ('compute', 0.058), ('sound', 0.058), ('pieces', 0.057), ('explicit', 0.057), ('smaller', 0.057), ('important', 0.056), ('room', 0.056), ('importance', 0.054), ('taken', 0.052), ('changes', 0.052), ('upon', 0.051), ('except', 0.051), ('predictions', 0.049), ('turns', 0.049), ('sufficient', 0.048), ('issue', 0.048)]
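The (word, weight) pairs above look like the highest-weighted coordinates of this post's tfidf vector. A minimal sketch of producing such a list (the corpus snippets, tokenization, and stop-word handling below are assumptions, not the settings used for this page):

from sklearn.feature_extraction.text import TfidfVectorizer

# toy stand-ins for the blog posts; the real corpus would be the full hunch.net archive
corpus = [
    "In reinforcement learning there is a notion of state; state is a sufficient statistic of observations.",
    "There exists a single master learning problem; the agent observes, acts, and receives a reward.",
    "Exploration is one of the big unsolved problems in machine learning.",
]

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(corpus)
vocab = vec.get_feature_names_out()

row = X[0].toarray().ravel()                       # tfidf vector of the first post
top = sorted(zip(vocab, row), key=lambda wv: -wv[1])[:10]
print([(w, round(float(v), 3)) for w, v in top])   # e.g. [('state', ...), ...]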

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999982 169 hunch net-2006-04-05-What is state?

Introduction: In reinforcement learning (and sometimes other settings), there is a notion of “state”. Based upon the state various predictions are made such as “Which action should be taken next?” or “How much cumulative reward do I expect if I take some action from this state?” Given the importance of state, it is important to examine the meaning. There are actually several distinct options and it turns out the definition variation is very important in motivating different pieces of work. Newtonian State. State is the physical pose of the world. Under this definition, there are very many states, often too many for explicit representation. This is also the definition typically used in games. Abstracted State. State is an abstracted physical state of the world. “Is the door open or closed?” “Are you in room A or not?” The number of states is much smaller here. A basic issue here is: “How do you compute the state from observations?” Mathematical State. State is a sufficient stati

2 0.17329448 388 hunch net-2010-01-24-Specializations of the Master Problem

Introduction: One thing which is clear on a little reflection is that there exists a single master learning problem capable of encoding essentially all learning problems. This problem is of course a very general sort of reinforcement learning where the world interacts with an agent as: The world announces an observation x. The agent makes a choice a. The world announces a reward r. The goal here is to maximize the sum of the rewards over the time of the agent. No particular structure relating x to a or a to r is implied by this setting so we do not know effective general algorithms for the agent. It’s very easy to prove lower bounds showing that an agent cannot hope to succeed here—just consider the case where actions are unrelated to rewards. Nevertheless, there is a real sense in which essentially all forms of life are agents operating in this setting, somehow succeeding. The gap between these observations drives research—How can we find tractable specializations of

3 0.17006955 183 hunch net-2006-06-14-Explorations of Exploration

Introduction: Exploration is one of the big unsolved problems in machine learning. This isn’t for lack of trying—there are many models of exploration which have been analyzed in many different ways by many different groups of people. At some point, it is worthwhile to sit back and see what has been done across these many models. Reinforcement Learning (1). Reinforcement learning has traditionally focused on Markov Decision Processes where the next state s’ is given by a conditional distribution P(s’|s,a) given the current state s and action a. The typical result here is that certain specific algorithms controlling an agent can behave within e of optimal for horizon T except for poly(1/e,T,S,A) “wasted” experiences (with high probability). This started with E^3 by Satinder Singh and Michael Kearns. Sham Kakade’s thesis has significant discussion. Extensions have typically been of the form “under extra assumptions, we can prove more”, for example Factored-E^3 and

4 0.15216465 100 hunch net-2005-08-04-Why Reinforcement Learning is Important

Introduction: One prescription for solving a problem well is: State the problem, in the simplest way possible. In particular, this statement should involve no contamination with or anticipation of the solution. Think about solutions to the stated problem. Stating a problem in a succinct and crisp manner tends to invite a simple elegant solution. When a problem can not be stated succinctly, we wonder if the problem is even understood. (And when a problem is not understood, we wonder if a solution can be meaningful.) Reinforcement learning does step (1) well. It provides a clean simple language to state general AI problems. In reinforcement learning there is a set of actions A, a set of observations O, and a reward r. The reinforcement learning problem, in general, is defined by a conditional measure D(o, r | (o,r,a)*) which produces an observation o and a reward r given a history (o,r,a)*. The goal in reinforcement learning is to find a policy pi: (o,r,a)* -> a

5 0.14836198 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models

Introduction: Suppose we have a set of observations over time x_1, x_2, …, x_t and want to predict some future event y_{t+1}. An inevitable problem arises, because learning a predictor h(x_1, …, x_t) of y_{t+1} is generically intractable due to the size of the input. To make this problem tractable, what’s necessary is a method for summarizing the relevant information in past observations for the purpose of prediction in the future. In other words, state is required. Existing approaches for deriving state have some limitations. Hidden Markov models learned with EM suffer from local minima, use tabular learning approaches which provide dubious generalization ability, and often require substantial a priori specification of the observations. Kalman Filters and Particle Filters are very parametric in the sense that substantial information must be specified up front. Dynamic Bayesian Networks (graphical models through time) require substantial a priori specification and often re

6 0.092246234 27 hunch net-2005-02-23-Problem: Reinforcement Learning with Classification

7 0.089794397 213 hunch net-2006-10-08-Incompatibilities between classical confidence intervals and learning.

8 0.086635426 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

9 0.086073235 95 hunch net-2005-07-14-What Learning Theory might do

10 0.082740545 58 hunch net-2005-04-21-Dynamic Programming Generalizations and Their Use

11 0.082603626 2 hunch net-2005-01-24-Holy grails of machine learning?

12 0.080809243 317 hunch net-2008-09-12-How do we get weak action dependence for learning with partial observations?

13 0.080217861 406 hunch net-2010-08-22-KDD 2010

14 0.080054916 370 hunch net-2009-09-18-Necessary and Sufficient Research

15 0.078579754 390 hunch net-2010-03-12-Netflix Challenge 2 Canceled

16 0.074592993 454 hunch net-2012-01-30-ICML Posters and Scope

17 0.069908157 204 hunch net-2006-08-28-Learning Theory standards for NIPS 2006

18 0.068686873 343 hunch net-2009-02-18-Decision by Vetocracy

19 0.067473054 123 hunch net-2005-10-16-Complexity: It’s all in your head

20 0.067190528 255 hunch net-2007-07-13-The View From China
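The simValue numbers in the list above behave like cosine similarities between tfidf vectors (the same-blog entry scores essentially 1.0). A hedged sketch of ranking posts that way; the post texts are abbreviated stand-ins, not the real documents:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

posts = {
    "169 What is state?": "reinforcement learning state observations actions reward",
    "388 Specializations of the Master Problem": "master learning problem agent observation reward",
    "183 Explorations of Exploration": "exploration markov decision processes agent",
}

ids = list(posts)
X = TfidfVectorizer().fit_transform(posts.values())
sims = cosine_similarity(X[0], X).ravel()   # similarity of post 169 to every post

# sort by similarity, most similar first; the post compared with itself comes out near 1.0
for sim, pid in sorted(zip(sims, ids), reverse=True):
    print(round(float(sim), 8), pid)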


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.129), (1, 0.049), (2, -0.01), (3, 0.041), (4, 0.04), (5, -0.049), (6, 0.044), (7, 0.061), (8, 0.051), (9, 0.009), (10, 0.026), (11, -0.017), (12, 0.013), (13, 0.115), (14, -0.046), (15, 0.012), (16, 0.013), (17, -0.048), (18, 0.07), (19, 0.035), (20, -0.07), (21, 0.041), (22, -0.042), (23, 0.059), (24, -0.114), (25, 0.097), (26, -0.157), (27, -0.051), (28, -0.09), (29, 0.064), (30, 0.027), (31, -0.024), (32, -0.005), (33, -0.018), (34, -0.02), (35, -0.005), (36, 0.099), (37, 0.006), (38, 0.051), (39, -0.086), (40, 0.011), (41, 0.002), (42, 0.096), (43, 0.062), (44, -0.021), (45, 0.046), (46, -0.022), (47, 0.024), (48, -0.047), (49, -0.059)]
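LSI topic weights like those above can be read as the coordinates of this post's tfidf vector after projection onto a low-rank SVD basis. A minimal sketch using scikit-learn's TruncatedSVD (the component count, corpus, and preprocessing are assumptions; the listing above appears to use around 50 topics):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "reinforcement learning state observations actions reward",
    "conference reviewing papers feedback blind submissions",
    "exploration markov decision processes agent reward",
]

X = TfidfVectorizer().fit_transform(docs)
lsi = TruncatedSVD(n_components=2, random_state=0)  # tiny basis for the toy corpus
Z = lsi.fit_transform(X)                            # one row of topic weights per document

# (topicId, topicWeight) pairs for the first document, analogous to the list above
print([(i, round(float(w), 3)) for i, w in enumerate(Z[0])])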

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99014288 169 hunch net-2006-04-05-What is state?

Introduction: In reinforcement learning (and sometimes other settings), there is a notion of “state”. Based upon the state various predictions are made such as “Which action should be taken next?” or “How much cumulative reward do I expect if I take some action from this state?” Given the importance of state, it is important to examine the meaning. There are actually several distinct options and it turns out the definition variation is very important in motivating different pieces of work. Newtonian State. State is the physical pose of the world. Under this definition, there are very many states, often too many for explicit representation. This is also the definition typically used in games. Abstracted State. State is an abstracted physical state of the world. “Is the door open or closed?” “Are you in room A or not?” The number of states is much smaller here. A basic issue here is: “How do you compute the state from observations?” Mathematical State. State is a sufficient stati

2 0.65968555 100 hunch net-2005-08-04-Why Reinforcement Learning is Important

Introduction: One prescription for solving a problem well is: State the problem, in the simplest way possible. In particular, this statement should involve no contamination with or anticipation of the solution. Think about solutions to the stated problem. Stating a problem in a succinct and crisp manner tends to invite a simple elegant solution. When a problem can not be stated succinctly, we wonder if the problem is even understood. (And when a problem is not understood, we wonder if a solution can be meaningful.) Reinforcement learning does step (1) well. It provides a clean simple language to state general AI problems. In reinforcement learning there is a set of actions A, a set of observations O, and a reward r. The reinforcement learning problem, in general, is defined by a conditional measure D(o, r | (o,r,a)*) which produces an observation o and a reward r given a history (o,r,a)*. The goal in reinforcement learning is to find a policy pi: (o,r,a)* -> a

3 0.60232985 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models

Introduction: Suppose we have a set of observations over time x_1, x_2, …, x_t and want to predict some future event y_{t+1}. An inevitable problem arises, because learning a predictor h(x_1, …, x_t) of y_{t+1} is generically intractable due to the size of the input. To make this problem tractable, what’s necessary is a method for summarizing the relevant information in past observations for the purpose of prediction in the future. In other words, state is required. Existing approaches for deriving state have some limitations. Hidden Markov models learned with EM suffer from local minima, use tabular learning approaches which provide dubious generalization ability, and often require substantial a priori specification of the observations. Kalman Filters and Particle Filters are very parametric in the sense that substantial information must be specified up front. Dynamic Bayesian Networks (graphical models through time) require substantial a priori specification and often re

4 0.54734951 388 hunch net-2010-01-24-Specializations of the Master Problem

Introduction: One thing which is clear on a little reflection is that there exists a single master learning problem capable of encoding essentially all learning problems. This problem is of course a very general sort of reinforcement learning where the world interacts with an agent as: The world announces an observation x. The agent makes a choice a. The world announces a reward r. The goal here is to maximize the sum of the rewards over the time of the agent. No particular structure relating x to a or a to r is implied by this setting so we do not know effective general algorithms for the agent. It’s very easy to prove lower bounds showing that an agent cannot hope to succeed here—just consider the case where actions are unrelated to rewards. Nevertheless, there is a real sense in which essentially all forms of life are agents operating in this setting, somehow succeeding. The gap between these observations drives research—How can we find tractable specializations of

5 0.54461884 27 hunch net-2005-02-23-Problem: Reinforcement Learning with Classification

Introduction: At an intuitive level, the question here is “Can reinforcement learning be solved with classification?” Problem Construct a reinforcement learning algorithm with near-optimal expected sum of rewards in the direct experience model given access to a classifier learning algorithm which has a small error rate or regret on all posed classification problems. The definition of “posed” here is slightly murky. I consider a problem “posed” if there is an algorithm for constructing labeled classification examples. Past Work There exists a reduction of reinforcement learning to classification given a generative model. A generative model is an inherently stronger assumption than the direct experience model. Other work on learning reductions may be important. Several algorithms for solving reinforcement learning in the direct experience model exist. Most, such as E^3, Factored-E^3, and metric-E^3 and Rmax require that the observation be the state. Recent work

6 0.54335493 183 hunch net-2006-06-14-Explorations of Exploration

7 0.51509088 269 hunch net-2007-10-24-Contextual Bandits

8 0.5035457 101 hunch net-2005-08-08-Apprenticeship Reinforcement Learning for Control

9 0.48965704 276 hunch net-2007-12-10-Learning Track of International Planning Competition

10 0.4854317 317 hunch net-2008-09-12-How do we get weak action dependence for learning with partial observations?

11 0.47475764 148 hunch net-2006-01-13-Benchmarks for RL

12 0.47418675 220 hunch net-2006-11-27-Continuizing Solutions

13 0.46881977 213 hunch net-2006-10-08-Incompatibilities between classical confidence intervals and learning.

14 0.44817194 153 hunch net-2006-02-02-Introspectionism as a Disease

15 0.43825448 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

16 0.4380129 58 hunch net-2005-04-21-Dynamic Programming Generalizations and Their Use

17 0.42745921 289 hunch net-2008-02-17-The Meaning of Confidence

18 0.41854984 400 hunch net-2010-06-13-The Good News on Exploration and Learning

19 0.4045043 2 hunch net-2005-01-24-Holy grails of machine learning?

20 0.38240474 303 hunch net-2008-06-09-The Minimum Sample Complexity of Importance Weighting


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(3, 0.027), (14, 0.024), (27, 0.111), (48, 0.02), (52, 0.219), (53, 0.106), (55, 0.151), (69, 0.013), (77, 0.052), (79, 0.036), (94, 0.086), (95, 0.032)]
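The sparse (topicId, topicWeight) pairs above are what an LDA document-topic distribution looks like once near-zero topics are dropped. A rough sketch using scikit-learn's LatentDirichletAllocation (topic count, corpus, and threshold are assumptions, not the configuration behind this page):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "reinforcement learning state observations actions reward agent",
    "conference reviewing papers feedback blind submissions",
    "probability bayesian frequentist belief odds",
]

X = CountVectorizer().fit_transform(docs)          # LDA is fit on term counts, not tfidf
lda = LatentDirichletAllocation(n_components=3, random_state=0)
theta = lda.fit_transform(X)                       # document-topic proportions

# keep only topics with non-negligible weight, mirroring the sparse listing above
print([(t, round(float(w), 3)) for t, w in enumerate(theta[0]) if w > 0.01])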

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9252398 169 hunch net-2006-04-05-What is state?

Introduction: In reinforcement learning (and sometimes other settings), there is a notion of “state”. Based upon the state various predictions are made such as “Which action should be taken next?” or “How much cumulative reward do I expect if I take some action from this state?” Given the importance of state, it is important to examine the meaning. There are actually several distinct options and it turns out the definition variation is very important in motivating different pieces of work. Newtonian State. State is the physical pose of the world. Under this definition, there are very many states, often too many for explicit representation. This is also the definition typically used in games. Abstracted State. State is an abstracted physical state of the world. “Is the door open or closed?” “Are you in room A or not?” The number of states is much smaller here. A basic issue here is: “How do you compute the state from observations?” Mathematical State. State is a sufficient stati

2 0.74449939 5 hunch net-2005-01-26-Watchword: Probability

Introduction: Probability is one of the most confusingly used words in machine learning. There are at least 3 distinct ways the word is used. Bayesian The Bayesian notion of probability is a ‘degree of belief’. The degree of belief that some event (i.e. “stock goes up” or “stock goes down”) occurs can be measured by asking a sequence of questions of the form “Would you bet the stock goes up or down at Y to 1 odds?” A consistent better will switch from ‘for’ to ‘against’ at some single value of Y. The probability is then Y/(Y+1). Bayesian probabilities express lack of knowledge rather than randomization. They are useful in learning because we often lack knowledge and expressing that lack flexibly makes the learning algorithms work better. Bayesian Learning uses ‘probability’ in this way exclusively. Frequentist The Frequentist notion of probability is a rate of occurrence. A rate of occurrence can be measured by doing an experiment many times. If an event occurs k times in

3 0.67626864 116 hunch net-2005-09-30-Research in conferences

Introduction: Conferences exist as part of the process of doing research. They provide many roles including “announcing research”, “meeting people”, and “point of reference”. Not all conferences are alike so a basic question is: “to what extent do individual conferences attempt to aid research?” This question is very difficult to answer in any satisfying way. What we can do is compare details of the process across multiple conferences. Comments The average quality of comments across conferences can vary dramatically. At one extreme, the tradition in CS theory conferences is to provide essentially zero feedback. At the other extreme, some conferences have a strong tradition of providing detailed constructive feedback. Detailed feedback can give authors significant guidance about how to improve research. This is the most subjective entry. Blind Virtually all conferences offer single blind review where authors do not know reviewers. Some also provide double blind review where rev

4 0.67442495 423 hunch net-2011-02-02-User preferences for search engines

Introduction: I want to comment on the “Bing copies Google” discussion here, here, and here, because there are data-related issues which the general public may not understand, and some of the framing seems substantially misleading to me. As a not-distant-outsider, let me mention the sources of bias I may have. I work at Yahoo!, which has started using Bing. This might predispose me towards Bing, but on the other hand I’m still at Yahoo!, and have been using Linux exclusively as an OS for many years, including even a couple minor kernel patches. And, on the gripping hand, I’ve spent quite a bit of time thinking about the basic principles of incorporating user feedback in machine learning. Also note, this post is not related to official Yahoo! policy, it’s just my personal view. The issue Google engineers inserted synthetic responses to synthetic queries on google.com, then executed the synthetic searches on google.com using Internet Explorer with the Bing toolbar and later

5 0.66347897 40 hunch net-2005-03-13-Avoiding Bad Reviewing

Introduction: If we accept that bad reviewing often occurs and want to fix it, the question is “how”? Reviewing is done by paper writers just like yourself, so a good proxy for this question is asking “How can I be a better reviewer?” Here are a few things I’ve learned by trial (and error), as a paper writer, and as a reviewer. The secret ingredient is careful thought. There is no good substitution for a deep and careful understanding. Avoid reviewing papers that you feel competitive about. You almost certainly will be asked to review papers that feel competitive if you work on subjects of common interest. But, the feeling of competition can easily lead to bad judgement. If you feel biased for some other reason, then you should avoid reviewing. For example… Feeling angry or threatened by a paper is a form of bias. See above. Double blind yourself (avoid looking at the name even in a single-blind situation). The significant effect of a name you recognize is making you pay close a

6 0.66321075 452 hunch net-2012-01-04-Why ICML? and the summer conferences

7 0.66115725 136 hunch net-2005-12-07-Is the Google way the way for machine learning?

8 0.65457278 141 hunch net-2005-12-17-Workshops as Franchise Conferences

9 0.65032095 437 hunch net-2011-07-10-ICML 2011 and the future

10 0.64887041 151 hunch net-2006-01-25-1 year

11 0.64555109 395 hunch net-2010-04-26-Compassionate Reviewing

12 0.64274853 461 hunch net-2012-04-09-ICML author feedback is open

13 0.63899618 204 hunch net-2006-08-28-Learning Theory standards for NIPS 2006

14 0.63801718 96 hunch net-2005-07-21-Six Months

15 0.63389146 484 hunch net-2013-06-16-Representative Reviewing

16 0.6329897 453 hunch net-2012-01-28-Why COLT?

17 0.63225961 270 hunch net-2007-11-02-The Machine Learning Award goes to …

18 0.62829345 454 hunch net-2012-01-30-ICML Posters and Scope

19 0.62668371 410 hunch net-2010-09-17-New York Area Machine Learning Events

20 0.62646961 416 hunch net-2010-10-29-To Vidoelecture or not