nips nips2003 nips2003-14 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Matthew R. Rudary, Satinder P. Singh
Abstract: Predictive state representations (PSRs) use predictions of a set of tests to represent the state of controlled dynamical systems. One reason why this representation is exciting as an alternative to partially observable Markov decision processes (POMDPs) is that PSR models of dynamical systems may be much more compact than POMDP models. Empirical work on PSRs to date has focused on linear PSRs, which have not allowed for compression relative to POMDPs. We introduce a new notion of tests which allows us to define a new type of PSR that is nonlinear in general and allows for exponential compression in some deterministic dynamical systems. These new tests, called e-tests, are related to the tests used by Rivest and Schapire [1] in their work with the diversity representation, but our PSR avoids some of the pitfalls of their representation—in particular, its potential to be exponentially larger than the equivalent POMDP. 1
Reference: text
sentIndex sentText sentNum sentScore
1 edu Abstract Predictive state representations (PSRs) use predictions of a set of tests to represent the state of controlled dynamical systems. [sent-3, score-0.306]
2 One reason why this representation is exciting as an alternative to partially observable Markov decision processes (POMDPs) is that PSR models of dynamical systems may be much more compact than POMDP models. [sent-4, score-0.178]
3 We introduce a new notion of tests which allows us to define a new type of PSR that is nonlinear in general and allows for exponential compression in some deterministic dynamical systems. [sent-6, score-0.294]
4 These new tests, called e-tests, are related to the tests used by Rivest and Schapire [1] in their work with the diversity representation, but our PSR avoids some of the pitfalls of their representation—in particular, its potential to be exponentially larger than the equivalent POMDP. [sent-7, score-0.221]
5 A test is a sequence of action-observation pairs and the prediction for a test is the probability of the test-observations happening if the test-actions are executed. [sent-9, score-0.071]
6 All PSRs studied to date have been linear, in the sense that the probability of any sequence of k observations given a sequence of k actions can be expressed as a linear function of the predictions of a core set of tests. [sent-14, score-0.278]
7 We introduce a new type of test, the e-test, and present the first nonlinear PSR that can be applied to a general class of dynamical systems. [sent-15, score-0.146]
8 In particular, in the first such result for PSRs we show that there exist controlled dynamical systems whose PSR representation is exponentially smaller than its POMDP representation. [sent-16, score-0.241]
9 2 Models of Dynamical Systems A model of a controlled dynamical system defines a probability distribution over sequences of observations one would get for any sequence of actions one could execute in the system. [sent-18, score-0.233]
10 Equivalently, given any history of interaction with the dynamical system so far, a model defines the distribution over sequences of future observations for all sequences of future actions. [sent-19, score-0.252]
11 POMDPs [3, 4] and PSRs [2] both model controlled dynamical systems. [sent-21, score-0.131]
12 In POMDPs, a belief state is used to encode historical information; in PSRs, probabilities of particular future outcomes are used. [sent-22, score-0.066]
13 T is a set of matrices a of dimension |S| × |S|, one for each a ∈ A, such that Tij is the probability that the next state is j given that the current state is i and action a is taken. [sent-25, score-0.107]
14 O is a set of |S| × |S| a,o diagonal matrices, one for each action-observation pair, such that Oii is the probability of observing o after arriving in state i by taking action a. [sent-26, score-0.072]
15 Finally, b0 is the initial belief state, a |S| × 1 vector whose ith element is the probability of the system starting in state i. [sent-27, score-0.081]
16 The belief state at history h is b(S|h) = [prob(1|h) prob(2|h) . [sent-28, score-0.128]
17 prob(|S||h)], where prob(i|h) is the probability of the unobserved state being i at history h. [sent-31, score-0.112]
18 After taking an action a in history h and observing o, the belief state can be updated as follows: bT (S|hao) = bT (S|h)T a Oa,o (1|S| is the |S| × 1 vector consisting of all 1’s) bT (S|h)T a Oa,o 1|S| PSRs Littman et al. [sent-32, score-0.193]
19 [2] (inspired by the work of Rivest and Schapire [1] and Jaeger [5]) introduced PSRs to represent the state of a controlled dynamical system using predictions of the outcomes of tests. [sent-33, score-0.22]
20 They define a test t as a sequence of actions and observations t = a1 o1 a2 o2 · · · ak ok ; we shall call this type of test a sequence test, or s-test in short. [sent-34, score-0.43]
21 An s-test succeeds iff, when the sequence of actions a1 a2 · · · ak is executed, the system issues the observation sequence o1 o2 · · · ok . [sent-35, score-0.413]
22 The prediction p(t|h) is the probability that the s-test t succeeds from observed history h (of length n w. [sent-36, score-0.132]
23 , an+k = ak ) (1) where ai and oi denote the action taken and the observation, respectively, at time i. [sent-46, score-0.154]
24 In the rest of this paper, we will abbreviate expressions like the right-hand side of Equation 1 by prob(o1 o2 · · · ok |ha1 a2 · · · ak ). [sent-47, score-0.294]
25 q|Q| } is said to be a core set if it constitutes a PSR, i. [sent-51, score-0.138]
26 p(q|Q| |h)], is a sufficient statistic for any history h. [sent-56, score-0.091]
27 Equivalently, if Q is a core set, then for any s-test t, there exists a function ft such that p(t|h) = ft (p(Q|h)) for all h. [sent-57, score-0.153]
28 The prediction vector p(Q|h) in PSR models corresponds to belief state b(S|h) in POMDP models. [sent-58, score-0.094]
29 [2] are linear PSRs in the sense that for any s-test t, ft is a linear function of the predictions of the core s-tests; equivalently, the following equation ∀s-tests t ∃ a weight vector wt , s. [sent-60, score-0.29]
30 Upon taking action a in history h and observing o, the prediction vector can be updated as follows: p(qi |hao) = p(aoqi |h) faoqi (p(Q|h)) pT (Q|h)maoqi = = T p(ao|h) fao (p(Q|h)) p (Q|h)mao (3) where the final right-hand side is only valid for linear PSRs. [sent-63, score-0.166]
31 Thus a linear PSR model is specified by Q and by the weight vectors in the above equation maoq for all a ∈ A, o ∈ O, q ∈ Q ∪ φ (where φ is the null sequence). [sent-64, score-0.072]
32 It is pertinent to ask what sort of dynamical systems can be modeled by a PSR and how many core tests are required to model a system. [sent-65, score-0.287]
33 [2]) For any dynamical system that can be represented by a finite POMDP model, there exists a linear PSR model of size (|Q|) no more than the size (|S|) of the POMDP model. [sent-68, score-0.172]
34 The algorithm they present depends on the insight that s-tests are differentiated by their outcome vectors. [sent-71, score-0.075]
35 An outcome vector u(t) for an s-test t = a1 o1 a2 o2 . [sent-72, score-0.087]
36 ak ok is a |S| × 1 vector; the ith component of the vector is the probability of t succeeding given that the system is in the hidden state i, i. [sent-75, score-0.344]
37 Consider the matrix U whose rows correspond to the states in S and whose columns are the outcome vectors for all possible s-tests. [sent-81, score-0.175]
38 Let Q denote the set of s-tests associated with the maximal set of linearly independent columns of U ; clearly |Q| ≤ |S|. [sent-82, score-0.084]
39 showed that Q is a core set for a linear PSR model by the following logic. [sent-84, score-0.124]
40 We will reuse the concept of linear independence of outcome vectors with a new type of test to derive a PSR that is nonlinear in general. [sent-89, score-0.179]
41 In addition, this type of PSR in some cases requires a number of core tests that is exponentially smaller than the number of states in the minimal POMDP or the number of core tests in the linear PSR. [sent-91, score-0.471]
42 3 A new notion of tests In order to formulate a PSR that requires fewer core tests, we look to a new kind of test— the end test, or e-test in short. [sent-92, score-0.187]
43 An e-test e = a1 a2 · · · ak ok succeeds if, after the sequence of actions a1 a2 · · · ak is executed, ok is observed. [sent-94, score-0.641]
44 This type of test is inspired by Rivest and Schapire’s [1] notion of tests in their work on modeling deterministic dynamical systems. [sent-95, score-0.269]
45 The |S| × 1 outcome vector for an e-test e = a1 a2 . [sent-99, score-0.087]
46 (4) Note that we are using v’s to denote outcome vectors for e-tests and u’s to denote outcome vectors for s-tests. [sent-106, score-0.198]
47 Let QV denote the set of e-tests associated with a maximal set of linearly independent columns of matrix V ; clearly |QV | ≤ |S|. [sent-109, score-0.084]
48 The hope is that the set QV is a core set for an EPSR model of the dynamical system represented by the POMDP model. [sent-111, score-0.224]
49 We can compute the outcome vector for any e-test from the POMDP parameters using Equation 4. [sent-113, score-0.087]
50 So we could compute the columns of V one by one and check to see how many linearly independent columns we find. [sent-114, score-0.135]
51 Proof Computing the outcome vector for an e-test using Equation 4 is polynomial in the number of states and the length of the e-test. [sent-123, score-0.139]
52 Note that this algorithm is only practical if the outcome vectors have been found; this only makes sense if the POMDP model is already known, as outcome vectors map POMDP states to outcomes. [sent-127, score-0.223]
53 Next we show that the prediction of any e-test can be computed linearly from the prediction vector for the e-tests in QV . [sent-129, score-0.083]
54 Lemma 3 For any history h and any e-test e, the prediction p(e|h) is some linear function of prediction vector p(QV |h), i. [sent-130, score-0.148]
55 Furthermore, for any history h, p(e|h) = b(S|h)v(e) = b(S|h)V (QV )we = p(QV |h)we . [sent-135, score-0.077]
56 Note that Lemma 3 does not imply that QV constitutes a PSR, let alone a linear PSR, for the definition of a PSR requires that the prediction of all s-tests be computable from the core test predictions. [sent-136, score-0.194]
57 Theorem 1 If V (QV ), defined as above with respect to some POMDP model of a dynamical system, is a square matrix, i. [sent-138, score-0.103]
58 , the number of e-tests in QV is the number of states |S| (in that POMDP model), then QV constitutes a linear EPSR for that dynamical system. [sent-140, score-0.184]
59 Proof For any history h, pT (QV |h) = bT (S|h)V (QV ). [sent-141, score-0.077]
60 If V (QV ) is square then it is invertible because by construction it has full rank, and hence for any history h, bT (S|h) = pT (QV |h)V −1 (QV ). [sent-142, score-0.077]
61 For any s-test t = a1 o1 a2 o2 · · · ak ok , 1 1 pT (t|h) = bT (S|h)T a Oa T ,o1 2 2 T a Oa a1 −1 ,o2 k k · · · T a Oa a1 ,o1 a2 ,ok a2 ,o2 1S (by first-principles definition) k k k = p (QV |h)V (QV )T O T O · · · T a Oa ,o 1S = pT (QV |h)wt for some weight vector wt . [sent-143, score-0.348]
62 1 1 1 k k k We note that the product T a Oa ,o · · · T a Oa ,o 1S appears often in association with an s-test t = a1 o1 · · · ak ok , and abbreviate the product z(t). [sent-145, score-0.294]
63 We similarly define z(e) = k k 1 k T a T a2 · · · T a Oa ,o 1S for the e-test e = a1 a2 · · · ak ok . [sent-146, score-0.279]
64 (The form of the linear EPSR update equation is identical to the form of the update in linear PSRs with s-tests given in Equation 3). [sent-148, score-0.111]
65 Proof (Sketch) To prove this lemma, we must only find an example of a dynamical system that is an EPSR but not a linear EPSR. [sent-154, score-0.142]
66 We show in that section that the state space size is 2k and the number of s-tests in the core set of a linear s-test-based PSR model is also 2k , but the number of e-tests in QV is only k + 1. [sent-157, score-0.174]
67 4 A Nonlinear PSR for Deterministic Dynamical Systems In deterministic dynamical systems, the predictions of both e-tests and s-tests are binary and it is this basic fact that allows us to prove the following result. [sent-163, score-0.19]
68 Theorem 2 For deterministic dynamical systems the set of e-tests QV is always an EPSR and in general it is a nonlinear EPSR. [sent-164, score-0.194]
69 Proof For an arbitrary s-test t = a1 o1 a2 o2 · · · ak ok , and some arbitrary history h that is realizable (i. [sent-165, score-0.356]
70 In going from the third line to the fourth, we use the result of Lemma 3 that regardless of the size of QV , the predictions for all e-tests for any history h are linear functions of p(QV |h). [sent-171, score-0.149]
71 This shows that for deterministic dynamical systems, QV , regardless of its size, constitutes an EPSR. [sent-172, score-0.189]
72 1 Diversity and e-tests Rivest and Schapire’s [1] diversity representation, the inspiration for e-tests, applies only to deterministic systems and can be explained using the binary outcome matrix V defined at the beginning of Section 3. [sent-176, score-0.261]
73 Diversity also uses the predictions of a set of e-tests as its representation of state; it uses as many e-tests as there are distinct columns in the matrix V . [sent-178, score-0.136]
74 Clearly, at most there can be 2|S| distinct columns and they show that there have to be at least log2 (|S|) distinct columns and that these bounds are tight. [sent-179, score-0.136]
75 Thus the size of the diversity representation can be exponentially smaller or exponentially bigger than the size of a POMDP representation. [sent-180, score-0.293]
76 While we use the same notion of tests as the diversity representation, our use of linear independence of outcome vectors instead of equivalence classes based on equality of outcome vectors allows us to use e-tests on stochastic systems. [sent-181, score-0.419]
77 Next we show through an example that EPSR models in deterministic dynamic systems can lead to exponential compression over POMDP models in some cases while avoiding the exponential blowup possible in Rivest and Schapire’s [1] diversity representation. [sent-182, score-0.202]
78 2 EPSRs can be Exponentially Smaller than Diversity This first example shows a case in which the size of the EPSR representation is exponentially smaller than the size of the diversity representation. [sent-184, score-0.23]
79 The hit register (see Figure 2a) is a k-bit register (these are the value bits) with an additional special hit bit. [sent-185, score-0.296]
80 There are 2k + 1 states in the POMDP describing this system—one state in which the hit bit is 1 and 2k states in which the hit bit is 0 and the value bits take on different combinations of a) k bits (value bits) 0 . [sent-186, score-0.469]
81 The value of the hit bit determines the observation and k + 2 actions alter the value of the bits; this is fully specified in Section 4. [sent-195, score-0.204]
82 The value of the leftmost bit is observed; this bit can be flipped, and the register can be rotated either to the right or to the left. [sent-198, score-0.209]
83 There are k + 2 actions: a flip action Fi for each value bit i that inverts bit i if the hit bit is not set, a set action Sh that sets the hit bit if all the value bits are 0, and a clear action Ch that clears the hit bit. [sent-202, score-0.667]
84 There are two observations: Oh if the hit bit is set and Om otherwise. [sent-203, score-0.151]
85 The diversity representation requires O(22 ) equivalence classes and thus tests, k whereas an EPSR has only 2 + 1 core e-tests (see Table 1 for the core e-tests and update function when k = 2). [sent-205, score-0.367]
86 Table 1: Core e-tests and update functions for the 2-bit hit register problem. [sent-206, score-0.175]
87 The number of e-tests used by diversity representation is the number of distinct columns in the binary V matrix of Section 3. [sent-209, score-0.202]
88 1, while the number of e-tests used by the EPSR representation is the number of linearly independent columns in V . [sent-210, score-0.116]
89 As a quick example, consider the case of 2-bit binary vectors: There are 4 distinct vectors but only 2 linearly independent ones. [sent-212, score-0.074]
90 3 EPSRs can be Exponentially Smaller than POMDPs and the Original PSRs This second example shows a case in which the EPSR representation uses exponentially fewer tests than the number of states in the POMDP representation as well as the original linear PSR representation. [sent-214, score-0.229]
91 The rotate register illustrated in Figure 2b is a k-bit shift-register. [sent-215, score-0.092]
92 Table 2: Core e-tests and update function for the 4 bit rotate register problem. [sent-216, score-0.181]
93 The three actions are R, which shifts the register one place to the right with wraparound, L, which does the opposite, and F , which flips the leftmost bit. [sent-218, score-0.125]
94 This problem is also presented by Rivest and Schapire as an example of a system whose diversity is exponentially smaller than the number of states in the minimal POMDP, which is 2k . [sent-219, score-0.211]
95 This is also the number of core s-tests in the equivalent linear PSR (we computed these 2k s-tests but do not report them here). [sent-220, score-0.124]
96 However, the EPSR that models this system has only k + 1 core e-tests. [sent-222, score-0.133]
97 The tests and update function for the 4-bit rotate register are shown in Table 2. [sent-223, score-0.188]
98 5 Conclusions and Future Work In this paper we have used a new type of test, the e-test, to specify a nonlinear PSR for deterministic controlled dynamical systems. [sent-224, score-0.225]
99 We proved that in some deterministic systems our new PSR models are exponentially smaller than both the original PSR models as well as POMDP models. [sent-226, score-0.153]
100 Similarly, compared to the size of Rivest & Schapire’s diversity representation (the inspiration for the notion of e-tests) we proved that our PSR models are never bigger but sometimes exponentially smaller. [sent-227, score-0.26]
wordName wordTfidf (topN-words)
[('qv', 0.672), ('psr', 0.351), ('oh', 0.259), ('sh', 0.213), ('pomdp', 0.189), ('epsr', 0.186), ('psrs', 0.186), ('ok', 0.162), ('ak', 0.117), ('core', 0.103), ('dynamical', 0.103), ('diversity', 0.102), ('prob', 0.099), ('oa', 0.099), ('rivest', 0.093), ('hit', 0.089), ('history', 0.077), ('outcome', 0.075), ('tests', 0.069), ('littman', 0.066), ('bit', 0.062), ('ao', 0.062), ('register', 0.059), ('bt', 0.052), ('deterministic', 0.051), ('columns', 0.051), ('exponentially', 0.05), ('pomdps', 0.046), ('pt', 0.045), ('wt', 0.045), ('schapire', 0.042), ('bits', 0.041), ('aoei', 0.041), ('epsrs', 0.041), ('hao', 0.041), ('mao', 0.041), ('actions', 0.04), ('action', 0.037), ('predictions', 0.036), ('satinder', 0.036), ('state', 0.035), ('constitutes', 0.035), ('lemma', 0.035), ('linearly', 0.033), ('rotate', 0.033), ('representation', 0.032), ('aei', 0.031), ('nonlinear', 0.028), ('controlled', 0.028), ('update', 0.027), ('ch', 0.027), ('leftmost', 0.026), ('states', 0.025), ('ft', 0.025), ('vectors', 0.024), ('observations', 0.024), ('predictive', 0.023), ('succeeds', 0.023), ('linear', 0.021), ('inspiration', 0.021), ('rudary', 0.021), ('executed', 0.02), ('singh', 0.02), ('sequence', 0.02), ('prediction', 0.019), ('observable', 0.019), ('system', 0.018), ('ei', 0.017), ('distinct', 0.017), ('smaller', 0.016), ('et', 0.016), ('test', 0.016), ('belief', 0.016), ('size', 0.015), ('abbreviate', 0.015), ('ha', 0.015), ('submatrix', 0.015), ('future', 0.015), ('type', 0.015), ('notion', 0.015), ('equation', 0.015), ('proof', 0.015), ('date', 0.014), ('polynomial', 0.014), ('equality', 0.014), ('statistic', 0.014), ('compression', 0.013), ('length', 0.013), ('equivalently', 0.013), ('bigger', 0.013), ('ever', 0.013), ('stops', 0.013), ('observation', 0.013), ('extensions', 0.013), ('vector', 0.012), ('rank', 0.012), ('systems', 0.012), ('implications', 0.012), ('models', 0.012), ('weight', 0.012)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 14 nips-2003-A Nonlinear Predictive State Representation
Author: Matthew R. Rudary, Satinder P. Singh
Abstract: Predictive state representations (PSRs) use predictions of a set of tests to represent the state of controlled dynamical systems. One reason why this representation is exciting as an alternative to partially observable Markov decision processes (POMDPs) is that PSR models of dynamical systems may be much more compact than POMDP models. Empirical work on PSRs to date has focused on linear PSRs, which have not allowed for compression relative to POMDPs. We introduce a new notion of tests which allows us to define a new type of PSR that is nonlinear in general and allows for exponential compression in some deterministic dynamical systems. These new tests, called e-tests, are related to the tests used by Rivest and Schapire [1] in their work with the diversity representation, but our PSR avoids some of the pitfalls of their representation—in particular, its potential to be exponentially larger than the equivalent POMDP. 1
2 0.10905416 29 nips-2003-Applying Metric-Trees to Belief-Point POMDPs
Author: Joelle Pineau, Geoffrey J. Gordon, Sebastian Thrun
Abstract: Recent developments in grid-based and point-based approximation algorithms for POMDPs have greatly improved the tractability of POMDP planning. These approaches operate on sets of belief points by individually learning a value function for each point. In reality, belief points exist in a highly-structured metric simplex, but current POMDP algorithms do not exploit this property. This paper presents a new metric-tree algorithm which can be used in the context of POMDP planning to sort belief points spatially, and then perform fast value function updates over groups of points. We present results showing that this approach can reduce computation in point-based POMDP algorithms for a wide range of problems. 1
3 0.099001445 95 nips-2003-Insights from Machine Learning Applied to Human Visual Classification
Author: Felix A. Wichmann, Arnulf B. Graf
Abstract: We attempt to understand visual classification in humans using both psychophysical and machine learning techniques. Frontal views of human faces were used for a gender classification task. Human subjects classified the faces and their gender judgment, reaction time and confidence rating were recorded. Several hyperplane learning algorithms were used on the same classification task using the Principal Components of the texture and shape representation of the faces. The classification performance of the learning algorithms was estimated using the face database with the true gender of the faces as labels, and also with the gender estimated by the subjects. We then correlated the human responses to the distance of the stimuli to the separating hyperplane of the learning algorithms. Our results suggest that human classification can be modeled by some hyperplane algorithms in the feature space we used. For classification, the brain needs more processing for stimuli close to that hyperplane than for those further away. 1
4 0.073545024 33 nips-2003-Approximate Planning in POMDPs with Macro-Actions
Author: Georgios Theocharous, Leslie P. Kaelbling
Abstract: Recent research has demonstrated that useful POMDP solutions do not require consideration of the entire belief space. We extend this idea with the notion of temporal abstraction. We present and explore a new reinforcement learning algorithm over grid-points in belief space, which uses macro-actions and Monte Carlo updates of the Q-values. We apply the algorithm to a large scale robot navigation task and demonstrate that with temporal abstraction we can consider an even smaller part of the belief space, we can learn POMDP policies faster, and we can do information gathering more efficiently.
5 0.062754281 144 nips-2003-One Microphone Blind Dereverberation Based on Quasi-periodicity of Speech Signals
Author: Tomohiro Nakatani, Masato Miyoshi, Keisuke Kinoshita
Abstract: Speech dereverberation is desirable with a view to achieving, for example, robust speech recognition in the real world. However, it is still a challenging problem, especially when using a single microphone. Although blind equalization techniques have been exploited, they cannot deal with speech signals appropriately because their assumptions are not satisfied by speech signals. We propose a new dereverberation principle based on an inherent property of speech signals, namely quasi-periodicity. The present methods learn the dereverberation filter from a lot of speech data with no prior knowledge of the data, and can achieve high quality speech dereverberation especially when the reverberation time is long. 1
6 0.040307201 148 nips-2003-Online Passive-Aggressive Algorithms
7 0.03815366 158 nips-2003-Policy Search by Dynamic Programming
8 0.034982394 57 nips-2003-Dynamical Modeling with Kernels for Nonlinear Time Series Prediction
9 0.033153377 65 nips-2003-Extending Q-Learning to General Adaptive Multi-Agent Systems
10 0.031883743 180 nips-2003-Sparseness of Support Vector Machines---Some Asymptotically Sharp Bounds
11 0.03151847 64 nips-2003-Estimating Internal Variables and Paramters of a Learning Agent by a Particle Filter
12 0.031089885 145 nips-2003-Online Classification on a Budget
13 0.027502187 70 nips-2003-Fast Algorithms for Large-State-Space HMMs with Applications to Web Usage Analysis
14 0.027060922 97 nips-2003-Iterative Scaled Trust-Region Learning in Krylov Subspaces via Pearlmutter's Implicit Sparse Hessian
15 0.026719945 42 nips-2003-Bounded Finite State Controllers
16 0.026557116 20 nips-2003-All learning is Local: Multi-agent Learning in Global Reward Games
17 0.024529871 62 nips-2003-Envelope-based Planning in Relational MDPs
18 0.024459038 78 nips-2003-Gaussian Processes in Reinforcement Learning
19 0.024314081 93 nips-2003-Information Dynamics and Emergent Computation in Recurrent Circuits of Spiking Neurons
20 0.02383681 68 nips-2003-Eye Movements for Reward Maximization
topicId topicWeight
[(0, -0.081), (1, 0.06), (2, -0.03), (3, -0.016), (4, -0.008), (5, -0.021), (6, -0.003), (7, 0.02), (8, 0.06), (9, 0.057), (10, 0.052), (11, -0.046), (12, 0.028), (13, 0.032), (14, 0.007), (15, -0.013), (16, 0.039), (17, 0.011), (18, 0.089), (19, -0.016), (20, 0.039), (21, 0.05), (22, 0.06), (23, -0.102), (24, 0.101), (25, 0.073), (26, -0.075), (27, -0.052), (28, 0.092), (29, 0.081), (30, 0.009), (31, 0.053), (32, -0.039), (33, -0.029), (34, -0.032), (35, -0.031), (36, 0.164), (37, 0.021), (38, 0.013), (39, 0.085), (40, 0.03), (41, 0.123), (42, -0.232), (43, -0.014), (44, -0.012), (45, 0.151), (46, 0.023), (47, -0.046), (48, 0.107), (49, 0.042)]
simIndex simValue paperId paperTitle
same-paper 1 0.93567556 14 nips-2003-A Nonlinear Predictive State Representation
Author: Matthew R. Rudary, Satinder P. Singh
Abstract: Predictive state representations (PSRs) use predictions of a set of tests to represent the state of controlled dynamical systems. One reason why this representation is exciting as an alternative to partially observable Markov decision processes (POMDPs) is that PSR models of dynamical systems may be much more compact than POMDP models. Empirical work on PSRs to date has focused on linear PSRs, which have not allowed for compression relative to POMDPs. We introduce a new notion of tests which allows us to define a new type of PSR that is nonlinear in general and allows for exponential compression in some deterministic dynamical systems. These new tests, called e-tests, are related to the tests used by Rivest and Schapire [1] in their work with the diversity representation, but our PSR avoids some of the pitfalls of their representation—in particular, its potential to be exponentially larger than the equivalent POMDP. 1
2 0.50566244 29 nips-2003-Applying Metric-Trees to Belief-Point POMDPs
Author: Joelle Pineau, Geoffrey J. Gordon, Sebastian Thrun
Abstract: Recent developments in grid-based and point-based approximation algorithms for POMDPs have greatly improved the tractability of POMDP planning. These approaches operate on sets of belief points by individually learning a value function for each point. In reality, belief points exist in a highly-structured metric simplex, but current POMDP algorithms do not exploit this property. This paper presents a new metric-tree algorithm which can be used in the context of POMDP planning to sort belief points spatially, and then perform fast value function updates over groups of points. We present results showing that this approach can reduce computation in point-based POMDP algorithms for a wide range of problems. 1
3 0.46485552 95 nips-2003-Insights from Machine Learning Applied to Human Visual Classification
Author: Felix A. Wichmann, Arnulf B. Graf
Abstract: We attempt to understand visual classification in humans using both psychophysical and machine learning techniques. Frontal views of human faces were used for a gender classification task. Human subjects classified the faces and their gender judgment, reaction time and confidence rating were recorded. Several hyperplane learning algorithms were used on the same classification task using the Principal Components of the texture and shape representation of the faces. The classification performance of the learning algorithms was estimated using the face database with the true gender of the faces as labels, and also with the gender estimated by the subjects. We then correlated the human responses to the distance of the stimuli to the separating hyperplane of the learning algorithms. Our results suggest that human classification can be modeled by some hyperplane algorithms in the feature space we used. For classification, the brain needs more processing for stimuli close to that hyperplane than for those further away. 1
4 0.45215878 33 nips-2003-Approximate Planning in POMDPs with Macro-Actions
Author: Georgios Theocharous, Leslie P. Kaelbling
Abstract: Recent research has demonstrated that useful POMDP solutions do not require consideration of the entire belief space. We extend this idea with the notion of temporal abstraction. We present and explore a new reinforcement learning algorithm over grid-points in belief space, which uses macro-actions and Monte Carlo updates of the Q-values. We apply the algorithm to a large scale robot navigation task and demonstrate that with temporal abstraction we can consider an even smaller part of the belief space, we can learn POMDP policies faster, and we can do information gathering more efficiently.
5 0.41445136 144 nips-2003-One Microphone Blind Dereverberation Based on Quasi-periodicity of Speech Signals
Author: Tomohiro Nakatani, Masato Miyoshi, Keisuke Kinoshita
Abstract: Speech dereverberation is desirable with a view to achieving, for example, robust speech recognition in the real world. However, it is still a challenging problem, especially when using a single microphone. Although blind equalization techniques have been exploited, they cannot deal with speech signals appropriately because their assumptions are not satisfied by speech signals. We propose a new dereverberation principle based on an inherent property of speech signals, namely quasi-periodicity. The present methods learn the dereverberation filter from a lot of speech data with no prior knowledge of the data, and can achieve high quality speech dereverberation especially when the reverberation time is long. 1
6 0.35436538 42 nips-2003-Bounded Finite State Controllers
7 0.27207145 188 nips-2003-Training fMRI Classifiers to Detect Cognitive States across Multiple Human Subjects
8 0.25387889 136 nips-2003-New Algorithms for Efficient High Dimensional Non-parametric Classification
9 0.23624821 44 nips-2003-Can We Learn to Beat the Best Stock
10 0.227916 93 nips-2003-Information Dynamics and Emergent Computation in Recurrent Circuits of Spiking Neurons
12 0.21181728 129 nips-2003-Minimising Contrastive Divergence in Noisy, Mixed-mode VLSI Neurons
13 0.20700999 137 nips-2003-No Unbiased Estimator of the Variance of K-Fold Cross-Validation
14 0.20609245 145 nips-2003-Online Classification on a Budget
15 0.20369072 131 nips-2003-Modeling User Rating Profiles For Collaborative Filtering
16 0.20353806 118 nips-2003-Link Prediction in Relational Data
17 0.20080067 70 nips-2003-Fast Algorithms for Large-State-Space HMMs with Applications to Web Usage Analysis
18 0.19814672 195 nips-2003-When Does Non-Negative Matrix Factorization Give a Correct Decomposition into Parts?
19 0.19812538 97 nips-2003-Iterative Scaled Trust-Region Learning in Krylov Subspaces via Pearlmutter's Implicit Sparse Hessian
20 0.19330522 148 nips-2003-Online Passive-Aggressive Algorithms
topicId topicWeight
[(0, 0.054), (11, 0.026), (29, 0.013), (30, 0.017), (35, 0.039), (53, 0.056), (71, 0.042), (74, 0.411), (76, 0.021), (85, 0.072), (91, 0.121)]
simIndex simValue paperId paperTitle
same-paper 1 0.77139151 14 nips-2003-A Nonlinear Predictive State Representation
Author: Matthew R. Rudary, Satinder P. Singh
Abstract: Predictive state representations (PSRs) use predictions of a set of tests to represent the state of controlled dynamical systems. One reason why this representation is exciting as an alternative to partially observable Markov decision processes (POMDPs) is that PSR models of dynamical systems may be much more compact than POMDP models. Empirical work on PSRs to date has focused on linear PSRs, which have not allowed for compression relative to POMDPs. We introduce a new notion of tests which allows us to define a new type of PSR that is nonlinear in general and allows for exponential compression in some deterministic dynamical systems. These new tests, called e-tests, are related to the tests used by Rivest and Schapire [1] in their work with the diversity representation, but our PSR avoids some of the pitfalls of their representation—in particular, its potential to be exponentially larger than the equivalent POMDP. 1
2 0.76340079 84 nips-2003-How to Combine Expert (and Novice) Advice when Actions Impact the Environment?
Author: Daniela Pucci de Farias, Nimrod Megiddo
Abstract: The so-called “experts algorithms” constitute a methodology for choosing actions repeatedly, when the rewards depend both on the choice of action and on the unknown current state of the environment. An experts algorithm has access to a set of strategies (“experts”), each of which may recommend which action to choose. The algorithm learns how to combine the recommendations of individual experts so that, in the long run, for any fixed sequence of states of the environment, it does as well as the best expert would have done relative to the same sequence. This methodology may not be suitable for situations where the evolution of states of the environment depends on past chosen actions, as is usually the case, for example, in a repeated non-zero-sum game. A new experts algorithm is presented and analyzed in the context of repeated games. It is shown that asymptotically, under certain conditions, it performs as well as the best available expert. This algorithm is quite different from previously proposed experts algorithms. It represents a shift from the paradigms of regret minimization and myopic optimization to consideration of the long-term effect of a player’s actions on the opponent’s actions or the environment. The importance of this shift is demonstrated by the fact that this algorithm is capable of inducing cooperation in the repeated Prisoner’s Dilemma game, whereas previous experts algorithms converge to the suboptimal non-cooperative play. 1
3 0.76216483 129 nips-2003-Minimising Contrastive Divergence in Noisy, Mixed-mode VLSI Neurons
Author: Hsin Chen, Patrice Fleury, Alan F. Murray
Abstract: This paper presents VLSI circuits with continuous-valued probabilistic behaviour realized by injecting noise into each computing unit(neuron). Interconnecting the noisy neurons forms a Continuous Restricted Boltzmann Machine (CRBM), which has shown promising performance in modelling and classifying noisy biomedical data. The Minimising-Contrastive-Divergence learning algorithm for CRBM is also implemented in mixed-mode VLSI, to adapt the noisy neurons’ parameters on-chip. 1
4 0.56244951 81 nips-2003-Geometric Analysis of Constrained Curves
Author: Anuj Srivastava, Washington Mio, Xiuwen Liu, Eric Klassen
Abstract: We present a geometric approach to statistical shape analysis of closed curves in images. The basic idea is to specify a space of closed curves satisfying given constraints, and exploit the differential geometry of this space to solve optimization and inference problems. We demonstrate this approach by: (i) defining and computing statistics of observed shapes, (ii) defining and learning a parametric probability model on shape space, and (iii) designing a binary hypothesis test on this space. 1
5 0.37180775 146 nips-2003-Online Learning of Non-stationary Sequences
Author: Claire Monteleoni, Tommi S. Jaakkola
Abstract: We consider an online learning scenario in which the learner can make predictions on the basis of a fixed set of experts. We derive upper and lower relative loss bounds for a class of universal learning algorithms involving a switching dynamics over the choice of the experts. On the basis of the performance bounds we provide the optimal a priori discretization for learning the parameter that governs the switching dynamics. We demonstrate the new algorithm in the context of wireless networks.
6 0.37044409 20 nips-2003-All learning is Local: Multi-agent Learning in Global Reward Games
7 0.36915019 4 nips-2003-A Biologically Plausible Algorithm for Reinforcement-shaped Representational Learning
8 0.36577916 109 nips-2003-Learning a Rare Event Detection Cascade by Direct Feature Selection
9 0.36374664 68 nips-2003-Eye Movements for Reward Maximization
10 0.36122894 65 nips-2003-Extending Q-Learning to General Adaptive Multi-Agent Systems
11 0.36103824 78 nips-2003-Gaussian Processes in Reinforcement Learning
12 0.35847193 57 nips-2003-Dynamical Modeling with Kernels for Nonlinear Time Series Prediction
13 0.35665357 113 nips-2003-Learning with Local and Global Consistency
14 0.35664266 147 nips-2003-Online Learning via Global Feedback for Phrase Recognition
15 0.35539147 186 nips-2003-Towards Social Robots: Automatic Evaluation of Human-Robot Interaction by Facial Expression Classification
16 0.35410094 55 nips-2003-Distributed Optimization in Adaptive Networks
17 0.35402253 24 nips-2003-An Iterative Improvement Procedure for Hierarchical Clustering
18 0.35344893 116 nips-2003-Linear Program Approximations for Factored Continuous-State Markov Decision Processes
19 0.35254866 107 nips-2003-Learning Spectral Clustering
20 0.35232544 143 nips-2003-On the Dynamics of Boosting