nips nips2004 nips2004-154 knowledge-graph by maker-knowledge-mining

154 nips-2004-Resolving Perceptual Aliasing In The Presence Of Noisy Sensors


Source: pdf

Author: Guy Shani, Ronen I. Brafman

Abstract: Agents learning to act in a partially observable domain may need to overcome the problem of perceptual aliasing – i.e., different states that appear similar but require different responses. This problem is exacerbated when the agent’s sensors are noisy, i.e., sensors may produce different observations in the same state. We show that many well-known reinforcement learning methods designed to deal with perceptual aliasing, such as Utile SufÄ?Ĺš x Memory, Ä?Ĺš nite size history windows, eligibility traces, and memory bits, do not handle noisy sensors well. We suggest a new algorithm, Noisy Utile SufÄ?Ĺš x Memory (NUSM), based on USM, that uses a weighted classiÄ?Ĺš cation of observed trajectories. We compare NUSM to the above methods and show it to be more robust to noise.

Reference: text


Summary: the most important sentenses genereted by tfidf model

sentIndex sentText sentNum sentScore

1 il Abstract Agents learning to act in a partially observable domain may need to overcome the problem of perceptual aliasing – i. [sent-5, score-0.496]

2 This problem is exacerbated when the agent’s sensors are noisy, i. [sent-8, score-0.193]

3 , sensors may produce different observations in the same state. [sent-10, score-0.202]

4 We show that many well-known reinforcement learning methods designed to deal with perceptual aliasing, such as Utile SufÄ? [sent-11, score-0.198]

5 Ĺš nite size history windows, eligibility traces, and memory bits, do not handle noisy sensors well. [sent-13, score-0.67]

6 1 Introduction Consider an agent situated in a partially observable domain: It executes an action; the action may change the state of the world; this change is reÄ? [sent-18, score-0.524]

7 Ĺš‚ected, in turn, by the agent’s sensors; the action may have some associated cost, and the new state may have some associated reward or penalty. [sent-19, score-0.236]

8 In this paper we are interested in agents with imperfect and noisy sensors that learn to act in such environments without any prior information about the underlying set of world-states and the world’s dynamics, only information about their sensor’s capabilities. [sent-21, score-0.376]

9 This is a known variant of reinforcement learning (RL) in partially observable domains [1]. [sent-22, score-0.266]

10 As the agent works with observations, rather than states, two possible problems arise: The agent may observe too much data which requires computationally intensive Ä? [sent-23, score-0.412]

11 This leads to a phenomena known as perceptual aliasing [2], where the same observation is obtained in distinct states requiring different actions. [sent-27, score-0.416]

12 For example, in Figure 1(a) the states marked with X are perceptually aliased. [sent-28, score-0.124]

13 Ĺš guration is sensed, states marked with the same letter (X or Y) are perceptually aliased. [sent-33, score-0.124]

14 The problem of resolving the perceptual aliasing is exacerbated when the agent’s sensors are not deterministic. [sent-34, score-0.586]

15 The performance of existing techniques for handling RL domains with perceptually aliased states, such as Ä? [sent-36, score-0.214]

16 Ĺš nite size history windows, eligibility traces, and internal memory quickly degrades as the level of noise increases. [sent-37, score-0.467]

17 We compare the perx formance of MUSM to USM and other existing methods on the above two standard maze domains, and show that it is more robust to noise. [sent-41, score-0.164]

18 Ĺš‚y review a number of algorithms for resolving the problem of perceptual aliasing. [sent-43, score-0.205]

19 We assume familiarity with basic RL techniques, and in particular, Q-learning, SARSA, and eligibility traces (see Sutton and Barto [12] for more details). [sent-44, score-0.176]

20 The simplest way to handle perceptual aliasing is to ignore it by using the observation space as the state space, deÄ? [sent-45, score-0.472]

21 Loch and Singh [6] explore problems where a memory-less optimal policy exists and demonstrated that Sarsa(ĂŽĹĽ) can learn an optimal policy for such domains. [sent-52, score-0.169]

22 Finite-size history methods are a natural extension to memory-less policies. [sent-53, score-0.14]

23 Ĺš history window to learn a good policy in domains xed where short-term memory optimal policies exist. [sent-58, score-0.578]

24 Ĺš ned history window cannot generally solve perceptual aliasing: an agent cannot be expected to know in advance how long should it remember the actionobservation-reward trajectories. [sent-60, score-0.613]

25 Usually, some areas of the state space require the agent to remember more, while in other locations a reactive policy is sufÄ? [sent-61, score-0.399]

26 A better solution is to learn online the history length needed to decide on the best action in the current location. [sent-63, score-0.298]

27 x Another possible approach to handle perceptual aliasing is to augment the observations with some internal memory [10]. [sent-67, score-0.663]

28 The agents’ state s is composed of both the current observation o and the internal memory state m. [sent-68, score-0.358]

29 The agents’ actions are enhanced with actions that modify the memory state by Ä? [sent-69, score-0.325]

30 The agent uses some standard learning mechanism to learn the proper action, including actions that change the memory state. [sent-71, score-0.477]

31 Ĺš nite or variable history length because meaningful events can occur arbitrarily far in the past. [sent-75, score-0.14]

32 Ĺš nite-state automata (FSA) [8] (which can be viewed as a special case of the memory-bits approach); the use of neural networks for internal memory [5, 3]; and constructing and solving a POMDP model [7, 2]. [sent-80, score-0.208]

33 Ĺš cation [7] resolves perceptual aliasing with variable length short term memory. [sent-85, score-0.335]

34 A node holds all the instances Tt = Tt−1 , at−1 , ot , rt whose Ä? [sent-98, score-0.227]

35 Ś observation ot matches the nal observation in the node’s label. [sent-99, score-0.155]

36 At the next level, instances are split based on the last action of the instance at . [sent-100, score-0.279]

37 All nodes act as buckets, grouping together instances that have matching history sufÄ? [sent-102, score-0.265]

38 The deeper a leaf is in the tree, the more history the instances in this leaf share. [sent-105, score-0.499]

39 To add a new instance to the tree, we examine its precept, and follow the path to the child node labeled by that precept. [sent-107, score-0.159]

40 We then look at the action before this precept and move to the node labeled by that action, then branch on the precept prior to that action and so forth, until a leaf is reached. [sent-108, score-0.521]

41 Identifying the proper depth for a certain leaf is a major issue, and we shall present a number of improvements to McCallum’s methods. [sent-109, score-0.201]

42 Leaves should be split if their descendants show a statistical difference in expected future discounted reward associated with the same action. [sent-110, score-0.177]

43 We split instances in a node if knowing where the agent came from helps predict future discounted rewards. [sent-111, score-0.497]

44 For better performance, McCallum did not compare the nodes in the fringes to their siblings, only to their ancestor � [sent-118, score-0.121]

45 He also did not compare values from all actions executed from the fringe, only the action that has the highest Q-value in the leaf (the policy action of that leaf). [sent-122, score-0.471]

46 To compare the populations of expected discounted future rewards from the two nodes (the fringe and the � [sent-123, score-0.302]

47 If the test reported that a difference was found between the expected discounted future rewards after executing the policy action, the leaf was split, the fringe node would become the new leaf, and the tree would be expanded to create deeper fringes. [sent-128, score-0.642]

48 Figure 3 presents an example of a possible USM tree, without fringe nodes. [sent-129, score-0.164]

49 Below is a x sequence of instances demonstrating how some instances are clustered into the tree leaves. [sent-132, score-0.283]

50 Instead of comparing the fringe node to its ancestor � [sent-133, score-0.265]

51 McCallum compared only the expected discounted future rewards from executing the policy action, where we compare all the values following all actions executed after any of the instances in the fringe. [sent-137, score-0.324]

52 McCallum also considered only fringe nodes of certain depth, given as a parameter to the algorithm, where we choose to create fringe nodes as deep as possible, until the number of instances in the node diminish below some threshold (we use a value of 10 in our experiments). [sent-139, score-0.556]

53 The expected future discounted reward of instance Ti is deÄ? [sent-140, score-0.193]

54 Ĺš by: ned Q(Ti ) = ri + ĂŽĹ‚U (L(Ti+1 )) (1) where L(Ti ) is the leaf associated with instance Ti and U (s) = maxa (Q(s, a)). [sent-141, score-0.184]

55 Ĺš guration for a problem the leaves of the tree deÄ? [sent-143, score-0.152]

56 Now that the Q-values have been updated, the agent chooses the next action to perform based on the Q-values in the leaf corresponding to the current instance Tt : at+1 = argmaxa Q(L(Tt ), a) (5) McCallum uses the fringes of the tree for a smart exploration strategy. [sent-146, score-0.65]

57 The system makes different observations at the same location (false negative), or it makes identical observations at different locations (false positive). [sent-150, score-0.124]

58 Knowing that the agent came from different locations, helps it realize that it is in two different locations, though the observations look the same. [sent-152, score-0.284]

59 When the agent is at the same state, but receives different observations, it is unable to learn from the noisy observation and thus wastes much of the available information. [sent-154, score-0.371]

60 It is reasonable to assume that the agent has some sensor model deÄ? [sent-157, score-0.285]

61 Ś ning pr(o|s) — the probability that the agent will observe o in world state s. [sent-158, score-0.318]

62 We can use the sensor model to augment USM with the ability to learn from noisy instances. [sent-159, score-0.273]

63 In our experiments we assume the agent has n boolean sensors with an accuracy probability pi . [sent-160, score-0.363]

64 , ÄŽ‰1 came from an actual world state s = ÄŽ‰1 , . [sent-168, score-0.145]

65 Using any sensor model we can insert the new instance Tt = Tt−1 , at−1 , ot , rt into several paths with different weights. [sent-172, score-0.231]

66 When inserting Tt with weight w into an action node at depth k (with its children labeled by observations) we will insert the instance into every child node c, with weight w ¡ pr(ot−k−1 |label(c)). [sent-173, score-0.453]

67 When inserting Tt with weight w into an observation node at depth k (with its children labeled by actions) we will insert the instance only into the child c labeled by at−k−1 with the same weight w. [sent-174, score-0.324]

68 Weights of instances are stored in each node with the instance as ws (Tt ) — the weight of instance Tt in node s. [sent-175, score-0.393]

69 Conducted experiments indicate that using the noisy sequences for deciding when to split leaves provides a slight gain in collected rewards, but the constructed tree is much larger, resulting in a considerable hit to performance. [sent-179, score-0.313]

70 When a state corresponding to a noisy sequence is observed, even though the noise in it might make it rare, NUSM can still use data from real sequences to decide which action is useful for this state. [sent-181, score-0.316]

71 5 Experimental Results To test our algorithms we used two maze worlds, seen in Figure 1(a) and Figure 1(b), identical to the worlds McCallum used to show the performance of the USM algorithm. [sent-182, score-0.191]

72 In both cases some of the world states are perceptually aliased and the algorithm should learn to identify the real world states. [sent-183, score-0.344]

73 The agent in our experiments has four sensors allowing it to sense an immediate wall above, below, to the left, and to the right of its current location. [sent-184, score-0.417]

74 The probability of all sensors providing the correct output is ĂŽÄ…4 . [sent-186, score-0.184]

75 In both mazes there is a single location that grants the agent a reward of 10. [sent-187, score-0.285]

76 Upon receiving that reward the agent is transformed to any of the perceptually aliased states of the maze randomly. [sent-188, score-0.646]

77 If the agent bumps into a wall it pays a cost (a negative reward) of 1. [sent-189, score-0.292]

78 Ĺš nite size history windows to Q-learning and Sarsa, eligibility traces, memory bits, USM and NUSM on the two worlds. [sent-193, score-0.418]

79 In the tables below, QL2 denotes using Q-learning with a history window of size 2, and S2 denotes using Sarsa with a window size of 2. [sent-194, score-0.308]

80 For example, S(ĂŽĹĽ)1 denotes using Sarsa(ĂŽĹĽ) with a 2 history window of 2 and 1 internal memory bit. [sent-198, score-0.432]

81 We ran each algorithm for 50000 steps learning a policy as explained above, and calculated the average reward over the last 5000 iterations only to avoid the difference in convergence time. [sent-202, score-0.175]

82 We also ran experiments for Sarsa and Q-learning with only the immediate observation, which yielded poor results as expected, and for history window of size 3 and 4 which resulted in lower performance than history window of size 2 for all algorithms (and were therefore removed from the tables). [sent-208, score-0.501]

83 018 Table 1: Average reward as function of sensor accuracy, for the maze in Figure 1(a). [sent-397, score-0.322]

84 008 Table 2: Average reward as function of sensor accuracy, for the maze in Figure 1(b). [sent-572, score-0.322]

85  4/   6/0% $YHUDJH UHZDUG 6/    860  1860          ,QV WDQFHV Figure 3: Convergence rates for the maze in Figure 1(a) when sensor accuracy is 0. [sent-573, score-0.243]

86 This is because when sensors supply perfect output, resolving the perceptual aliasing results in a fully observable environment which Q-learning and Sarsa can solve optimally. [sent-576, score-0.656]

87 The only algorithm that competes with NUSM is Sarsa(ĂŽĹĽ) with a history window of 2. [sent-579, score-0.224]

88 The ability of Sarsa(ĂŽĹĽ) to perform well in partially observable domains have been noted by Lock and Singh [6]1 , but we note here that the performance of Sarsa(ĂŽĹĽ) relies heavily on the proper deÄ? [sent-580, score-0.282]

89 When the history window differs slightly from the required size, the performance hit is substantial, as we can see in the two adjacent columns. [sent-582, score-0.224]

90 1 milliseconds with the same 1 Lock and Singh also recommend the use of replacing traces but we found that using accumulating traces produced better performance. [sent-587, score-0.205]

91 85 and a similar number of nodes (10, 229 for NUSM and 10, 349 for USM — including fringe nodes), making NUSM reasonable for online learning. [sent-589, score-0.223]

92 Finally, Both USM and NUSM attempt to disambiguate the perceptual aliasing and create a fully observable MDP. [sent-590, score-0.441]

93 Yet, it is better to model the world directly as partially observable using a Partially Observable Markov Decision Process (POMDP). [sent-591, score-0.22]

94 POMDP policies explicitly address the problem of incomplete knowledge, taking into account the ability of certain actions to reduce uncertainty without immediately generating useful rewards. [sent-592, score-0.144]

95 Ĺš nite size history windows, memory bits and USM, that resolve perceptual aliasing, provide poor results in the presence of noisy sensors. [sent-598, score-0.598]

96 As noise arises, NUSM works better than other algorithms used for handling domains with perceptual aliasing. [sent-600, score-0.233]

97 Reinforcement learning with perceptual aliasing: The perceptual distinctions approach. [sent-612, score-0.294]

98 Reinforcement learning algorithm for partially observable Markov decision problems. [sent-625, score-0.161]

99 Ĺš the best memoryless policy in partially nd observable Markov decision processes. [sent-638, score-0.263]

100 Experimental results on learning stochastic memoryless policies for partially observable markov decision processes. [sent-684, score-0.24]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('usm', 0.438), ('nusm', 0.401), ('agent', 0.206), ('sarsa', 0.206), ('aliasing', 0.188), ('fringe', 0.164), ('maze', 0.164), ('utile', 0.164), ('sensors', 0.157), ('memory', 0.154), ('perceptual', 0.147), ('tt', 0.145), ('history', 0.14), ('ti', 0.136), ('leaf', 0.134), ('observable', 0.106), ('action', 0.104), ('tree', 0.101), ('noisy', 0.092), ('instances', 0.091), ('mccallum', 0.09), ('traces', 0.089), ('perceptually', 0.087), ('eligibility', 0.087), ('window', 0.084), ('sensor', 0.079), ('reward', 0.079), ('aliased', 0.073), ('singh', 0.073), ('policy', 0.07), ('node', 0.069), ('ot', 0.067), ('discounted', 0.064), ('ws', 0.064), ('rl', 0.061), ('world', 0.059), ('actions', 0.059), ('resolving', 0.058), ('agents', 0.055), ('partially', 0.055), ('fringes', 0.055), ('lock', 0.055), ('mccallums', 0.055), ('precept', 0.055), ('wall', 0.054), ('domains', 0.054), ('internal', 0.054), ('state', 0.053), ('leaves', 0.051), ('reinforcement', 0.051), ('instance', 0.05), ('pomdp', 0.048), ('inserting', 0.048), ('peshkin', 0.048), ('policies', 0.047), ('observations', 0.045), ('suf', 0.045), ('observation', 0.044), ('environments', 0.043), ('child', 0.04), ('rewards', 0.04), ('handle', 0.04), ('ability', 0.038), ('depth', 0.038), ('bits', 0.038), ('windows', 0.037), ('states', 0.037), ('exacerbated', 0.036), ('loch', 0.036), ('musm', 0.036), ('nsm', 0.036), ('siblings', 0.036), ('remember', 0.036), ('sequences', 0.035), ('augment', 0.035), ('insert', 0.035), ('split', 0.034), ('locations', 0.034), ('nodes', 0.034), ('came', 0.033), ('handles', 0.032), ('pays', 0.032), ('memoryless', 0.032), ('ancestor', 0.032), ('meuleau', 0.032), ('noise', 0.032), ('trajectories', 0.03), ('false', 0.029), ('brafman', 0.029), ('ks', 0.029), ('proper', 0.029), ('learn', 0.029), ('poor', 0.027), ('output', 0.027), ('worlds', 0.027), ('milliseconds', 0.027), ('ran', 0.026), ('xes', 0.025), ('online', 0.025), ('pr', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 154 nips-2004-Resolving Perceptual Aliasing In The Presence Of Noisy Sensors

Author: Guy Shani, Ronen I. Brafman

Abstract: Agents learning to act in a partially observable domain may need to overcome the problem of perceptual aliasing – i.e., different states that appear similar but require different responses. This problem is exacerbated when the agent’s sensors are noisy, i.e., sensors may produce different observations in the same state. We show that many well-known reinforcement learning methods designed to deal with perceptual aliasing, such as Utile SufÄ?Ĺš x Memory, Ä?Ĺš nite size history windows, eligibility traces, and memory bits, do not handle noisy sensors well. We suggest a new algorithm, Noisy Utile SufÄ?Ĺš x Memory (NUSM), based on USM, that uses a weighted classiÄ?Ĺš cation of observed trajectories. We compare NUSM to the above methods and show it to be more robust to noise.

2 0.15160221 88 nips-2004-Intrinsically Motivated Reinforcement Learning

Author: Nuttapong Chentanez, Andrew G. Barto, Satinder P. Singh

Abstract: Psychologists call behavior intrinsically motivated when it is engaged in for its own sake rather than as a step toward solving a specific problem of clear practical value. But what we learn during intrinsically motivated behavior is essential for our development as competent autonomous entities able to efficiently solve a wide range of practical problems as they arise. In this paper we present initial results from a computational study of intrinsically motivated reinforcement learning aimed at allowing artificial agents to construct and extend hierarchies of reusable skills that are needed for competent autonomy. 1

3 0.14941598 64 nips-2004-Experts in a Markov Decision Process

Author: Eyal Even-dar, Sham M. Kakade, Yishay Mansour

Abstract: We consider an MDP setting in which the reward function is allowed to change during each time step of play (possibly in an adversarial manner), yet the dynamics remain fixed. Similar to the experts setting, we address the question of how well can an agent do when compared to the reward achieved under the best stationary policy over time. We provide efficient algorithms, which have regret bounds with no dependence on the size of state space. Instead, these bounds depend only on a certain horizon time of the process and logarithmically on the number of actions. We also show that in the case that the dynamics change over time, the problem becomes computationally hard. 1

4 0.14868015 24 nips-2004-Approximately Efficient Online Mechanism Design

Author: David C. Parkes, Dimah Yanovsky, Satinder P. Singh

Abstract: Online mechanism design (OMD) addresses the problem of sequential decision making in a stochastic environment with multiple self-interested agents. The goal in OMD is to make value-maximizing decisions despite this self-interest. In previous work we presented a Markov decision process (MDP)-based approach to OMD in large-scale problem domains. In practice the underlying MDP needed to solve OMD is too large and hence the mechanism must consider approximations. This raises the possibility that agents may be able to exploit the approximation for selfish gain. We adopt sparse-sampling-based MDP algorithms to implement efficient policies, and retain truth-revelation as an approximate BayesianNash equilibrium. Our approach is empirically illustrated in the context of the dynamic allocation of WiFi connectivity to users in a coffeehouse. 1

5 0.10941446 105 nips-2004-Log-concavity Results on Gaussian Process Methods for Supervised and Unsupervised Learning

Author: Liam Paninski

Abstract: Log-concavity is an important property in the context of optimization, Laplace approximation, and sampling; Bayesian methods based on Gaussian process priors have become quite popular recently for classification, regression, density estimation, and point process intensity estimation. Here we prove that the predictive densities corresponding to each of these applications are log-concave, given any observed data. We also prove that the likelihood is log-concave in the hyperparameters controlling the mean function of the Gaussian prior in the density and point process intensity estimation cases, and the mean, covariance, and observation noise parameters in the classification and regression cases; this result leads to a useful parameterization of these hyperparameters, indicating a suitably large class of priors for which the corresponding maximum a posteriori problem is log-concave.

6 0.1030542 39 nips-2004-Coarticulation in Markov Decision Processes

7 0.10152087 202 nips-2004-VDCBPI: an Approximate Scalable Algorithm for Large POMDPs

8 0.097349316 123 nips-2004-Multi-agent Cooperation in Diverse Population Games

9 0.086177178 183 nips-2004-Temporal-Difference Networks

10 0.075056009 159 nips-2004-Schema Learning: Experience-Based Construction of Predictive Action Models

11 0.068842039 129 nips-2004-New Criteria and a New Algorithm for Learning in Multi-Agent Systems

12 0.066988729 65 nips-2004-Exploration-Exploitation Tradeoffs for Experts Algorithms in Reactive Environments

13 0.065516353 136 nips-2004-On Semi-Supervised Classification

14 0.062009577 16 nips-2004-Adaptive Discriminative Generative Model and Its Applications

15 0.056218915 148 nips-2004-Probabilistic Computation in Spiking Populations

16 0.056214061 33 nips-2004-Brain Inspired Reinforcement Learning

17 0.055399962 155 nips-2004-Responding to Modalities with Different Latencies

18 0.054911006 147 nips-2004-Planning for Markov Decision Processes with Sparse Stochasticity

19 0.053075645 102 nips-2004-Learning first-order Markov models for control

20 0.052786481 48 nips-2004-Convergence and No-Regret in Multiagent Learning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.144), (1, -0.047), (2, 0.244), (3, -0.065), (4, -0.127), (5, 0.173), (6, -0.02), (7, -0.049), (8, -0.068), (9, -0.008), (10, 0.022), (11, 0.029), (12, -0.012), (13, 0.015), (14, -0.018), (15, -0.002), (16, 0.013), (17, -0.066), (18, 0.051), (19, -0.007), (20, -0.047), (21, 0.073), (22, -0.061), (23, 0.008), (24, -0.048), (25, 0.013), (26, 0.046), (27, 0.04), (28, 0.085), (29, -0.085), (30, -0.063), (31, -0.098), (32, -0.031), (33, 0.064), (34, 0.045), (35, 0.096), (36, 0.129), (37, -0.122), (38, 0.016), (39, -0.105), (40, -0.037), (41, -0.11), (42, 0.014), (43, 0.04), (44, -0.078), (45, 0.096), (46, -0.084), (47, -0.157), (48, 0.043), (49, -0.018)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96640253 154 nips-2004-Resolving Perceptual Aliasing In The Presence Of Noisy Sensors

Author: Guy Shani, Ronen I. Brafman

Abstract: Agents learning to act in a partially observable domain may need to overcome the problem of perceptual aliasing – i.e., different states that appear similar but require different responses. This problem is exacerbated when the agent’s sensors are noisy, i.e., sensors may produce different observations in the same state. We show that many well-known reinforcement learning methods designed to deal with perceptual aliasing, such as Utile SufÄ?Ĺš x Memory, Ä?Ĺš nite size history windows, eligibility traces, and memory bits, do not handle noisy sensors well. We suggest a new algorithm, Noisy Utile SufÄ?Ĺš x Memory (NUSM), based on USM, that uses a weighted classiÄ?Ĺš cation of observed trajectories. We compare NUSM to the above methods and show it to be more robust to noise.

2 0.67845649 159 nips-2004-Schema Learning: Experience-Based Construction of Predictive Action Models

Author: Michael P. Holmes, Charles Jr.

Abstract: Schema learning is a way to discover probabilistic, constructivist, predictive action models (schemas) from experience. It includes methods for finding and using hidden state to make predictions more accurate. We extend the original schema mechanism [1] to handle arbitrary discrete-valued sensors, improve the original learning criteria to handle POMDP domains, and better maintain hidden state by using schema predictions. These extensions show large improvement over the original schema mechanism in several rewardless POMDPs, and achieve very low prediction error in a difficult speech modeling task. Further, we compare extended schema learning to the recently introduced predictive state representations [2], and find their predictions of next-step action effects to be approximately equal in accuracy. This work lays the foundation for a schema-based system of integrated learning and planning. 1

3 0.59227008 24 nips-2004-Approximately Efficient Online Mechanism Design

Author: David C. Parkes, Dimah Yanovsky, Satinder P. Singh

Abstract: Online mechanism design (OMD) addresses the problem of sequential decision making in a stochastic environment with multiple self-interested agents. The goal in OMD is to make value-maximizing decisions despite this self-interest. In previous work we presented a Markov decision process (MDP)-based approach to OMD in large-scale problem domains. In practice the underlying MDP needed to solve OMD is too large and hence the mechanism must consider approximations. This raises the possibility that agents may be able to exploit the approximation for selfish gain. We adopt sparse-sampling-based MDP algorithms to implement efficient policies, and retain truth-revelation as an approximate BayesianNash equilibrium. Our approach is empirically illustrated in the context of the dynamic allocation of WiFi connectivity to users in a coffeehouse. 1

4 0.53863949 39 nips-2004-Coarticulation in Markov Decision Processes

Author: Khashayar Rohanimanesh, Robert Platt, Sridhar Mahadevan, Roderic Grupen

Abstract: We investigate an approach for simultaneously committing to multiple activities, each modeled as a temporally extended action in a semi-Markov decision process (SMDP). For each activity we define a set of admissible solutions consisting of the redundant set of optimal policies, and those policies that ascend the optimal statevalue function associated with them. A plan is then generated by merging them in such a way that the solutions to the subordinate activities are realized in the set of admissible solutions satisfying the superior activities. We present our theoretical results and empirically evaluate our approach in a simulated domain. 1

5 0.52899355 64 nips-2004-Experts in a Markov Decision Process

Author: Eyal Even-dar, Sham M. Kakade, Yishay Mansour

Abstract: We consider an MDP setting in which the reward function is allowed to change during each time step of play (possibly in an adversarial manner), yet the dynamics remain fixed. Similar to the experts setting, we address the question of how well can an agent do when compared to the reward achieved under the best stationary policy over time. We provide efficient algorithms, which have regret bounds with no dependence on the size of state space. Instead, these bounds depend only on a certain horizon time of the process and logarithmically on the number of actions. We also show that in the case that the dynamics change over time, the problem becomes computationally hard. 1

6 0.50585818 88 nips-2004-Intrinsically Motivated Reinforcement Learning

7 0.45952469 183 nips-2004-Temporal-Difference Networks

8 0.44662318 202 nips-2004-VDCBPI: an Approximate Scalable Algorithm for Large POMDPs

9 0.44313166 123 nips-2004-Multi-agent Cooperation in Diverse Population Games

10 0.43707314 147 nips-2004-Planning for Markov Decision Processes with Sparse Stochasticity

11 0.42515078 155 nips-2004-Responding to Modalities with Different Latencies

12 0.37965038 105 nips-2004-Log-concavity Results on Gaussian Process Methods for Supervised and Unsupervised Learning

13 0.34485659 129 nips-2004-New Criteria and a New Algorithm for Learning in Multi-Agent Systems

14 0.34454256 65 nips-2004-Exploration-Exploitation Tradeoffs for Experts Algorithms in Reactive Environments

15 0.33385962 33 nips-2004-Brain Inspired Reinforcement Learning

16 0.32485914 180 nips-2004-Synchronization of neural networks by mutual learning and its application to cryptography

17 0.29029372 193 nips-2004-Theories of Access Consciousness

18 0.26720846 130 nips-2004-Newscast EM

19 0.26487854 95 nips-2004-Large-Scale Prediction of Disulphide Bond Connectivity

20 0.26182637 35 nips-2004-Chemosensory Processing in a Spiking Model of the Olfactory Bulb: Chemotopic Convergence and Center Surround Inhibition


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.082), (15, 0.117), (17, 0.015), (26, 0.036), (31, 0.032), (32, 0.014), (33, 0.161), (35, 0.024), (50, 0.059), (52, 0.018), (58, 0.016), (66, 0.25), (71, 0.016), (81, 0.013), (82, 0.042)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.83213705 154 nips-2004-Resolving Perceptual Aliasing In The Presence Of Noisy Sensors

Author: Guy Shani, Ronen I. Brafman

Abstract: Agents learning to act in a partially observable domain may need to overcome the problem of perceptual aliasing – i.e., different states that appear similar but require different responses. This problem is exacerbated when the agent’s sensors are noisy, i.e., sensors may produce different observations in the same state. We show that many well-known reinforcement learning methods designed to deal with perceptual aliasing, such as Utile SufÄ?Ĺš x Memory, Ä?Ĺš nite size history windows, eligibility traces, and memory bits, do not handle noisy sensors well. We suggest a new algorithm, Noisy Utile SufÄ?Ĺš x Memory (NUSM), based on USM, that uses a weighted classiÄ?Ĺš cation of observed trajectories. We compare NUSM to the above methods and show it to be more robust to noise.

2 0.8013097 81 nips-2004-Implicit Wiener Series for Higher-Order Image Analysis

Author: Matthias O. Franz, Bernhard Schölkopf

Abstract: The computation of classical higher-order statistics such as higher-order moments or spectra is difficult for images due to the huge number of terms to be estimated and interpreted. We propose an alternative approach in which multiplicative pixel interactions are described by a series of Wiener functionals. Since the functionals are estimated implicitly via polynomial kernels, the combinatorial explosion associated with the classical higher-order statistics is avoided. First results show that image structures such as lines or corners can be predicted correctly, and that pixel interactions up to the order of five play an important role in natural images. Most of the interesting structure in a natural image is characterized by its higher-order statistics. Arbitrarily oriented lines and edges, for instance, cannot be described by the usual pairwise statistics such as the power spectrum or the autocorrelation function: From knowing the intensity of one point on a line alone, we cannot predict its neighbouring intensities. This would require knowledge of a second point on the line, i.e., we have to consider some third-order statistics which describe the interactions between triplets of points. Analogously, the prediction of a corner neighbourhood needs at least fourth-order statistics, and so on. In terms of Fourier analysis, higher-order image structures such as edges or corners are described by phase alignments, i.e. phase correlations between several Fourier components of the image. Classically, harmonic phase interactions are measured by higher-order spectra [4]. Unfortunately, the estimation of these spectra for high-dimensional signals such as images involves the estimation and interpretation of a huge number of terms. For instance, a sixth-order spectrum of a 16×16 sized image contains roughly 1012 coefficients, about 1010 of which would have to be estimated independently if all symmetries in the spectrum are considered. First attempts at estimating the higher-order structure of natural images were therefore restricted to global measures such as skewness or kurtosis [8], or to submanifolds of fourth-order spectra [9]. Here, we propose an alternative approach that models the interactions of image points in a series of Wiener functionals. A Wiener functional of order n captures those image components that can be predicted from the multiplicative interaction of n image points. In contrast to higher-order spectra or moments, the estimation of a Wiener model does not require the estimation of an excessive number of terms since it can be computed implicitly via polynomial kernels. This allows us to decompose an image into components that are characterized by interactions of a given order. In the next section, we introduce the Wiener expansion and discuss its capability of modeling higher-order pixel interactions. The implicit estimation method is described in Sect. 2, followed by some examples of use in Sect. 3. We conclude in Sect. 4 by briefly discussing the results and possible improvements. 1 Modeling pixel interactions with Wiener functionals For our analysis, we adopt a prediction framework: Given a d × d neighbourhood of an image pixel, we want to predict its gray value from the gray values of the neighbours. We are particularly interested to which extent interactions of different orders contribute to the overall prediction. Our basic assumption is that the dependency of the central pixel value y on its neighbours xi , i = 1, . . . , m = d2 − 1 can be modeled as a series y = H0 [x] + H1 [x] + H2 [x] + · · · + Hn [x] + · · · (1) of discrete Volterra functionals H0 [x] = h0 = const. and Hn [x] = m i1 =1 ··· m in =1 (n) hi1 ...in xi1 . . . xin . (2) Here, we have stacked the grayvalues of the neighbourhood into the vector x = (x1 , . . . , xm ) ∈ Rm . The discrete nth-order Volterra functional is, accordingly, a linear combination of all ordered nth-order monomials of the elements of x with mn coefficients (n) hi1 ...in . Volterra functionals provide a controlled way of introducing multiplicative interactions of image points since a functional of order n contains all products of the input of order n. In terms of higher-order statistics, this means that we can control the order of the statistics used since an nth-order Volterra series leads to dependencies between maximally n + 1 pixels. Unfortunately, Volterra functionals are not orthogonal to each other, i.e., depending on the input distribution, a functional of order n generally leads to additional lower-order interactions. As a result, the output of the functional will contain components that are proportional to that of some lower-order monomials. For instance, the output of a second-order Volterra functional for Gaussian input generally has a mean different from zero [5]. If one wants to estimate the zeroeth-order component of an image (i.e., the constant component created without pixel interactions) the constant component created by the second-order interactions needs to be subtracted. For general Volterra series, this correction can be achieved by decomposing it into a new series y = G0 [x] + G1 [x] + · · · + Gn [x] + · · · of functionals Gn [x] that are uncorrelated, i.e., orthogonal with respect to the input. The resulting Wiener functionals1 Gn [x] are linear combinations of Volterra functionals up to order n. They are computed from the original Volterra series by a procedure akin to Gram-Schmidt orthogonalization [5]. It can be shown that any Wiener expansion of finite degree minimizes the mean squared error between the true system output and its Volterra series model [5]. The orthogonality condition ensures that a Wiener functional of order n captures only the component of the image created by the multiplicative interaction of n pixels. In contrast to general Volterra functionals, a Wiener functional is orthogonal to all monomials of lower order [5]. So far, we have not gained anything compared to classical estimation of higher-order moments or spectra: an nth-order Volterra functional contains the same number of terms as 1 Strictly speaking, the term Wiener functional is reserved for orthogonal Volterra functionals with respect to Gaussian input. Here, the term will be used for orthogonalized Volterra functionals with arbitrary input distributions. the corresponding n + 1-order spectrum, and a Wiener functional of the same order has an even higher number of coefficients as it consists also of lower-order Volterra functionals. In the next section, we will introduce an implicit representation of the Wiener series using polynomial kernels which allows for an efficient computation of the Wiener functionals. 2 Estimating Wiener series by regression in RKHS Volterra series as linear functionals in RKHS. The nth-order Volterra functional is a weighted sum of all nth-order monomials of the input vector x. We can interpret the evaluation of this functional for a given input x as a map φn defined for n = 0, 1, 2, . . . as φ0 (x) = 1 and φn (x) = (xn , xn−1 x2 , . . . , x1 xn−1 , xn , . . . , xn ) 1 2 m 1 2 (3) n such that φn maps the input x ∈ Rm into a vector φn (x) ∈ Fn = Rm containing all mn ordered monomials of degree n. Using φn , we can write the nth-order Volterra functional in Eq. (2) as a scalar product in Fn , Hn [x] = ηn φn (x), (n) (4) (n) (n) with the coefficients stacked into the vector ηn = (h1,1,..1 , h1,2,..1 , h1,3,..1 , . . . ) ∈ Fn . The same idea can be applied to the entire pth-order Volterra series. By stacking the maps φn into a single map φ(p) (x) = (φ0 (x), φ1 (x), . . . , φp (x)) , one obtains a mapping from p+1 2 p Rm into F(p) = R × Rm × Rm × . . . Rm = RM with dimensionality M = 1−m . The 1−m entire pth-order Volterra series can be written as a scalar product in F(p) p n=0 Hn [x] = (η (p) ) φ(p) (x) (5) with η (p) ∈ F(p) . Below, we will show how we can express η (p) as an expansion in terms of the training points. This will dramatically reduce the number of parameters we have to estimate. This procedure works because the space Fn of nth-order monomials has a very special property: it has the structure of a reproducing kernel Hilbert space (RKHS). As a consequence, the dot product in Fn can be computed by evaluating a positive definite kernel function kn (x1 , x2 ). For monomials, one can easily show that (e.g., [6]) φn (x1 ) φn (x2 ) = (x1 x2 )n =: kn (x1 , x2 ). (6) Since F(p) is generated as a direct sum of the single spaces Fn , the associated scalar product is simply the sum of the scalar products in the Fn : φ(p) (x1 ) φ(p) (x2 ) = p n=0 (x1 x2 )n = k (p) (x1 , x2 ). (7) Thus, we have shown that the discretized Volterra series can be expressed as a linear functional in a RKHS2 . Linear regression in RKHS. For our prediction problem (1), the RKHS property of the Volterra series leads to an efficient solution which is in part due to the so called representer theorem (e.g., [6]). It states the following: suppose we are given N observations 2 A similar approach has been taken by [1] using the inhomogeneous polynomial kernel = (1 + x1 x2 )p . This kernel implies a map φinh into the same space of monomials, but it weights the degrees of the monomials differently as can be seen by applying the binomial theorem. (p) kinh (x1 , x2 ) (x1 , y1 ), . . . , (xN , yN ) of the function (1) and an arbitrary cost function c, Ω is a nondecreasing function on R>0 and . F is the norm of the RKHS associated with the kernel k. If we minimize an objective function c((x1 , y1 , f (x1 )), . . . , (xN , yN , f (xN ))) + Ω( f F ), (8) 3 over all functions in the RKHS, then an optimal solution can be expressed as N f (x) = j=1 aj k(x, xj ), aj ∈ R. (9) In other words, although we optimized over the entire RKHS including functions which are defined for arbitrary input points, it turns out that we can always express the solution in terms of the observations xj only. Hence the optimization problem over the extremely large number of coefficients η (p) in Eq. (5) is transformed into one over N variables aj . Let us consider the special case where the cost function is the mean squared error, N 1 c((x1 , y1 , f (x1 )), . . . , (xN , yN , f (xN ))) = N j=1 (f (xj ) − yj )2 , and the regularizer Ω is zero4 . The solution for a = (a1 , . . . , aN ) is readily computed by setting the derivative of (8) with respect to the vector a equal to zero; it takes the form a = K −1 y with the Gram matrix defined as Kij = k(xi , xj ), hence5 y = f (x) = a z(x) = y K −1 z(x), (10) N where z(x) = (k(x, x1 ), k(x, x2 ), . . . k(x, xN )) ∈ R . Implicit Wiener series estimation. As we stated above, the pth-degree Wiener expansion is the pth-order Volterra series that minimizes the squared error. This can be put into the regression framework: since any finite Volterra series can be represented as a linear functional in the corresponding RKHS, we can find the pth-order Volterra series that minimizes the squared error by linear regression. This, by definition, must be the pth-degree Wiener series since no other Volterra series has this property6 . From Eqn. (10), we obtain the following expressions for the implicit Wiener series p p 1 −1 G0 [x] = y 1, Hn [x] = y Kp z(p) (x) (11) Gn [x] = n=0 n=0 N (p) where the Gram matrix Kp and the coefficient vector z (x) are computed using the kernel from Eq. (7) and 1 = (1, 1, . . . ) ∈ RN . Note that the Wiener series is represented only implicitly since we are using the RKHS representation as a sum of scalar products with the training points. Thus, we can avoid the “curse of dimensionality”, i.e., there is no need to compute the possibly large number of coefficients explicitly. The explicit Volterra and Wiener expansions can be recovered from Eq. (11) by collecting all terms containing monomials of the desired order and summing them up. The individual nth-order Volterra functionals in a Wiener series of degree p > 0 are given implicitly by −1 Hn [x] = y Kp zn (x) n n (12) n with zn (x) = ((x1 x) , (x2 x) , . . . , (xN x) ) . For p = 0 the only term is the constant zero-order Volterra functional H0 [x] = G0 [x]. The coefficient vector ηn = (n) (n) (n) (h1,1,...1 , h1,2,...1 , h1,3,...1 , . . . ) of the explicit Volterra functional is obtained as −1 η n = Φ n Kp y 3 (13) for conditions on uniqueness of the solution, see [6] Note that this is different from the regularized approach used by [1]. If Ω is not zero, the resulting Volterra series are different from the Wiener series since they are not orthogonal with respect to the input. 5 If K is not invertible, K −1 denotes the pseudo-inverse of K. 6 assuming symmetrized Volterra kernels which can be obtained from any Volterra expanson. 4 using the design matrix Φn = (φn (x1 ) , φn (x1 ) , . . . , φn (x1 ) ) . The individual Wiener functionals can only be recovered by applying the regression procedure twice. If we are interested in the nth-degree Wiener functional, we have to compute the solution for the kernels k (n) (x1 , x2 ) and k (n−1) (x1 , x2 ). The Wiener functional for n > 0 is then obtained from the difference of the two results as Gn [x] = n i=0 Gi [x] − n−1 i=0 Gi [x] = y −1 −1 Kn z(n) (x) − Kn−1 z(n−1) (x) . (14) The corresponding ith-order Volterra functionals of the nth-degree Wiener functional are computed analogously to Eqns. (12) and (13) [3]. Orthogonality. The resulting Wiener functionals must fulfill the orthogonality condition which in its strictest form states that a pth-degree Wiener functional must be orthogonal to all monomials in the input of lower order. Formally, we will prove the following Theorem 1 The functionals obtained from Eq. (14) fulfill the orthogonality condition E [m(x)Gp [x]] = 0 (15) where E denotes the expectation over the input distribution and m(x) an arbitrary ithorder monomial with i < p. We will show that this a consequence of the least squares fit of any linear expansion in a set M of basis functions of the form y = j=1 γj ϕj (x). In the case of the Wiener and Volterra expansions, the basis functions ϕj (x) are monomials of the components of x. M We denote the error of the expansion as e(x) = y − j=1 γj ϕj (xi ). The minimum of the expected quadratic loss L with respect to the expansion coefficient γk is given by ∂ ∂L = E e(x) ∂γk ∂γk 2 = −2E [ϕk (x)e(x)] = 0. (16) This means that, for an expansion in a set of basis functions minimizing the squared error, the error is orthogonal to all basis functions used in the expansion. Now let us assume we know the Wiener series expansion (which minimizes the mean squared error) of a system up to degree p − 1. The approximation error is given by the ∞ sum of the higher-order Wiener functionals e(x) = n=p Gn [x], so Gp [x] is part of the error. As a consequence of the linearity of the expectation, Eq. (16) implies ∞ n=p ∞ E [ϕk (x)Gn [x]] = 0 and n=p+1 E [ϕk (x)Gn [x]] = 0 (17) for any φk of order less than p. The difference of both equations yields E [ϕk (x)Gp [x]] = 0, so that Gp [x] must be orthogonal to any of the lower order basis functions, namely to all monomials with order smaller than p. 3 Experiments Toy examples. In our first experiment, we check whether our intuitions about higher-order statistics described in the introduction are captured by the proposed method. In particular, we expect that arbitrarily oriented lines can only be predicted using third-order statistics. As a consequence, we should need at least a second-order Wiener functional to predict lines correctly. Our first test image (size 80 × 110, upper row in Fig. 1) contains only lines of varying orientations. Choosing a 5 × 5 neighbourhood, we predicted the central pixel using (11). original image 0th-order component/ reconstruction 1st-order reconstruction 1st-order component 2nd-order reconstruction 2nd-order component 3rd-order reconstruction mse = 583.7 mse = 0.006 mse = 0 mse = 1317 mse = 37.4 mse = 0.001 mse = 1845 mse = 334.9 3rd-order component mse = 19.0 Figure 1: Higher-order components of toy images. The image components of different orders are created by the corresponding Wiener functionals. They are added up to obtain the different orders of reconstruction. Note that the constant 0-order component and reconstruction are identical. The reconstruction error (mse) is given as the mean squared error between the true grey values of the image and the reconstruction. Although the linear first-order model seems to reconstruct the lines, this is actually not true since the linear model just smoothes over the image (note its large reconstruction error). A correct prediction is only obtained by adding a second-order component to the model. The third-order component is only significant at crossings, corners and line endings. Models of orders 0 . . . 3 were learned from the image by extracting the maximal training set of 76 × 106 patches of size 5 × 57 . The corresponding image components of order 0 to 3 were computed according to (14). Note the different components generated by the Wiener functionals can also be negative. In Fig. 1, they are scaled to the gray values [0..255]. The behaviour of the models conforms to our intuition: the linear model cannot capture the line structure of the image thus leading to a large reconstruction error which drops to nearly zero when a second-order model is used. The additional small correction achieved by the third-order model is mainly due to discretization effects. Similar to lines, we expect that we need at least a third-order model to predict crossings or corners correctly. This is confirmed by the second and third test image shown in the corresponding row in Fig. 1. Note that the third-order component is only significant at crossings, corners and line endings. The fourth- and fifth-order terms (not shown) have only negligible contributions. The fact that the reconstruction error does not drop to zero for the third image is caused by the line endings which cannot be predicted to a higher accuracy than one pixel. Application to natural images. Are there further predictable structures in natural images that are not due to lines, crossings or corners? This can be investigated by applying our method to a set of natural images (an example of size 80 × 110 is depicted in Fig. 2). Our 7 In contrast to the usual setting in machine learning, training and test set are identical in our case since we are not interested in generalization to other images, but in analyzing the higher-order components of the image at hand. original image 0th-order component/ reconstruction 1st-order reconstruction mse = 1070 1st-order component 2nd-order reconstruction mse = 957.4 2nd-order component 3rd-order reconstruction mse = 414.6 3rd-order component 4th-order reconstruction mse = 98.5 4th-order component 5th-order reconstruction mse = 18.5 5th-order component 6th-order reconstruction mse = 4.98 6th-order component 7th-order reconstruction mse = 1.32 7th-order component 8th-order reconstruction mse = 0.41 8th-order component Figure 2: Higher-order components and reconstructions of a photograph. Interactions up to the fifth order play an important role. Note that significant components become sparser with increasing model order. results on a set of 10 natural images of size 50 × 70 show an an approximately exponential decay of the reconstruction error when more and more higher-order terms are added to the reconstruction (Fig. 3). Interestingly, terms up to order 5 still play a significant role, although the image regions with a significant component become sparser with increasing model order (see Fig. 2). Note that the nonlinear terms reduce the reconstruction error to almost 0. This suggests a high degree of higher-order redundancy in natural images that cannot be exploited by the usual linear prediction models. 4 Conclusion The implicit estimation of Wiener functionals via polynomial kernels opens up new possibilities for the estimation of higher-order image statistics. Compared to the classical methods such as higher-order spectra, moments or cumulants, our approach avoids the combinatorial explosion caused by the exponential increase of the number of terms to be estimated and interpreted. When put into a predictive framework, multiplicative pixel interactions of different orders are easily visualized and conform to the intuitive notions about image structures such as edges, lines, crossings or corners. There is no one-to-one mapping between the classical higher-order statistics and multiplicative pixel interactions. Any nonlinear Wiener functional, for instance, creates infinitely many correlations or cumulants of higher order, and often also of lower order. On the other 700 Figure 3: Mean square reconstruction error of 600 models of different order for a set of 10 natural images. mse 500 400 300 200 100 0 0 1 2 3 4 5 6 7 model order hand, a Wiener functional of order n produces only harmonic phase interactions up to order n + 1, but sometimes also of lower orders. Thus, when one analyzes a classical statistic of a given order, one often cannot determine by which order of pixel interaction it was created. In contrast, our method is able to isolate image components that are created by a single order of interaction. Although of preliminary nature, our results on natural images suggest an important role of statistics up to the fifth order. Most of the currently used low-level feature detectors such as edge or corner detectors maximally use third-order interactions. The investigation of fourth- or higher-order features is a field that might lead to new insights into the nature and role of higher-order image structures. As often observed in the literature (e.g. [2][7]), our results seem to confirm that a large proportion of the redundancy in natural images is contained in the higher-order pixel interactions. Before any further conclusions can be drawn, however, our study needs to be extended in several directions: 1. A representative image database has to be analyzed. The images must be carefully calibrated since nonlinear statistics can be highly calibrationsensitive. In addition, the contribution of image noise has to be investigated. 2. Currently, only images up to 9000 pixels can be analyzed due to the matrix inversion required by Eq. 11. To accomodate for larger images, our method has to be reformulated in an iterative algorithm. 3. So far, we only considered 5 × 5-patches. To systematically investigate patch size effects, the analysis has to be conducted in a multi-scale framework. References [1] T. J. Dodd and R. F. Harrison. A new solution to Volterra series estimation. In CD-Rom Proc. 2002 IFAC World Congress, 2002. [2] D. J. Field. What is the goal of sensory coding? Neural Computation, 6:559 – 601, 1994. [3] M. O. Franz and B. Sch¨lkopf. Implicit Wiener series. Technical Report 114, Max-Plancko Institut f¨r biologische Kybernetik, T¨bingen, June 2003. u u [4] C. L. Nikias and A. P. Petropulu. Higher-order spectra analysis. Prentice Hall, Englewood Cliffs, NJ, 1993. [5] M. Schetzen. The Volterra and Wiener theories of nonlinear systems. Krieger, Malabar, 1989. [6] B. Sch¨lkopf and A. J. Smola. Learning with kernels. MIT Press, Cambridge, MA, 2002. o [7] O. Schwartz and E. P. Simoncelli. Natural signal statistics and sensory gain control. Nature Neurosc., 4(8):819 – 825, 2001. [8] M. G. A. Thomson. Higher-order structure in natural scenes. J. Opt.Soc. Am. A, 16(7):1549 – 1553, 1999. [9] M. G. A. Thomson. Beats, kurtosis and visual coding. Network: Compt. Neural Syst., 12:271 – 287, 2001.

3 0.66117221 189 nips-2004-The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees

Author: Ofer Dekel, Shai Shalev-shwartz, Yoram Singer

Abstract: Prediction suffix trees (PST) provide a popular and effective tool for tasks such as compression, classification, and language modeling. In this paper we take a decision theoretic view of PSTs for the task of sequence prediction. Generalizing the notion of margin to PSTs, we present an online PST learning algorithm and derive a loss bound for it. The depth of the PST generated by this algorithm scales linearly with the length of the input. We then describe a self-bounded enhancement of our learning algorithm which automatically grows a bounded-depth PST. We also prove an analogous mistake-bound for the self-bounded algorithm. The result is an efficient algorithm that neither relies on a-priori assumptions on the shape or maximal depth of the target PST nor does it require any parameters. To our knowledge, this is the first provably-correct PST learning algorithm which generates a bounded-depth PST while being competitive with any fixed PST determined in hindsight. 1

4 0.65397346 76 nips-2004-Hierarchical Bayesian Inference in Networks of Spiking Neurons

Author: Rajesh P. Rao

Abstract: There is growing evidence from psychophysical and neurophysiological studies that the brain utilizes Bayesian principles for inference and decision making. An important open question is how Bayesian inference for arbitrary graphical models can be implemented in networks of spiking neurons. In this paper, we show that recurrent networks of noisy integrate-and-fire neurons can perform approximate Bayesian inference for dynamic and hierarchical graphical models. The membrane potential dynamics of neurons is used to implement belief propagation in the log domain. The spiking probability of a neuron is shown to approximate the posterior probability of the preferred state encoded by the neuron, given past inputs. We illustrate the model using two examples: (1) a motion detection network in which the spiking probability of a direction-selective neuron becomes proportional to the posterior probability of motion in a preferred direction, and (2) a two-level hierarchical network that produces attentional effects similar to those observed in visual cortical areas V2 and V4. The hierarchical model offers a new Bayesian interpretation of attentional modulation in V2 and V4. 1

5 0.65048736 188 nips-2004-The Laplacian PDF Distance: A Cost Function for Clustering in a Kernel Feature Space

Author: Robert Jenssen, Deniz Erdogmus, Jose Principe, Torbjørn Eltoft

Abstract: A new distance measure between probability density functions (pdfs) is introduced, which we refer to as the Laplacian pdf distance. The Laplacian pdf distance exhibits a remarkable connection to Mercer kernel based learning theory via the Parzen window technique for density estimation. In a kernel feature space defined by the eigenspectrum of the Laplacian data matrix, this pdf distance is shown to measure the cosine of the angle between cluster mean vectors. The Laplacian data matrix, and hence its eigenspectrum, can be obtained automatically based on the data at hand, by optimal Parzen window selection. We show that the Laplacian pdf distance has an interesting interpretation as a risk function connected to the probability of error. 1

6 0.64981896 131 nips-2004-Non-Local Manifold Tangent Learning

7 0.64819074 93 nips-2004-Kernel Projection Machine: a New Tool for Pattern Recognition

8 0.64801276 16 nips-2004-Adaptive Discriminative Generative Model and Its Applications

9 0.64747244 69 nips-2004-Fast Rates to Bayes for Kernel Machines

10 0.64686656 174 nips-2004-Spike Sorting: Bayesian Clustering of Non-Stationary Data

11 0.64665484 168 nips-2004-Semigroup Kernels on Finite Sets

12 0.64545691 127 nips-2004-Neighbourhood Components Analysis

13 0.64535087 23 nips-2004-Analysis of a greedy active learning strategy

14 0.64500171 178 nips-2004-Support Vector Classification with Input Data Uncertainty

15 0.64454311 102 nips-2004-Learning first-order Markov models for control

16 0.64421755 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning

17 0.6434406 28 nips-2004-Bayesian inference in spiking neurons

18 0.64340162 61 nips-2004-Efficient Out-of-Sample Extension of Dominant-Set Clusters

19 0.64279103 167 nips-2004-Semi-supervised Learning with Penalized Probabilistic Clustering

20 0.64249188 151 nips-2004-Rate- and Phase-coded Autoassociative Memory