nips nips2012 nips2012-183 knowledge-graph by maker-knowledge-mining

183 nips-2012-Learning Partially Observable Models Using Temporally Abstract Decision Trees


Source: pdf

Author: Erik Talvitie

Abstract: This paper introduces timeline trees, which are partial models of partially observable environments. Timeline trees are given some specific predictions to make and learn a decision tree over history. The main idea of timeline trees is to use temporally abstract features to identify and split on features of key events, spread arbitrarily far apart in the past (whereas previous decision-tree-based methods have been limited to a finite suffix of history). Experiments demonstrate that timeline trees can learn to make high quality predictions in complex, partially observable environments with high-dimensional observations (e.g. an arcade game). 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 This paper introduces timeline trees, which are partial models of partially observable environments. [sent-3, score-0.843]

2 Timeline trees are given some specific predictions to make and learn a decision tree over history. [sent-4, score-0.482]

3 The main idea of timeline trees is to use temporally abstract features to identify and split on features of key events, spread arbitrarily far apart in the past (whereas previous decision-tree-based methods have been limited to a finite suffix of history). [sent-5, score-1.005]

4 Experiments demonstrate that timeline trees can learn to make high quality predictions in complex, partially observable environments with high-dimensional observations (e.g. an arcade game). [sent-6, score-1.112]

5 This paper introduces timeline trees which are partial models for partially observable environments. [sent-18, score-1.037]

6 Timeline trees are focused on capturing a particular kind of partial observability; they assume that their predictions can be made by recalling a (finite) sequence of events in the past that may have occurred far apart from each other in time. [sent-19, score-0.437]

7 The main idea of timeline trees is to build a decision tree over history. [sent-23, score-0.996]

8 As with similar approaches, the decision tree can split on features of observations in recent history. [sent-24, score-0.305]

9 However, a timeline tree may also establish new timestamps in the past and is able to split on features of observations surrounding those events as well. [sent-25, score-1.198]

10 For instance, there could be a timestamp representing the last time the agent saw its keys, and then features of the neighboring observations could identify the keys’ location. [sent-26, score-0.646]

11 In this way, timeline trees can make use of information arbitrarily spread out in history. [sent-27, score-0.825]
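As a concrete illustration of this setup, here is a minimal, hypothetical sketch (not the paper's code): a history is a sequence of (action, observation) pairs, and a binary feature inspects the observation at a given timestep. The observation encoding and all names below are illustrative assumptions.

```python
# Hypothetical sketch: histories as (action, observation) sequences, with
# binary features f(h, k) that inspect the observation at timestep k.
from typing import Callable, List, Tuple

History = List[Tuple[str, dict]]  # (action, observation) pairs

def make_obs_feature(key: str) -> Callable[[History, int], int]:
    """Binary feature f(h, k): 1 iff observation k has attribute `key` set."""
    def f(history: History, k: int) -> int:
        if not (0 <= k < len(history)):
            return 0  # offsets falling outside the history default to 0
        return int(bool(history[k][1].get(key, False)))
    return f

# Example usage (hypothetical Key World observations):
sees_key = make_obs_feature("sees_key")
h: History = [("move", {"sees_key": False}), ("look", {"sees_key": True})]
assert sees_key(h, 1) == 1 and sees_key(h, 0) == 0
```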

12 At every step t, the agent selects an action a_t from a finite set A and the environment (stochastically) emits an observation o_t, taken from a finite set O. [sent-33, score-0.36]

13 If one is able to predict the next observation at any history and for any action (that is, if one has access to this conditional distribution), one can compute the probability of any future sequence of observations given any future sequence of actions and the history [5]. [sent-40, score-0.279]

14 This paper will focus on partial models that make conditional predictions about abstract features of the next observation, though many of the ideas can be straightforwardly adapted to work with predictions of other forms. [sent-44, score-0.331]

15 Thus the decision tree learns an abstraction both over observations (which could be high-dimensional in their own right) and over the sequence of observations (by using features from multiple time-steps). [sent-53, score-0.418]

16 Timeline trees are an extension of U-Tree that allows it to use features of observations arbitrarily far back in the past, though they are not the first attempt to address this issue. [sent-55, score-0.328]

17 Looping predictive suffix trees (LPSTs) [10] are prediction suffix trees [11] that allow nodes to loop back to their ancestors. [sent-56, score-0.475]

18 Local agent state representations (LASR) [12] map histories to a real number, and then learn a direct mapping from that number to the target predictions. [sent-57, score-0.327]

19 It should be noted that prediction profile models have been combined with an additional preprocessing step that learns an abstraction before the prediction profile model is learned [15]. [sent-69, score-0.293]

20 The goal of timeline trees is to combine the strengths of U-Tree with the ability to attend to events arbitrarily far apart in history (rather than being limited to a finite suffix). [sent-71, score-0.999]

21 Unlike several of the above approaches, timeline trees are not arbitrarily recurrent (they do not contain loops except in a limited, implicit sense), which does restrict their representational capacity. [sent-72, score-0.851]

22 The decision tree built by U-Tree splits on features of observations at some temporal offset from the current timestep. [sent-75, score-0.392]

23 The key idea of timeline trees is to allow multiple timestamps in history and to allow splits on features of observations at temporal offsets relative to any of these timestamps. [sent-76, score-1.328]

24 Timeline trees take a set F of binary features, where each feature f(h_t, k) takes the history at time t, h_t, and a timestep 0 < k ≤ t + 1, and returns 1 or 0. [sent-77, score-0.462]

25 For a fixed vector τ of timestamps, the model is a standard decision tree, and only a small extension of U-Tree (which fixes the number of timestamps at one: the current timestep). [sent-80, score-0.298]

26 Each internal node in the tree is associated with a feature f, a timestamp index i, and a temporal offset δ (which may be negative), and has two children representing histories where the value of f(h_t, τ[i] + δ) is 0 or 1, respectively. [sent-81, score-0.615]

27 To use the timeline tree to make a prediction, one simply follows a path from the root to a leaf in the tree, choosing the appropriate child at each node according to the feature value f(h_t, τ[i] + δ). [sent-83, score-0.864]

28 For every feature f ∈ F, there is an additional timestamp feature ξ_f. [sent-86, score-0.377]

29 More importantly, the greatest such m (that is, the time of the most recent occurrence of f), call it m_f, is added as a timestamp to all nodes in the subtree where ξ_f = 1. [sent-88, score-0.309]

30 When making a prediction for h_t, one maintains a growing vector τ of timestamps (in order of least to most recent). [sent-89, score-0.435]

31 As one travels from the root to a leaf, one may encounter a node associated with a timestamp feature ξ_f. [sent-91, score-0.374]

32 Such a node is also associated with an index i into the current timestamp vector. [sent-92, score-0.309]

33 As such, the tree is able to establish timestamps based on the occurrence of key events (the presence of some feature). [sent-95, score-0.493]
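Sentences 26 through 33 describe prediction as a root-to-leaf traversal that grows the timestamp vector along the way. Below is a minimal sketch of that traversal; the node structure, the indexing conventions after a new timestamp is inserted, and all names are simplifying assumptions, not the paper's implementation.

```python
# Hypothetical sketch of timeline-tree prediction (sentences 26-33). Regular
# nodes test f(h, tau[i] + delta); timestamp nodes test whether feature f
# occurred between two existing timestamps and, if so, insert its most
# recent occurrence time m_f into the timestamp vector.
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

History = List[Tuple[str, dict]]         # (action, observation) pairs
Feature = Callable[[History, int], int]  # binary feature f(h, k)

@dataclass
class Node:
    feature: Optional[Feature] = None    # None at leaves
    ts_index: int = 0                    # index i into the timestamp vector
    offset: int = 0                      # temporal offset delta (regular nodes)
    is_timestamp: bool = False           # True for xi_f nodes
    child0: Optional["Node"] = None
    child1: Optional["Node"] = None
    prediction: Optional[float] = None   # set only at leaves

def predict(root: Node, history: History) -> float:
    tau = [len(history)]  # one initial timestamp: the current timestep
    node = root
    while node.prediction is None:
        if node.is_timestamp:
            # Search the interval between neighboring timestamps for
            # occurrences of the feature (conventions simplified here).
            lo = tau[node.ts_index - 1] if node.ts_index > 0 else 0
            hits = [k for k in range(lo, tau[node.ts_index])
                    if node.feature(history, k)]
            if hits:
                tau = sorted(tau + [hits[-1]])  # add m_f, keep tau ordered
                node = node.child1
            else:
                node = node.child0
        else:
            bit = node.feature(history, tau[node.ts_index] + node.offset)
            node = node.child1 if bit else node.child0
    return node.prediction
```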

34 There are systems that would require an infinite timeline tree but that the approaches in Section 2 can represent finitely. [sent-98, score-0.753]

35 The agent has three actions: move … [Footnote 1: For simplicity's sake, the notation f(h_t, k) hides the fact that the features may also depend on a_{t+1} and κ(o_{t+1}).] [sent-105, score-0.286]

36 The agent can observe its location and whether the key is in the room. [sent-111, score-0.294]

37 On the right of Figure 1 an example timeline tree is shown that can predict whether the agent will see the key in the next timestep. [sent-113, score-1.045]

38 The root checks if the agent can currently see the key. [sent-115, score-0.289]

39 Otherwise, the agent must remember where the key was last seen. [sent-117, score-0.313]

40 The square-shaped node is meant to indicate a timestamp feature, which checks if the agent has ever seen the key before the only timestamp. [sent-118, score-0.644]

41 If so, a new timestamp m is added that marks the last time the key was seen. [sent-120, score-0.346]

42 If the agent put the key in its pocket after m, it must take the key out to see it. [sent-121, score-0.357]

43 Timeline trees can be learned using standard decision tree induction algorithms (e.g. …). [sent-123, score-0.42]

44 The leaves of the tree contain the estimated predictions (counts of the occurrences of the various values of ω(o) associated with histories mapping to that leaf). [sent-127, score-0.305]

45 At each phase, every candidate expansion (every feature at every leaf) is tried, and the one that results in the highest information gain between the predictions of the original tree and the expanded tree is greedily selected. [sent-130, score-0.433]

46 The main difference in timeline trees is that different features may be available in different leaf nodes (because different timestamps will be available). [sent-131, score-1.151]

47 Specifically, for each leaf n, all features of the form f(·, τ_n[i] + k) are considered for all timestamp indices i ∈ {1, …}. [sent-132, score-0.412]

48 Similarly, all timestamp features of the form ξ_f(·, τ_n[i − 1], τ_n[i]) are considered for all timestamp indices i. [sent-136, score-0.675]

49 In the experiments below, candidate expansions also include all combinations of timestamp features and regular features (essentially two expansions at once). [sent-137, score-0.495]

50 These compound features take the form of first splitting on a timestamp feature, and then splitting the resulting “1 child” with a regular feature. [sent-138, score-0.397]

51 This allows the tree to notice that a timestamp is useful for the subsequent splits it allows, even if it is not inherently informative itself. [sent-139, score-0.486]

52 For instance, in the Key World, knowing whether the agent has ever seen the key may not be very informative, but knowing that the pocket action was taken immediately after seeing the key is very informative. [sent-140, score-0.441]
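Sentences 43 through 52 outline greedy induction: every candidate expansion at every leaf is scored by information gain, and the best one is taken. A schematic of that scoring loop follows; the data representation, the helper names, and the exact candidate enumeration are assumptions, not the paper's algorithm.

```python
# Hypothetical sketch of the greedy expansion step (sentences 43-52). Each
# leaf holds counts over the target's values; a candidate split partitions
# the leaf's histories and is scored by information gain.
import math
from collections import Counter
from typing import Callable, Iterable, List, Tuple

def entropy(counts: Counter) -> float:
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values() if c) if n else 0.0

def information_gain(parent: Counter, c0: Counter, c1: Counter) -> float:
    n = sum(parent.values())
    if n == 0:
        return 0.0
    w0, w1 = sum(c0.values()) / n, sum(c1.values()) / n
    return entropy(parent) - (w0 * entropy(c0) + w1 * entropy(c1))

def best_expansion(leaf_data: List[Tuple[list, str]],
                   candidates: Iterable[Callable[[list], int]]):
    """Pick the candidate split with the highest information gain.

    `candidates` would include regular features f(., tau[i] + k) for every
    available timestamp index, timestamp features xi_f, and the compound
    (timestamp + regular) expansions described above.
    """
    parent = Counter(y for _, y in leaf_data)
    best, best_gain = None, 0.0
    for split in candidates:  # split maps a history to 0 or 1
        c0 = Counter(y for h, y in leaf_data if split(h) == 0)
        c1 = Counter(y for h, y in leaf_data if split(h) == 1)
        gain = information_gain(parent, c0, c1)
        if gain > best_gain:
            best, best_gain = split, gain
    return best, best_gain
```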

53 [Figure 2 (Experiment Domains): (a) Shooting Gallery, (b) Three Card Monte, (c) Snake] In this section, timeline trees will be evaluated in three problems to which prediction profile models have been previously applied. [sent-152, score-0.914]

54 For various numbers of training trajectories, timeline trees are learned and their prediction accuracy is evaluated (as well as their usefulness for control). [sent-154, score-0.948]

55 For timeline trees, the new data is simply added to the existing tree and new splits are made until the algorithm stops. [sent-157, score-0.782]

56 This strategy is effective for timeline trees since the initial splits can often be made with relatively little data (this is not possible for prediction profile models). [sent-158, score-0.915]

57 In addition to evaluating timeline trees, two variants will also be evaluated. [sent-159, score-0.605]

58 One (labeled “Finite Suffix”) does not use any timestamp features at all. [sent-160, score-0.366]

59 The other (labeled “No Timestamps”) includes timestamp features, but does not use them to create new timestamps. [sent-162, score-0.309]

60 In this example, from Talvitie and Singh [4], the agent is in a shooting gallery (see Figure 2(a)). [sent-165, score-0.382]

61 If the target is in the crosshairs in the step after the agent shoots, the agent gets a reward of 10. [sent-167, score-0.63]

62 Whenever the agent hits the target, the gallery resets (obstacles are placed randomly) and a special observation is emitted. [sent-169, score-0.323]

63 Clearly the agent must predict whether the target will be in the crosshairs in the next timestep, but the target’s movement is stochastic and partially observable. [sent-172, score-0.394]

64 By contrast, timeline trees learn an abstraction over both observations and the history sequence. [sent-181, score-0.991]

65 Though short trajectories are necessary for training prediction profile models, the timeline trees tended to overfit to the short trajectories. [sent-183, score-0.982]

66 To train the tree models, a binary feature was provided for each color (target, obstacle, background, or reset) for each pixel in the image. [sent-188, score-0.286]

67 The maximum temporal offset from a timestamp was set to 2. [sent-190, score-0.367]
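Sentence 66 implies one indicator feature per (pixel, color) pair. A small sketch of how such a feature set might be generated is below; the grid dimensions and the observation encoding are placeholders, not values from the paper.

```python
# Hypothetical sketch of the pixel-color indicator features (sentence 66).
# Observations are assumed to be 2-D grids of color codes.
from typing import Callable, List, Tuple

COLORS = ["target", "obstacle", "background", "reset"]
WIDTH, HEIGHT = 10, 10  # placeholder dimensions, not from the paper

History = List[Tuple[str, List[List[str]]]]  # (action, image) pairs

def make_pixel_feature(x: int, y: int, color: str) -> Callable[[History, int], int]:
    def f(history: History, k: int) -> int:
        if not (0 <= k < len(history)):
            return 0
        return int(history[k][1][y][x] == color)
    return f

features = [make_pixel_feature(x, y, c)
            for x in range(WIDTH) for y in range(HEIGHT) for c in COLORS]
```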

68 Both the timeline trees and the prediction profile models are able to learn to make good predictions, but timeline trees do so with less data. [sent-196, score-1.713]

69 Remember that timeline trees are learning from raw images whereas the prediction profile models have been provided a hand-crafted abstraction. [sent-197, score-0.914]

70 The tree models without timestamps are only able to make good predictions in histories where the target has recently moved, which limits their performance. [sent-198, score-0.614]

71 The line marked “Prediction Profile” shows the best results reported by Talvitie and Singh [4]; the other curves show the performance of timeline trees and the comparison variants. [sent-210, score-0.828]

72 While illustrating the limitations of timeline trees in comparison to more expressive methods, it also demonstrates that they can represent useful knowledge that the simpler tree-based methods cannot. [sent-224, score-0.799]

73 Eventually the dealer asks the agent to flip over the ace. [sent-228, score-0.288]

74 If the agent succeeds, it gets a reward of 1; if it fails it gets a reward of -1. [sent-229, score-0.401]

75 Note that to do well in this game, the agent only needs to make the prediction, "If I flip card 1, will it be the ace?" [sent-231, score-0.293]

76 Since timeline trees’ primary strength is ignoring sections of history to focus on a few key events, they would not be expected to model this problem well. [sent-243, score-0.721]

77 The maximum time offset from a timestamp was 10 steps (both positive and negative). [sent-248, score-0.34]

78 For fairness, the tree models were also seeded with these features (they were split on the agent’s action before training). [sent-252, score-0.277]

79 As expected, the tree methods perform poorly (negative average reward indicates more wrong guesses than right), though timeline trees have marginally better control performance. [sent-281, score-1.033]

80 Part of the difficulty is that randomly generated training data is quite different from what the agent will encounter during testing (the random agent flips cards over frequently, while the learning agent eventually flips a card only when prompted to). [sent-282, score-0.85]

81 Expert-trained timeline trees are eventually good enough to allow the agent to achieve positive average reward, though they do require a great deal of data to do so (note the changes to the axes). [sent-285, score-1.028]

82 So, though their representational limitations do prevent all three tree-based methods from performing well in this problem, timeline trees’ ability to create new timestamps seems to allow them to make some meaningful (and useful) predictions that the others cannot. [sent-287, score-0.971]

83 If the snake ever runs into the wrong pellet, its own body, or the edge of the screen, the game is over and the agent receives -0. [sent-294, score-0.47]

84 Whenever the snake eats a pellet, the agent gets a reward of 1 and the tail stays still for 5 timesteps, making the snake’s body longer. [sent-296, score-0.525]

85 The tail shadows the head and, in addition, the pellet the snake must eat next is invisible. [sent-300, score-0.38]

86 To do well, the agent must remember the location of the next pellet. [sent-303, score-0.304]

87 Experimental Setup: For every location (x, y) and every color c, a timeline tree model was used to predict whether the pixel at (x, y) would next be color c. [sent-306, score-0.971]
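A short sketch of this factored setup follows, with `learn_timeline_tree` standing in (as a trivial stub) for the induction procedure sketched earlier; all names here are assumptions.

```python
# Hypothetical sketch of the factored Snake model (sentence 87): one model
# per (pixel, color) pair, each predicting whether that pixel takes that
# color in the next observation.
def learn_timeline_tree(data, features):
    """Stub for the greedy induction sketched above: here, a depth-0 'tree'
    that just returns the empirical frequency of the positive label."""
    pos = sum(label for _, label in data)
    return pos / len(data) if data else 0.0

def train_factored_model(trajectories, width, height, colors, features):
    models = {}
    for x in range(width):
        for y in range(height):
            for c in colors:
                # Label every history prefix with whether pixel (x, y)
                # is color c in the *next* observation.
                data = [(traj[:t + 1], int(traj[t + 1][1][y][x] == c))
                        for traj in trajectories
                        for t in range(len(traj) - 1)]
                models[(x, y, c)] = learn_timeline_tree(data, features)
    return models
```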

88 In terms of control performance, the “No Timestamps” variant performs nearly identically to the full timeline tree. [sent-331, score-0.63]

89 The model checks if each pixel has ever contained a food pellet and if it has ever contained the head. [sent-333, score-0.296]

90 The timeline tree incurs substantially less prediction error, indicating that it is able to model the tail more accurately. [sent-337, score-0.889]

91 Despite the hand-crafted structure provided to the prediction profile models, timeline trees learn higher quality models with less data. [sent-342, score-0.914]

92 In fact, UCT appears to perform better using the timeline tree model than using the true model (marked “True”). [sent-343, score-0.753]

93 In these experiments, timeline trees learned to capture long-range dependencies in complex, partially observable systems with high-dimensional observations. [sent-352, score-1.001]

94 The assumption that the predictions of interest depend on only a few key events in the past is limiting in the sense that there are simple partial models that timeline trees cannot easily capture (e.g. [sent-353, score-1.107]

95 Three Card Monte), but it does reflect a broad, natural class of partially observable phenomena (the examples here, for instance, were not designed with timeline trees in mind). [sent-355, score-0.945]

96 In problems that do match timeline trees’ inductive biases, they have been shown to outperform the more expressive prediction profile models. [sent-356, score-0.692]

97 There are many possible directions in which to consider extending timeline trees. [sent-357, score-0.605]

98 Regression tree methods could extend timeline trees into environments with continuous dimensions. [sent-359, score-0.972]

99 The timestamp features used here are only one of many possible types of temporally abstract features that could be devised. [sent-360, score-0.423]

100 … in order to increase expressive power, while retaining the benefits of timeline trees. [sent-362, score-0.605]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('timeline', 0.605), ('timestamp', 0.309), ('timestamps', 0.249), ('agent', 0.229), ('trees', 0.194), ('snake', 0.161), ('tree', 0.148), ('pro', 0.139), ('talvitie', 0.135), ('pellet', 0.108), ('suffix', 0.108), ('timelinetree', 0.108), ('le', 0.106), ('ht', 0.099), ('gallery', 0.094), ('observable', 0.093), ('predictions', 0.091), ('ot', 0.087), ('prediction', 0.087), ('profile', 0.081), ('history', 0.079), ('finite', 0.073), ('cards', 0.066), ('histories', 0.066), ('partial', 0.064), ('card', 0.064), ('trajectories', 0.063), ('head', 0.062), ('abstraction', 0.062), ('reward', 0.061), ('color', 0.06), ('events', 0.059), ('ace', 0.059), ('dealer', 0.059), ('shooting', 0.059), ('features', 0.057), ('timestep', 0.056), ('crosshairs', 0.054), ('olgarb', 0.054), ('pocket', 0.054), ('partially', 0.053), ('uct', 0.051), ('observations', 0.051), ('tail', 0.049), ('decision', 0.049), ('remember', 0.047), ('leaf', 0.046), ('action', 0.044), ('pixel', 0.044), ('satinder', 0.041), ('game', 0.04), ('ever', 0.04), ('lpsts', 0.04), ('key', 0.037), ('rmse', 0.036), ('expansions', 0.036), ('singh', 0.036), ('attend', 0.036), ('keys', 0.036), ('food', 0.035), ('erik', 0.035), ('feature', 0.034), ('trials', 0.033), ('training', 0.033), ('target', 0.032), ('compound', 0.031), ('offset', 0.031), ('root', 0.031), ('obstacles', 0.031), ('swap', 0.031), ('expert', 0.031), ('observability', 0.029), ('ips', 0.029), ('checks', 0.029), ('splits', 0.029), ('marked', 0.029), ('learned', 0.029), ('past', 0.029), ('mccallum', 0.029), ('models', 0.028), ('location', 0.028), ('dependencies', 0.027), ('temporal', 0.027), ('arcade', 0.027), ('choses', 0.027), ('eaten', 0.027), ('fsms', 0.027), ('lasr', 0.027), ('mahmud', 0.027), ('pellets', 0.027), ('arbitrarily', 0.026), ('representational', 0.026), ('predict', 0.026), ('environments', 0.025), ('gets', 0.025), ('screen', 0.025), ('control', 0.025), ('colors', 0.024), ('complete', 0.024), ('disappears', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 183 nips-2012-Learning Partially Observable Models Using Temporally Abstract Decision Trees

Author: Erik Talvitie

Abstract: This paper introduces timeline trees, which are partial models of partially observable environments. Timeline trees are given some specific predictions to make and learn a decision tree over history. The main idea of timeline trees is to use temporally abstract features to identify and split on features of key events, spread arbitrarily far apart in the past (whereas previous decision-tree-based methods have been limited to a finite suffix of history). Experiments demonstrate that timeline trees can learn to make high quality predictions in complex, partially observable environments with high-dimensional observations (e.g. an arcade game). 1

2 0.10551166 108 nips-2012-Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search

Author: Arthur Guez, David Silver, Peter Dayan

Abstract: Bayesian model-based reinforcement learning is a formally elegant approach to learning optimal behaviour under model uncertainty, trading off exploration and exploitation in an ideal way. Unfortunately, finding the resulting Bayes-optimal policies is notoriously taxing, since the search space becomes enormous. In this paper we introduce a tractable, sample-based method for approximate Bayes-optimal planning which exploits Monte-Carlo tree search. Our approach outperformed prior Bayesian model-based RL algorithms by a significant margin on several well-known benchmark problems – because it avoids expensive applications of Bayes rule within the search tree by lazily sampling models from the current beliefs. We illustrate the advantages of our approach by showing it working in an infinite state space domain which is qualitatively out of reach of almost all previous work in Bayesian exploration. 1

3 0.10400546 245 nips-2012-Nonparametric Bayesian Inverse Reinforcement Learning for Multiple Reward Functions

Author: Jaedeug Choi, Kee-eung Kim

Abstract: We present a nonparametric Bayesian approach to inverse reinforcement learning (IRL) for multiple reward functions. Most previous IRL algorithms assume that the behaviour data is obtained from an agent who is optimizing a single reward function, but this assumption is hard to guarantee in practice. Our approach is based on integrating the Dirichlet process mixture model into Bayesian IRL. We provide an efficient Metropolis-Hastings sampling algorithm utilizing the gradient of the posterior to estimate the underlying reward functions, and demonstrate that our approach outperforms previous ones via experiments on a number of problem domains. 1

4 0.10364041 353 nips-2012-Transferring Expectations in Model-based Reinforcement Learning

Author: Trung Nguyen, Tomi Silander, Tze Y. Leong

Abstract: We study how to automatically select and adapt multiple abstractions or representations of the world to support model-based reinforcement learning. We address the challenges of transfer learning in heterogeneous environments with varying tasks. We present an efficient, online framework that, through a sequence of tasks, learns a set of relevant representations to be used in future tasks. Without predefined mapping strategies, we introduce a general approach to support transfer learning across different state spaces. We demonstrate the potential impact of our system through improved jumpstart and faster convergence to near optimum policy in two benchmark domains. 1

5 0.096115313 3 nips-2012-A Bayesian Approach for Policy Learning from Trajectory Preference Queries

Author: Aaron Wilson, Alan Fern, Prasad Tadepalli

Abstract: We consider the problem of learning control policies via trajectory preference queries to an expert. In particular, the agent presents an expert with short runs of a pair of policies originating from the same state and the expert indicates which trajectory is preferred. The agent’s goal is to elicit a latent target policy from the expert with as few queries as possible. To tackle this problem we propose a novel Bayesian model of the querying process and introduce two methods that exploit this model to actively select expert queries. Experimental results on four benchmark problems indicate that our model can effectively learn policies from trajectory preference queries and that active query selection can be substantially more efficient than random selection. 1

6 0.091088317 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering

7 0.088962846 81 nips-2012-Context-Sensitive Decision Forests for Object Detection

8 0.084082067 194 nips-2012-Learning to Discover Social Circles in Ego Networks

9 0.082640551 173 nips-2012-Learned Prioritization for Trading Off Accuracy and Speed

10 0.081940793 343 nips-2012-Tight Bounds on Profile Redundancy and Distinguishability

11 0.079076782 260 nips-2012-Online Sum-Product Computation Over Trees

12 0.074284121 122 nips-2012-Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress

13 0.070061065 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

14 0.067597963 88 nips-2012-Cost-Sensitive Exploration in Bayesian Reinforcement Learning

15 0.067595661 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

16 0.067079328 153 nips-2012-How Prior Probability Influences Decision Making: A Unifying Probabilistic Model

17 0.066690467 109 nips-2012-Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions

18 0.066377252 195 nips-2012-Learning visual motion in recurrent neural networks

19 0.066081345 156 nips-2012-Identifiability and Unmixing of Latent Parse Trees

20 0.064621255 288 nips-2012-Rational inference of relative preferences


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.143), (1, -0.117), (2, -0.072), (3, -0.005), (4, -0.028), (5, -0.05), (6, 0.004), (7, -0.025), (8, -0.114), (9, 0.07), (10, -0.029), (11, 0.024), (12, 0.013), (13, -0.029), (14, -0.087), (15, 0.019), (16, -0.024), (17, 0.061), (18, 0.031), (19, -0.0), (20, -0.024), (21, 0.091), (22, 0.001), (23, 0.005), (24, 0.009), (25, 0.003), (26, -0.034), (27, -0.095), (28, 0.091), (29, 0.048), (30, 0.046), (31, -0.017), (32, -0.036), (33, -0.068), (34, -0.055), (35, 0.081), (36, -0.017), (37, -0.051), (38, -0.037), (39, -0.053), (40, -0.055), (41, -0.067), (42, -0.027), (43, 0.044), (44, -0.067), (45, 0.049), (46, 0.092), (47, -0.115), (48, 0.007), (49, 0.051)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93102342 183 nips-2012-Learning Partially Observable Models Using Temporally Abstract Decision Trees

Author: Erik Talvitie

Abstract: This paper introduces timeline trees, which are partial models of partially observable environments. Timeline trees are given some specific predictions to make and learn a decision tree over history. The main idea of timeline trees is to use temporally abstract features to identify and split on features of key events, spread arbitrarily far apart in the past (whereas previous decision-tree-based methods have been limited to a finite suffix of history). Experiments demonstrate that timeline trees can learn to make high quality predictions in complex, partially observable environments with high-dimensional observations (e.g. an arcade game). 1

2 0.60935551 260 nips-2012-Online Sum-Product Computation Over Trees

Author: Mark Herbster, Stephen Pasteris, Fabio Vitale

Abstract: We consider the problem of performing efficient sum-product computations in an online setting over a tree. A natural application of our methods is to compute the marginal distribution at a vertex in a tree-structured Markov random field. Belief propagation can be used to solve this problem, but requires time linear in the size of the tree, and is therefore too slow in an online setting where we are continuously receiving new data and computing individual marginals. With our method we aim to update the data and compute marginals in time that is no more than logarithmic in the size of the tree, and is often significantly less. We accomplish this via a hierarchical covering structure that caches previous local sum-product computations. Our contribution is three-fold: we i) give a linear time algorithm to find an optimal hierarchical cover of a tree; ii) give a sum-product-like algorithm to efficiently compute marginals with respect to this cover; and iii) apply “i” and “ii” to find an efficient algorithm with a regret bound for the online allocation problem in a multi-task setting. 1

3 0.59778726 232 nips-2012-Multiplicative Forests for Continuous-Time Processes

Author: Jeremy Weiss, Sriraam Natarajan, David Page

Abstract: Learning temporal dependencies between variables over continuous time is an important and challenging task. Continuous-time Bayesian networks effectively model such processes but are limited by the number of conditional intensity matrices, which grows exponentially in the number of parents per variable. We develop a partition-based representation using regression trees and forests whose parameter spaces grow linearly in the number of node splits. Using a multiplicative assumption we show how to update the forest likelihood in closed form, producing efficient model updates. Our results show multiplicative forests can be learned from few temporal trajectories with large gains in performance and scalability.

4 0.59315312 108 nips-2012-Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search

Author: Arthur Guez, David Silver, Peter Dayan

Abstract: Bayesian model-based reinforcement learning is a formally elegant approach to learning optimal behaviour under model uncertainty, trading off exploration and exploitation in an ideal way. Unfortunately, finding the resulting Bayes-optimal policies is notoriously taxing, since the search space becomes enormous. In this paper we introduce a tractable, sample-based method for approximate Bayes-optimal planning which exploits Monte-Carlo tree search. Our approach outperformed prior Bayesian model-based RL algorithms by a significant margin on several well-known benchmark problems – because it avoids expensive applications of Bayes rule within the search tree by lazily sampling models from the current beliefs. We illustrate the advantages of our approach by showing it working in an infinite state space domain which is qualitatively out of reach of almost all previous work in Bayesian exploration. 1

5 0.58137488 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering

Author: Levi Boyles, Max Welling

Abstract: We introduce a new prior for use in Nonparametric Bayesian Hierarchical Clustering. The prior is constructed by marginalizing out the time information of Kingman’s coalescent, providing a prior over tree structures which we call the Time-Marginalized Coalescent (TMC). This allows for models which factorize the tree structure and times, providing two benefits: more flexible priors may be constructed and more efficient Gibbs type inference can be used. We demonstrate this on an example model for density estimation and show the TMC achieves competitive experimental results. 1

6 0.57590789 215 nips-2012-Minimizing Uncertainty in Pipelines

7 0.55396295 81 nips-2012-Context-Sensitive Decision Forests for Object Detection

8 0.55147392 353 nips-2012-Transferring Expectations in Model-based Reinforcement Learning

9 0.54474556 51 nips-2012-Bayesian Hierarchical Reinforcement Learning

10 0.51747364 156 nips-2012-Identifiability and Unmixing of Latent Parse Trees

11 0.49685612 153 nips-2012-How Prior Probability Influences Decision Making: A Unifying Probabilistic Model

12 0.49381286 31 nips-2012-Action-Model Based Multi-agent Plan Recognition

13 0.4834446 173 nips-2012-Learned Prioritization for Trading Off Accuracy and Speed

14 0.47273612 267 nips-2012-Perceptron Learning of SAT

15 0.45960242 194 nips-2012-Learning to Discover Social Circles in Ego Networks

16 0.45873472 245 nips-2012-Nonparametric Bayesian Inverse Reinforcement Learning for Multiple Reward Functions

17 0.45123389 288 nips-2012-Rational inference of relative preferences

18 0.44700658 88 nips-2012-Cost-Sensitive Exploration in Bayesian Reinforcement Learning

19 0.44576114 207 nips-2012-Mandatory Leaf Node Prediction in Hierarchical Multilabel Classification

20 0.43107229 334 nips-2012-Tensor Decomposition for Fast Parsing with Latent-Variable PCFGs


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.03), (17, 0.014), (21, 0.039), (38, 0.081), (42, 0.038), (46, 0.27), (54, 0.043), (55, 0.018), (74, 0.103), (76, 0.143), (80, 0.08), (92, 0.031)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.82796961 256 nips-2012-On the connections between saliency and tracking

Author: Vijay Mahadevan, Nuno Vasconcelos

Abstract: A model connecting visual tracking and saliency has recently been proposed. This model is based on the saliency hypothesis for tracking which postulates that tracking is achieved by the top-down tuning, based on target features, of discriminant center-surround saliency mechanisms over time. In this work, we identify three main predictions that must hold if the hypothesis were true: 1) tracking reliability should be larger for salient than for non-salient targets, 2) tracking reliability should have a dependence on the defining variables of saliency, namely feature contrast and distractor heterogeneity, and must replicate the dependence of saliency on these variables, and 3) saliency and tracking can be implemented with common low level neural mechanisms. We confirm that the first two predictions hold by reporting results from a set of human behavior studies on the connection between saliency and tracking. We also show that the third prediction holds by constructing a common neurophysiologically plausible architecture that can computationally solve both saliency and tracking. This architecture is fully compliant with the standard physiological models of V1 and MT, and with what is known about attentional control in area LIP, while explaining the results of the human behavior experiments.

same-paper 2 0.76735413 183 nips-2012-Learning Partially Observable Models Using Temporally Abstract Decision Trees

Author: Erik Talvitie

Abstract: This paper introduces timeline trees, which are partial models of partially observable environments. Timeline trees are given some specific predictions to make and learn a decision tree over history. The main idea of timeline trees is to use temporally abstract features to identify and split on features of key events, spread arbitrarily far apart in the past (whereas previous decision-tree-based methods have been limited to a finite suffix of history). Experiments demonstrate that timeline trees can learn to make high quality predictions in complex, partially observable environments with high-dimensional observations (e.g. an arcade game). 1

3 0.75054502 244 nips-2012-Nonconvex Penalization Using Laplace Exponents and Concave Conjugates

Author: Zhihua Zhang, Bojun Tu

Abstract: In this paper we study sparsity-inducing nonconvex penalty functions using Lévy processes. We define such a penalty as the Laplace exponent of a subordinator. Accordingly, we propose a novel approach for the construction of sparsity-inducing nonconvex penalties. Particularly, we show that the nonconvex logarithmic (LOG) and exponential (EXP) penalty functions are the Laplace exponents of Gamma and compound Poisson subordinators, respectively. Additionally, we explore the concave conjugate of nonconvex penalties. We find that the LOG and EXP penalties are the concave conjugates of negative Kullback-Leibler (KL) distance functions. Furthermore, the relationship between these two penalties is due to asymmetricity of the KL distance. 1

4 0.74061078 88 nips-2012-Cost-Sensitive Exploration in Bayesian Reinforcement Learning

Author: Dongho Kim, Kee-eung Kim, Pascal Poupart

Abstract: In this paper, we consider Bayesian reinforcement learning (BRL) where actions incur costs in addition to rewards, and thus exploration has to be constrained in terms of the expected total cost while learning to maximize the expected longterm total reward. In order to formalize cost-sensitive exploration, we use the constrained Markov decision process (CMDP) as the model of the environment, in which we can naturally encode exploration requirements using the cost function. We extend BEETLE, a model-based BRL method, for learning in the environment with cost constraints. We demonstrate the cost-sensitive exploration behaviour in a number of simulated problems. 1

5 0.62242454 3 nips-2012-A Bayesian Approach for Policy Learning from Trajectory Preference Queries

Author: Aaron Wilson, Alan Fern, Prasad Tadepalli

Abstract: We consider the problem of learning control policies via trajectory preference queries to an expert. In particular, the agent presents an expert with short runs of a pair of policies originating from the same state and the expert indicates which trajectory is preferred. The agent’s goal is to elicit a latent target policy from the expert with as few queries as possible. To tackle this problem we propose a novel Bayesian model of the querying process and introduce two methods that exploit this model to actively select expert queries. Experimental results on four benchmark problems indicate that our model can effectively learn policies from trajectory preference queries and that active query selection can be substantially more efficient than random selection. 1

6 0.61715317 210 nips-2012-Memorability of Image Regions

7 0.61541736 274 nips-2012-Priors for Diversity in Generative Latent Variable Models

8 0.61531276 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization

9 0.61411446 303 nips-2012-Searching for objects driven by context

10 0.61007166 168 nips-2012-Kernel Latent SVM for Visual Recognition

11 0.60944879 201 nips-2012-Localizing 3D cuboids in single-view images

12 0.60844183 176 nips-2012-Learning Image Descriptors with the Boosting-Trick

13 0.60659331 357 nips-2012-Unsupervised Template Learning for Fine-Grained Object Recognition

14 0.60158485 8 nips-2012-A Generative Model for Parts-based Object Segmentation

15 0.60105354 344 nips-2012-Timely Object Recognition

16 0.60062253 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering

17 0.59843141 81 nips-2012-Context-Sensitive Decision Forests for Object Detection

18 0.59842646 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video

19 0.59797955 92 nips-2012-Deep Representations and Codes for Image Auto-Annotation

20 0.5978449 188 nips-2012-Learning from Distributions via Support Measure Machines