nips nips2012 nips2012-313 knowledge-graph by maker-knowledge-mining

313 nips-2012-Sketch-Based Linear Value Function Approximation


Source: pdf

Author: Marc Bellemare, Joel Veness, Michael Bowling

Abstract: Hashing is a common method to reduce large, potentially infinite feature vectors to a fixed-size table. In reinforcement learning, hashing is often used in conjunction with tile coding to represent states in continuous spaces. Hashing is also a promising approach to value function approximation in large discrete domains such as Go and Hearts, where feature vectors can be constructed by exhaustively combining a set of atomic features. Unfortunately, the typical use of hashing in value function approximation results in biased value estimates due to the possibility of collisions. Recent work in data stream summaries has led to the development of the tug-of-war sketch, an unbiased estimator for approximating inner products. Our work investigates the application of this new data structure to linear value function approximation. Although in the reinforcement learning setting the use of the tug-of-war sketch leads to biased value estimates, we show that this bias can be orders of magnitude less than that of standard hashing. We provide empirical results on two RL benchmark domains and fifty-five Atari 2600 games to highlight the superior learning performance obtained when using tug-of-war hashing. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 In reinforcement learning, hashing is often used in conjunction with tile coding to represent states in continuous spaces. [sent-9, score-0.965]

2 Hashing is also a promising approach to value function approximation in large discrete domains such as Go and Hearts, where feature vectors can be constructed by exhaustively combining a set of atomic features. [sent-10, score-0.232]

3 Unfortunately, the typical use of hashing in value function approximation results in biased value estimates due to the possibility of collisions. [sent-11, score-0.829]

4 Although in the reinforcement learning setting the use of the tug-of-war sketch leads to biased value estimates, we show that this bias can be orders of magnitude less than that of standard hashing. [sent-14, score-0.389]

5 We provide empirical results on two RL benchmark domains and fifty-five Atari 2600 games to highlight the superior learning performance obtained when using tug-of-war hashing. [sent-15, score-0.237]

6 1 Introduction. Recent value-based reinforcement learning applications have shown the benefit of exhaustively generating features, both in discrete and continuous state domains. [sent-16, score-0.176]

7 In discrete domains, exhaustive feature generation combines atomic features into logical predicates. [sent-17, score-0.147]

8 Sturtevant and White [21] similarly obtained promising reinforcement learning results using a feature generation method that enumerated all 2-, 3- and 4-wise combinations of a set of 60 atomic features. [sent-20, score-0.228]

9 Although such feature vectors are too large to be represented explicitly, in many domains of interest they are also sparse. [sent-23, score-0.115]

10 Given a fixed memory budget, the standard approach is to hash features into a fixed-size table, with collisions implicitly handled by the learning algorithm; all but one of the applications discussed above use some form of hashing. [sent-25, score-0.436]

11 With respect to its typical use for linear value function approximation, hashing lacks theoretical guarantees. [sent-26, score-0.718]

12 In order to improve on the basic hashing idea, we turn to sketches: state-of-the-art methods for approximately storing large vectors [6]. [sent-27, score-0.717]

13 Our goal is to show that one such sketch, the tug-of-war sketch [7], is particularly well-suited for linear value function approximation. [sent-28, score-0.154]

14 Our work is related to recent developments on the use of random projections in reinforcement learning [11] and least-squares regression [16, 10]. [sent-29, score-0.163]

15 Tug-of-war hashing retains the computational efficiency that makes hashing a practical method for linear value function approximation on large feature spaces, while preserving the theoretical appeal of random projection methods. [sent-32, score-1.474]

16 A natural concern when using hashing in RL is that hash collisions irremediably degrade learning. [sent-33, score-1.054]

17 In this paper we argue that tug-of-war hashing addresses this concern by providing us with a low-error approximation of large feature vectors at a fraction of the memory cost. [sent-34, score-0.834]

18 2 Background. We consider the reinforcement learning framework of Sutton and Barto [23]. [sent-36, score-0.141]

19 At time step t the agent observes state s_t ∈ S, selects an action a_t ∈ A and receives a reward r_t := R(s_t, a_t). [sent-38, score-0.264]

20 From state s_t, the agent's goal is to maximize the expected discounted sum of future rewards, E[Σ_{i=0}^∞ γ^i R(s_{t+i}, a_{t+i})]. [sent-40, score-0.13]

21 A common value function approximation scheme in reinforcement learning is linear approximation. [sent-46, score-0.197]

22 If we let Φ ∈ R^{|S||A|×n} denote the matrix of full feature vectors φ(s, a), and let µ : S × A → [0, 1] denote the steady-state distribution over state-action pairs induced by π and P, then, under mild assumptions, we can guarantee the existence and uniqueness of µ. [sent-52, score-0.114]

23 Denote by Φ ∈ R^{|S||A|×n} the matrix of full feature vectors and by µ the stationary distribution on (S, A) induced by π and P. [sent-60, score-0.114]

24 A typical example of this is tile coding on continuous-state domains [22], which generates a number of features exponential in the dimensionality of the state space. [sent-65, score-0.207]

25 In such cases, hashing can effectively be used to approximate Q_π(s, a) using a fixed memory budget. [sent-66, score-0.717]

26 We consider a hash function h : {1, . . . , n} → {1, . . . , m}, mapping full feature vector indices into hash table indices, where m ≪ n is the hash table size. [sent-73, score-0.774]

27 We define standard hashing features as the feature map φ̂(s, a) whose i-th component is defined as φ̂_i(s, a) := Σ_{j=1}^n I[h(j)=i] φ_j(s, a), (3) where φ_j(s, a) denotes the j-th component of φ(s, a) and I[x] denotes the indicator function. [sent-74, score-0.773]
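
As an illustration of Equation 3, here is a minimal NumPy sketch (hypothetical helper name; it takes a dense φ for readability, although in practice φ(s, a) is sparse and only its nonzero entries would be hashed):

    import numpy as np

    def standard_hash_features(phi, h, m):
        # phi: full feature vector of length n; h: integer array of length n
        # with h[j] in {0, ..., m-1}, the table index assigned to feature j
        phi_hat = np.zeros(m)
        np.add.at(phi_hat, h, phi)  # phi_hat[h[j]] += phi[j]; collisions simply add up
        return phi_hat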

28 We assume that our hash function h is drawn from a universal family: for any i, j ∈ {1, . . . , n} with i ≠ j, Pr_h(h(i) = h(j)) ≤ 1/m. [sent-75, score-0.345]

29 We define the standard hashing value function Q̂_t(s, a) := θ̂_t · φ̂(s, a), where θ̂_t ∈ R^m is a weight vector and φ̂(s, a) is the hashed vector. [sent-79, score-0.794]

30 Because of hashing collisions, the standard hashing value function is a biased estimator of Q_t(s, a), i.e., in general E_h[Q̂_t(s, a)] ≠ Q_t(s, a). [sent-80, score-1.458]

31 We return to the issue of the bias introduced by standard hashing in Section 4. [sent-84, score-0.756]

32 In the canonical sketch setting, we summarize a count vector θ ∈ R^n using a sketch vector θ̃ ∈ R^m. [sent-89, score-0.252]

33 The purpose of the sketch vector is to approximate the count vector θ_t := Σ_{i=0}^{t−1} φ_i. [sent-91, score-0.126]

34 Given two hash functions, h and ξ : {1, . . . , n} → {−1, 1}, φ_t is mapped to a vector φ̃_t whose i-th component is φ̃_{t,i} := Σ_{j=1}^n I[h(j)=i] φ_{t,j} ξ(j). (4)

35 The tug-of-war sketch vector is then updated as θ̃_{t+1} ← θ̃_t + φ̃_t. [sent-95, score-0.126]
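
Equation 4 differs from Equation 3 only by the random sign ξ(j); a sketch of the mapping and the accumulation step, reusing the conventions of the example above (names are ours):

    import numpy as np

    def tug_of_war_features(phi, h, xi, m):
        # xi: array of length n with entries in {-1.0, +1.0}
        phi_tilde = np.zeros(m)
        np.add.at(phi_tilde, h, xi * phi)  # signed per-bucket sums (Equation 4)
        return phi_tilde

    # canonical sketch update: theta_tilde accumulates the mapped vectors
    # theta_tilde += tug_of_war_features(phi_t, h, xi, m)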

36 In addition to h being drawn from a universal family of hash functions, ξ is drawn from a four-wise independent family of hash functions: for all sets of four unique indices {i_1, i_2, i_3, i_4}, Pr_ξ(ξ(i_1) = k_1, ξ(i_2) = k_2, ξ(i_3) = k_3, ξ(i_4) = k_4) = 1/16 with k_1, . . . , k_4 ∈ {−1, 1}. [sent-96, score-0.69]
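
A standard way to realize such a family (our choice, not necessarily the authors') is a random degree-3 polynomial over a prime field, with the sign taken from the value's parity; the parity trick deviates from exact uniformity by O(1/p), negligible for a large prime p:

    import random

    def make_four_wise_sign_hash(p=2_147_483_647):
        # a degree-3 polynomial with random coefficients over Z_p is
        # four-wise independent; the parity of its value gives the sign
        coeffs = [random.randrange(p) for _ in range(4)]
        def xi(x):
            v = 0
            for c in coeffs:
                v = (v * x + c) % p  # Horner evaluation of the polynomial
            return 1 if v % 2 == 0 else -1
        return xi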

37 For an arbitrary φ ∈ R^n and its corresponding tug-of-war vector φ̃ ∈ R^m, E_{h,ξ}[θ̃_t · φ̃] = θ_t · φ: the tug-of-war sketch produces unbiased estimates of inner products [7]. [sent-100, score-0.195]
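
A quick numerical check of this unbiasedness, using fully random h and ξ as stand-ins for the universal and four-wise independent families (a sketch with arbitrary sizes):

    import numpy as np

    rng = np.random.default_rng(0)
    n, m, trials = 1000, 50, 20000
    theta, phi = rng.normal(size=n), rng.normal(size=n)

    estimates = np.empty(trials)
    for t in range(trials):
        h = rng.integers(0, m, size=n)         # bucket assignment for each feature
        xi = rng.choice([-1.0, 1.0], size=n)   # random signs
        H_theta, H_phi = np.zeros(m), np.zeros(m)
        np.add.at(H_theta, h, xi * theta)
        np.add.at(H_phi, h, xi * phi)
        estimates[t] = H_theta @ H_phi

    print(theta @ phi, estimates.mean())  # the two values should be close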

38 While it may seem odd to randomly select your hash function, this can equivalently be thought of as sampling an indexing assignment for the MDP's features. [sent-103, score-0.324]

39 While a particular hash function may be well- (or poorly-) suited for a particular MDP, it is hard to imagine how this could be known a priori. [sent-104, score-0.324]

40 By considering a randomly selected hash function (or random permutation of the features), we are simulating the uncertainty of using a particular hash function on a never before encountered MDP. [sent-105, score-0.648]

41 3 Tug-of-War with Linear Value Function Approximation. We now extend the tug-of-war sketch to the reinforcement learning setting by defining the tug-of-war hashing features as φ̃ : S × A → R^m with φ̃_i(s, a) := Σ_{j=1}^n I[h(j)=i] φ_j(s, a) ξ(j). [sent-107, score-0.98]
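
Because the hashed features enter the learner exactly like ordinary features, the update rule itself is unchanged; a minimal gradient-descent SARSA(0) step over the sketch weights might look like this (a sketch with names of our own; the paper's experiments use SARSA(λ) and SARSA(1)):

    def sarsa_step(theta_tilde, phi_sa, phi_next, r, gamma, alpha):
        # theta_tilde: NumPy weight vector of length m
        # phi_sa, phi_next: tug-of-war feature vectors for (s_t, a_t)
        # and (s_{t+1}, a_{t+1})
        td_error = r + gamma * (theta_tilde @ phi_next) - (theta_tilde @ phi_sa)
        theta_tilde += alpha * td_error * phi_sa  # in-place gradient step
        return theta_tilde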

42 3.1 Value Function Approximation with Tug-of-War Hashing. Intuitively, one might hope that the unbiasedness of the tug-of-war sketch for approximating inner products carries over to the case of linear value function approximation. [sent-111, score-0.204]

43 Our bound relies on interpreting tug-of-war hashing as a special kind of Johnson-Lindenstrauss transform [8]. [sent-114, score-0.69]

44 We define an ∞-universal family of hash functions H such that for any set of indices i_1, i_2, . . . , i_k, the hash values h(i_1), . . . , h(i_k) are independent and uniformly distributed. [sent-115, score-0.345]

45 Let h : {1, . . . , n} → {1, . . . , m} and ξ : {1, . . . , n} → {−1, 1} be two independent hash functions chosen uniformly at random from ∞-universal families, and let H ∈ {0, ±1}^{m×n} be a matrix with entries H_ij = I[h(j)=i] ξ(j). [sent-136, score-0.324]

46 Lemma 1 states that, under certain conditions on the input vector x, tug-of-war hashing approximately preserves the norm of x. [sent-139, score-0.69]

47 A clear discussion on hashing as a Johnson-Lindenstrauss transform can be found in the work of Kane and Nelson [13], who also improve Lemma 1 and extend it to the case where the family of hash functions is k-universal rather than ∞-universal. [sent-141, score-1.014]

48 For all k ∈ {1, . . . , K}, x_k · y − ε‖x_k‖_2 ‖y‖_2 ≤ Hx_k · Hy ≤ x_k · y + ε‖x_k‖_2 ‖y‖_2. [sent-151, score-0.124]

49 Given hash functions h and ξ defined as per Lemma 1, we denote by H ∈ R^{m×n} the matrix whose entries are H_ij := I[h(j)=i] ξ(j), such that φ̃(s, a) = Hφ(s, a). [sent-158, score-0.324]

50 Under Assumptions 1–4, gradient-descent SARSA(1) with tug-of-war hashing converges to a unique θ̃_π ∈ R^m, and with probability at least 1 − δ, ‖Φ̃θ̃_π − Q_π‖_µ ≤ ‖Φθ_π − Q_π‖_µ + ε ‖θ_π‖_2 sup_{s∈S,a∈A} ‖φ(s, a)‖_2, where Q_π is the exact solution to Equation 1 and θ_π = arg min_θ ‖Φθ − Q_π‖²_µ. [sent-167, score-0.69]

51 First note that Theorem 1 implies the convergence of SARSA(1) with tug-of-war hashing to a unique solution, which we denote θ̃_π. [sent-169, score-0.69]

52 A natural question now arises: does tug-of-war hashing lead to improved linear value function approximation compared with standard hashing? [sent-178, score-0.768]

53 More importantly, does tug-of-war hashing result in better learned policies? [sent-179, score-0.69]

54 4 Experimental Study. In the sketch setting, the appeal of tug-of-war hashing over standard hashing lies in its unbiasedness. [sent-181, score-1.528]

55 We therefore begin with an empirical study of the magnitude of the bias when applying different hashing methods in a value function approximation setting. [sent-182, score-0.79]

56 4.1 Value Function Bias. We used standard hashing, tug-of-war hashing, and no hashing to learn a value function over a short trajectory in the Mountain Car domain [22]. [sent-184, score-0.74]

57 Figure 1: Bias and Mean Squared Error of value estimates using standard and tug-of-war hashing in 1,000 learning steps of Mountain Car. [sent-194, score-0.767]

58 Parallel to the full-vector update we also updated both a tug-of-war weight vector θ̃_t and a standard hashing weight vector θ̂_t, with the same values of γ and α. [sent-201, score-0.712]

59 Both methods use a hash table size of m = 100 and the same randomly selected hash function. [sent-202, score-0.669]

60 This hash function is defined as (ax + b) mod p mod m, where p is a large prime and a, b < p are random integers [5]. [sent-203, score-0.372]
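
A minimal version of this construction (the specific prime below is our choice; any prime larger than the feature-index range works):

    import random

    def make_universal_hash(m, p=2_147_483_647):
        a = random.randrange(1, p)  # a != 0 is required for universality
        b = random.randrange(0, p)
        return lambda x: ((a * x + b) % p) % m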

61 At every step we compute the difference in value between the hashed value functions Q̃_t(s_t, a_t) and Q̂_t(s_t, a_t), and the full-vector value function Q_t(s_t, a_t). [sent-204, score-0.138]

62 We repeated this experiment using 1 million hash functions selected uniformly at random. [sent-205, score-0.324]

63 To provide a sense of scale, the estimate of the value of the final state when using no hashing is approximately −4; note that the y-axis uses a logarithmic scale. [sent-207, score-0.718]

64 In comparison, the bias of standard hashing is orders of magnitude larger – almost as large as the value it is trying to estimate. [sent-209, score-0.784]

65 Our results confirm the insights provided in Section 2: the tug-of-war value function can be significantly less biased than the standard hashing value function. [sent-212, score-0.796]

66 4.2 Reinforcement Learning Performance. Having smaller bias and mean squared error in the Q-value estimates does not necessarily imply improved agent performance. [sent-214, score-0.153]

67 In reinforcement learning, actions are selected based on relative Q-values, so a consistent bias may be harmless. [sent-215, score-0.185]

68 4.2.1 Tile Coding. We first studied the performance of agents using each of the two hashing methods in conjunction with tile coding. [sent-219, score-0.864]

69 We compared the two hashing methods using ε-greedy policies and the SARSA(λ) algorithm. [sent-222, score-0.69]

70 For each domain and each hashing method we performed a parameter sweep over the learning rate α and selected the best value which did not cause the value estimates to diverge. [sent-223, score-0.795]

71 We experimented with hash table sizes m ∈ [20, 1000] for Mountain Car and m ∈ [100, 2000] for Acrobot. [sent-231, score-0.345]

72 Each experiment consisted of 100 trials, sampling a new hash function for each trial. [sent-232, score-0.324]

73 Figure 2 shows the performance of standard hashing and tug-of-war hashing as a function of the hash table size. [sent-235, score-1.747]

74 The conclusion is clear: when the hashed vector is small relative to the full vector, tug-of-war hashing performs better than standard hashing. [sent-236, score-0.791]

75 4.2.2 Atari. We next evaluated tug-of-war hashing and standard hashing on a suite of Atari 2600 games. [sent-240, score-1.426]

76 Atari games pose a variety of challenges for learning agents. [sent-242, score-0.167]

77 In the game-independent setting, agents are tuned using a small number of training games and subsequently evaluated over a large number of games for which no game-specific tuning takes place. [sent-244, score-0.4]

78 The game-independent setting forces us to use features that are common to all games, for example, by encoding the presence of color patterns in game screens; such an encoding is a form of exhaustive feature generation. [sent-245, score-0.228]

79 We based our evaluation on prior work on a suite of Atari 2600 games [2], to which we refer the reader for full details on handling Atari 2600 games as RL domains. [sent-247, score-0.383]

80 From a given game, we generate feature sets by exhaustively enumerating all single-color patterns of size 1x1 (single pixels), 2x2, and 3x3. [sent-250, score-0.117]

81 We trained ε-greedy SARSA(0) agents using both standard hashing and tug-of-war hashing with hash tables of size m = 1,000, 5,000 and 20,000. [sent-254, score-1.792]

82 Figure 3 (three panels): legend "Tug-of-War"; y-axis "Fraction of games". [sent-270, score-0.5]

83 Accurately comparing methods across fifty-five games poses a challenge, as each game exhibits a different reward function and game dynamics. [sent-292, score-0.408]

84 For each game, we extracted the average score achieved by our agents over the last 500 episodes of training, yielding six different scores (three per hashing method) per game. [sent-294, score-0.859]

85 A normalized score z_{g,i} = 1.0 indicates that the i-th score was the highest for game g, and z_{g,i} = 0.0 that it was the lowest. [sent-300, score-0.162]

86 For each combination of hashing method and memory size, its inter-algorithm score distribution shows the fraction of games for which the corresponding agent achieves a certain normalized score or better. [sent-302, score-1.098]
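
One plausible reading of this normalization, applied per game g over the six scores (a sketch with hypothetical names; the exact scheme follows [2]):

    import numpy as np

    def inter_algorithm_scores(raw):
        # raw: array of shape (num_games, num_scores) of average episode scores
        lo = raw.min(axis=1, keepdims=True)
        hi = raw.max(axis=1, keepdims=True)
        return (raw - lo) / np.where(hi > lo, hi - lo, 1.0)  # z[g, i] in [0, 1]

    # the distribution for method i is the fraction of games with z[:, i] >= threshold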

87 Figure 3 compares the score distributions of agents using either standard hashing or tug-of-war hashing for m = 1,000, 5,000 and 20,000. [sent-303, score-1.522]

88 Tug-of-war hashing consistently outperforms standard hashing across hash table sizes. [sent-304, score-1.747]

89 For m = 1,000, tug-of-war hashing performed statistically better in 38 games and worse in 5; for m = 5,000, it performed better in 41 games and worse in 7; and for m = 20,000 it performed better in 35 games and worse in 5. [sent-306, score-1.191]

90 Our results on Atari games confirm what we observed on Mountain Car and Acrobot: in practice, tug-of-war hashing performs much better than standard hashing. [sent-307, score-0.879]

91 5 Conclusion. In this paper, we cast the tug-of-war sketch into the reinforcement learning framework. [sent-310, score-0.267]

92 We showed that, although the tug-of-war sketch is unbiased in the setting for which it was developed [7], the self-referential component of reinforcement learning induces a small bias. [sent-311, score-0.288]

93 We showed that this bias can be much smaller than the bias that results from standard hashing and provided empirical results confirming the superiority of tug-of-war hashing for value function approximation. [sent-312, score-1.518]

94 As increasingly complex reinforcement learning problems arise and strain against the boundaries of practicality, the need for fast and reliable approximation methods grows. [sent-313, score-0.169]

95 If standard hashing frees us from the curse of dimensionality, then tug-of-war hashing goes a step further by ensuring, when the demands of the task exceed available resources, a robust and principled shift from the exact solution to its approximation. [sent-314, score-1.446]

96 Acknowledgements. We would like to thank Bernardo Ávila Pires, Martha White, Yasin Abbasi-Yadkori and Csaba Szepesvári for the help they provided with the theoretical aspects of this paper, as well as Adam White and Rich Sutton for insightful discussions on hashing and tile coding. [sent-315, score-0.798]

97 Using a controller based on reinforcement learning for a passive dynamic walking robot. [sent-392, score-0.141]

98 Generalization in reinforcement learning: Successful examples using sparse coarse coding. [sent-407, score-0.141]

99 Stochastic policy gradient reinforcement learning on a simple 3D biped. [sent-419, score-0.167]

100 Using imagery to simplify perceptual abstraction in reinforcement learning agents. [sent-426, score-0.141]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('hashing', 0.69), ('hash', 0.324), ('sarsa', 0.278), ('atari', 0.215), ('qt', 0.172), ('games', 0.167), ('reinforcement', 0.141), ('st', 0.13), ('sketch', 0.126), ('game', 0.108), ('tile', 0.108), ('acrobot', 0.088), ('mountain', 0.083), ('agent', 0.082), ('agents', 0.066), ('alberta', 0.061), ('car', 0.06), ('score', 0.054), ('hashed', 0.054), ('bowling', 0.051), ('domains', 0.05), ('veness', 0.05), ('episodes', 0.049), ('maillard', 0.048), ('sutton', 0.047), ('eh', 0.046), ('graham', 0.044), ('bias', 0.044), ('collisions', 0.04), ('feature', 0.038), ('hv', 0.038), ('rm', 0.038), ('exhaustive', 0.037), ('stone', 0.036), ('mdp', 0.036), ('rl', 0.036), ('exhaustively', 0.035), ('rn', 0.033), ('avatar', 0.033), ('bellemare', 0.033), ('hxk', 0.033), ('kane', 0.033), ('richard', 0.032), ('michael', 0.032), ('aaai', 0.031), ('xk', 0.031), ('sturtevant', 0.029), ('minos', 0.029), ('unbiasedness', 0.029), ('soccer', 0.029), ('biased', 0.028), ('value', 0.028), ('robots', 0.028), ('approximation', 0.028), ('estimates', 0.027), ('lemma', 0.027), ('joel', 0.027), ('cormode', 0.027), ('irreducible', 0.027), ('aperiodic', 0.027), ('tsitsiklis', 0.027), ('vectors', 0.027), ('memory', 0.027), ('rt', 0.027), ('atomic', 0.026), ('coding', 0.026), ('policy', 0.026), ('silver', 0.025), ('manuela', 0.025), ('contingency', 0.025), ('lstd', 0.025), ('innovates', 0.025), ('reward', 0.025), ('full', 0.025), ('hu', 0.024), ('suite', 0.024), ('frees', 0.024), ('mod', 0.024), ('fraction', 0.024), ('induced', 0.024), ('generation', 0.023), ('hij', 0.023), ('sketches', 0.023), ('awareness', 0.023), ('platform', 0.023), ('features', 0.023), ('enumerating', 0.022), ('sweep', 0.022), ('projections', 0.022), ('patterns', 0.022), ('standard', 0.022), ('inner', 0.021), ('table', 0.021), ('unbiased', 0.021), ('universal', 0.021), ('indices', 0.021), ('demands', 0.02), ('barto', 0.02), ('benchmark', 0.02), ('il', 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999997 313 nips-2012-Sketch-Based Linear Value Function Approximation

Author: Marc Bellemare, Joel Veness, Michael Bowling

Abstract: Hashing is a common method to reduce large, potentially infinite feature vectors to a fixed-size table. In reinforcement learning, hashing is often used in conjunction with tile coding to represent states in continuous spaces. Hashing is also a promising approach to value function approximation in large discrete domains such as Go and Hearts, where feature vectors can be constructed by exhaustively combining a set of atomic features. Unfortunately, the typical use of hashing in value function approximation results in biased value estimates due to the possibility of collisions. Recent work in data stream summaries has led to the development of the tug-of-war sketch, an unbiased estimator for approximating inner products. Our work investigates the application of this new data structure to linear value function approximation. Although in the reinforcement learning setting the use of the tug-of-war sketch leads to biased value estimates, we show that this bias can be orders of magnitude less than that of standard hashing. We provide empirical results on two RL benchmark domains and fifty-five Atari 2600 games to highlight the superior learning performance obtained when using tug-of-war hashing. 1

2 0.35666597 257 nips-2012-One Permutation Hashing

Author: Ping Li, Art Owen, Cun-hui Zhang

Abstract: Minwise hashing is a standard procedure in the context of search, for efficiently estimating set similarities in massive binary data such as text. Recently, b-bit minwise hashing has been applied to large-scale learning and sublinear time nearneighbor search. The major drawback of minwise hashing is the expensive preprocessing, as the method requires applying (e.g.,) k = 200 to 500 permutations on the data. This paper presents a simple solution called one permutation hashing. Conceptually, given a binary data matrix, we permute the columns once and divide the permuted columns evenly into k bins; and we store, for each data vector, the smallest nonzero location in each bin. The probability analysis illustrates that this one permutation scheme should perform similarly to the original (k-permutation) minwise hashing. Our experiments with training SVM and logistic regression confirm that one permutation hashing can achieve similar (or even better) accuracies compared to the k-permutation scheme. See more details in arXiv:1208.1259.

3 0.24714471 71 nips-2012-Co-Regularized Hashing for Multimodal Data

Author: Yi Zhen, Dit-Yan Yeung

Abstract: Hashing-based methods provide a very promising approach to large-scale similarity search. To obtain compact hash codes, a recent trend seeks to learn the hash functions from data automatically. In this paper, we study hash function learning in the context of multimodal data. We propose a novel multimodal hash function learning method, called Co-Regularized Hashing (CRH), based on a boosted coregularization framework. The hash functions for each bit of the hash codes are learned by solving DC (difference of convex functions) programs, while the learning for multiple bits proceeds via a boosting procedure so that the bias introduced by the hash functions can be sequentially minimized. We empirically compare CRH with two state-of-the-art multimodal hash function learning methods on two publicly available data sets. 1

4 0.23614731 163 nips-2012-Isotropic Hashing

Author: Weihao Kong, Wu-jun Li

Abstract: Most existing hashing methods adopt some projection functions to project the original data into several dimensions of real values, and then each of these projected dimensions is quantized into one bit (zero or one) by thresholding. Typically, the variances of different projected dimensions are different for existing projection functions such as principal component analysis (PCA). Using the same number of bits for different projected dimensions is unreasonable because larger-variance dimensions will carry more information. Although this viewpoint has been widely accepted by many researchers, it is still not verified by either theory or experiment because no methods have been proposed to find a projection with equal variances for different dimensions. In this paper, we propose a novel method, called isotropic hashing (IsoHash), to learn projection functions which can produce projected dimensions with isotropic variances (equal variances). Experimental results on real data sets show that IsoHash can outperform its counterpart with different variances for different dimensions, which verifies the viewpoint that projections with isotropic variances will be better than those with anisotropic variances. 1

5 0.20734984 148 nips-2012-Hamming Distance Metric Learning

Author: Mohammad Norouzi, David M. Blei, Ruslan Salakhutdinov

Abstract: Motivated by large-scale multimedia applications we propose to learn mappings from high-dimensional data to binary codes that preserve semantic similarity. Binary codes are well suited to large-scale applications as they are storage efficient and permit exact sub-linear kNN search. The framework is applicable to broad families of mappings, and uses a flexible form of triplet ranking loss. We overcome discontinuous optimization of the discrete mappings by minimizing a piecewise-smooth upper bound on empirical loss, inspired by latent structural SVMs. We develop a new loss-augmented inference algorithm that is quadratic in the code length. We show strong retrieval performance on CIFAR-10 and MNIST, with promising classification results using no more than kNN on the binary codes. 1

6 0.20394634 126 nips-2012-FastEx: Hash Clustering with Exponential Families

7 0.18250093 329 nips-2012-Super-Bit Locality-Sensitive Hashing

8 0.11199249 42 nips-2012-Angular Quantization-based Binary Codes for Fast Similarity Search

9 0.088073879 109 nips-2012-Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions

10 0.087103166 348 nips-2012-Tractable Objectives for Robust Policy Optimization

11 0.082230508 3 nips-2012-A Bayesian Approach for Policy Learning from Trajectory Preference Queries

12 0.079372227 122 nips-2012-Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress

13 0.074240521 353 nips-2012-Transferring Expectations in Model-based Reinforcement Learning

14 0.072932623 259 nips-2012-Online Regret Bounds for Undiscounted Continuous Reinforcement Learning

15 0.07133656 173 nips-2012-Learned Prioritization for Trading Off Accuracy and Speed

16 0.069790415 38 nips-2012-Algorithms for Learning Markov Field Policies

17 0.064403139 162 nips-2012-Inverse Reinforcement Learning through Structured Classification

18 0.061717037 293 nips-2012-Relax and Randomize : From Value to Algorithms

19 0.061465811 245 nips-2012-Nonparametric Bayesian Inverse Reinforcement Learning for Multiple Reward Functions

20 0.060548704 108 nips-2012-Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.144), (1, -0.113), (2, -0.03), (3, -0.057), (4, 0.029), (5, -0.042), (6, -0.007), (7, 0.098), (8, 0.178), (9, 0.115), (10, 0.112), (11, -0.182), (12, 0.069), (13, 0.261), (14, -0.288), (15, 0.088), (16, 0.12), (17, -0.066), (18, -0.298), (19, -0.062), (20, -0.0), (21, 0.16), (22, 0.05), (23, -0.157), (24, -0.048), (25, 0.019), (26, 0.051), (27, -0.059), (28, -0.027), (29, -0.021), (30, -0.002), (31, 0.041), (32, -0.044), (33, 0.07), (34, -0.032), (35, -0.006), (36, 0.006), (37, -0.063), (38, -0.053), (39, -0.046), (40, 0.021), (41, -0.02), (42, -0.023), (43, -0.006), (44, -0.015), (45, -0.046), (46, 0.015), (47, -0.009), (48, -0.026), (49, -0.12)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95037669 313 nips-2012-Sketch-Based Linear Value Function Approximation

Author: Marc Bellemare, Joel Veness, Michael Bowling

Abstract: Hashing is a common method to reduce large, potentially infinite feature vectors to a fixed-size table. In reinforcement learning, hashing is often used in conjunction with tile coding to represent states in continuous spaces. Hashing is also a promising approach to value function approximation in large discrete domains such as Go and Hearts, where feature vectors can be constructed by exhaustively combining a set of atomic features. Unfortunately, the typical use of hashing in value function approximation results in biased value estimates due to the possibility of collisions. Recent work in data stream summaries has led to the development of the tug-of-war sketch, an unbiased estimator for approximating inner products. Our work investigates the application of this new data structure to linear value function approximation. Although in the reinforcement learning setting the use of the tug-of-war sketch leads to biased value estimates, we show that this bias can be orders of magnitude less than that of standard hashing. We provide empirical results on two RL benchmark domains and fifty-five Atari 2600 games to highlight the superior learning performance obtained when using tug-of-war hashing. 1

2 0.86018491 257 nips-2012-One Permutation Hashing

Author: Ping Li, Art Owen, Cun-hui Zhang

Abstract: Minwise hashing is a standard procedure in the context of search, for efficiently estimating set similarities in massive binary data such as text. Recently, b-bit minwise hashing has been applied to large-scale learning and sublinear time nearneighbor search. The major drawback of minwise hashing is the expensive preprocessing, as the method requires applying (e.g.,) k = 200 to 500 permutations on the data. This paper presents a simple solution called one permutation hashing. Conceptually, given a binary data matrix, we permute the columns once and divide the permuted columns evenly into k bins; and we store, for each data vector, the smallest nonzero location in each bin. The probability analysis illustrates that this one permutation scheme should perform similarly to the original (k-permutation) minwise hashing. Our experiments with training SVM and logistic regression confirm that one permutation hashing can achieve similar (or even better) accuracies compared to the k-permutation scheme. See more details in arXiv:1208.1259.

3 0.7929486 71 nips-2012-Co-Regularized Hashing for Multimodal Data

Author: Yi Zhen, Dit-Yan Yeung

Abstract: Hashing-based methods provide a very promising approach to large-scale similarity search. To obtain compact hash codes, a recent trend seeks to learn the hash functions from data automatically. In this paper, we study hash function learning in the context of multimodal data. We propose a novel multimodal hash function learning method, called Co-Regularized Hashing (CRH), based on a boosted coregularization framework. The hash functions for each bit of the hash codes are learned by solving DC (difference of convex functions) programs, while the learning for multiple bits proceeds via a boosting procedure so that the bias introduced by the hash functions can be sequentially minimized. We empirically compare CRH with two state-of-the-art multimodal hash function learning methods on two publicly available data sets. 1

4 0.78717071 163 nips-2012-Isotropic Hashing

Author: Weihao Kong, Wu-jun Li

Abstract: Most existing hashing methods adopt some projection functions to project the original data into several dimensions of real values, and then each of these projected dimensions is quantized into one bit (zero or one) by thresholding. Typically, the variances of different projected dimensions are different for existing projection functions such as principal component analysis (PCA). Using the same number of bits for different projected dimensions is unreasonable because larger-variance dimensions will carry more information. Although this viewpoint has been widely accepted by many researchers, it is still not verified by either theory or experiment because no methods have been proposed to find a projection with equal variances for different dimensions. In this paper, we propose a novel method, called isotropic hashing (IsoHash), to learn projection functions which can produce projected dimensions with isotropic variances (equal variances). Experimental results on real data sets show that IsoHash can outperform its counterpart with different variances for different dimensions, which verifies the viewpoint that projections with isotropic variances will be better than those with anisotropic variances. 1

5 0.76714242 329 nips-2012-Super-Bit Locality-Sensitive Hashing

Author: Jianqiu Ji, Jianmin Li, Shuicheng Yan, Bo Zhang, Qi Tian

Abstract: Sign-random-projection locality-sensitive hashing (SRP-LSH) is a probabilistic dimension reduction method which provides an unbiased estimate of angular similarity, yet suffers from the large variance of its estimation. In this work, we propose the Super-Bit locality-sensitive hashing (SBLSH). It is easy to implement, which orthogonalizes the random projection vectors in batches, and it is theoretically guaranteed that SBLSH also provides an unbiased estimate of angular similarity, yet with a smaller variance when the angle to estimate is within (0, ⇡/2]. The extensive experiments on real data well validate that given the same length of binary code, SBLSH may achieve significant mean squared error reduction in estimating pairwise angular similarity. Moreover, SBLSH shows the superiority over SRP-LSH in approximate nearest neighbor (ANN) retrieval experiments. 1

6 0.58019507 42 nips-2012-Angular Quantization-based Binary Codes for Fast Similarity Search

7 0.50225061 148 nips-2012-Hamming Distance Metric Learning

8 0.37965405 126 nips-2012-FastEx: Hash Clustering with Exponential Families

9 0.25679454 202 nips-2012-Locally Uniform Comparison Image Descriptor

10 0.25610682 250 nips-2012-On-line Reinforcement Learning Using Incremental Kernel-Based Stochastic Factorization

11 0.23558956 350 nips-2012-Trajectory-Based Short-Sighted Probabilistic Planning

12 0.22860879 109 nips-2012-Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions

13 0.22722806 108 nips-2012-Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search

14 0.22703697 267 nips-2012-Perceptron Learning of SAT

15 0.22511593 122 nips-2012-Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress

16 0.22425054 353 nips-2012-Transferring Expectations in Model-based Reinforcement Learning

17 0.22184786 38 nips-2012-Algorithms for Learning Markov Field Policies

18 0.22059926 176 nips-2012-Learning Image Descriptors with the Boosting-Trick

19 0.21184084 3 nips-2012-A Bayesian Approach for Policy Learning from Trajectory Preference Queries

20 0.21037216 155 nips-2012-Human memory search as a random walk in a semantic network


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.024), (21, 0.025), (28, 0.011), (38, 0.106), (42, 0.026), (53, 0.293), (54, 0.049), (55, 0.023), (74, 0.072), (76, 0.115), (80, 0.071), (92, 0.048)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.84885216 328 nips-2012-Submodular-Bregman and the Lovász-Bregman Divergences with Applications

Author: Rishabh Iyer, Jeff A. Bilmes

Abstract: We introduce a class of discrete divergences on sets (equivalently binary vectors) that we call the submodular-Bregman divergences. We consider two kinds, defined either from tight modular upper or tight modular lower bounds of a submodular function. We show that the properties of these divergences are analogous to the (standard continuous) Bregman divergence. We demonstrate how they generalize many useful divergences, including the weighted Hamming distance, squared weighted Hamming, weighted precision, recall, conditional mutual information, and a generalized KL-divergence on sets. We also show that the generalized Bregman divergence on the Lov´ sz extension of a submodular function, which we a call the Lov´ sz-Bregman divergence, is a continuous extension of a submodular a Bregman divergence. We point out a number of applications, and in particular show that a proximal algorithm defined through the submodular Bregman divergence provides a framework for many mirror-descent style algorithms related to submodular function optimization. We also show that a generalization of the k-means algorithm using the Lov´ sz Bregman divergence is natural in clustering scenarios where a ordering is important. A unique property of this algorithm is that computing the mean ordering is extremely efficient unlike other order based distance measures. 1

2 0.82213247 10 nips-2012-A Linear Time Active Learning Algorithm for Link Classification

Author: Nicolò Cesa-bianchi, Claudio Gentile, Fabio Vitale, Giovanni Zappella

Abstract: We present very efficient active learning algorithms for link classification in signed networks. Our algorithms are motivated by a stochastic model in which edge labels are obtained through perturbations of a initial sign assignment consistent with a two-clustering of the nodes. We provide a theoretical analysis within this model, showing that we can achieve an optimal (to whithin a constant factor) number of mistakes on any graph G = (V, E) such that |E| = Ω(|V |3/2 ) by querying O(|V |3/2 ) edge labels. More generally, we show an algorithm that achieves optimality to within a factor of O(k) by querying at most order of |V | + (|V |/k)3/2 edge labels. The running time of this algorithm is at most of order |E| + |V | log |V |. 1

3 0.81549883 359 nips-2012-Variational Inference for Crowdsourcing

Author: Qiang Liu, Jian Peng, Alex Ihler

Abstract: Crowdsourcing has become a popular paradigm for labeling large datasets. However, it has given rise to the computational task of aggregating the crowdsourced labels provided by a collection of unreliable annotators. We approach this problem by transforming it into a standard inference problem in graphical models, and applying approximate variational methods, including belief propagation (BP) and mean field (MF). We show that our BP algorithm generalizes both majority voting and a recent algorithm by Karger et al. [1], while our MF method is closely related to a commonly used EM algorithm. In both cases, we find that the performance of the algorithms critically depends on the choice of a prior distribution on the workers’ reliability; by choosing the prior properly, both BP and MF (and EM) perform surprisingly well on both simulated and real-world datasets, competitive with state-of-the-art algorithms based on more complicated modeling assumptions. 1

same-paper 4 0.77507913 313 nips-2012-Sketch-Based Linear Value Function Approximation

Author: Marc Bellemare, Joel Veness, Michael Bowling

Abstract: Hashing is a common method to reduce large, potentially infinite feature vectors to a fixed-size table. In reinforcement learning, hashing is often used in conjunction with tile coding to represent states in continuous spaces. Hashing is also a promising approach to value function approximation in large discrete domains such as Go and Hearts, where feature vectors can be constructed by exhaustively combining a set of atomic features. Unfortunately, the typical use of hashing in value function approximation results in biased value estimates due to the possibility of collisions. Recent work in data stream summaries has led to the development of the tug-of-war sketch, an unbiased estimator for approximating inner products. Our work investigates the application of this new data structure to linear value function approximation. Although in the reinforcement learning setting the use of the tug-of-war sketch leads to biased value estimates, we show that this bias can be orders of magnitude less than that of standard hashing. We provide empirical results on two RL benchmark domains and fifty-five Atari 2600 games to highlight the superior learning performance obtained when using tug-of-war hashing. 1

5 0.74836403 12 nips-2012-A Neural Autoregressive Topic Model

Author: Hugo Larochelle, Stanislas Lauly

Abstract: We describe a new model for learning meaningful representations of text documents from an unlabeled collection of documents. This model is inspired by the recently proposed Replicated Softmax, an undirected graphical model of word counts that was shown to learn a better generative model and more meaningful document representations. Specifically, we take inspiration from the conditional mean-field recursive equations of the Replicated Softmax in order to define a neural network architecture that estimates the probability of observing a new word in a given document given the previously observed words. This paradigm also allows us to replace the expensive softmax distribution over words with a hierarchical distribution over paths in a binary tree of words. The end result is a model whose training complexity scales logarithmically with the vocabulary size instead of linearly as in the Replicated Softmax. Our experiments show that our model is competitive both as a generative model of documents and as a document representation learning algorithm. 1

6 0.69425714 30 nips-2012-Accuracy at the Top

7 0.69116676 21 nips-2012-A Unifying Perspective of Parametric Policy Search Methods for Markov Decision Processes

8 0.58295214 214 nips-2012-Minimizing Sparse High-Order Energies by Submodular Vertex-Cover

9 0.58115405 236 nips-2012-Near-Optimal MAP Inference for Determinantal Point Processes

10 0.57463402 160 nips-2012-Imitation Learning by Coaching

11 0.57280815 108 nips-2012-Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search

12 0.56946391 163 nips-2012-Isotropic Hashing

13 0.56927842 260 nips-2012-Online Sum-Product Computation Over Trees

14 0.56878412 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines

15 0.56456017 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

16 0.56125468 298 nips-2012-Scalable Inference of Overlapping Communities

17 0.55564386 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

18 0.5548591 71 nips-2012-Co-Regularized Hashing for Multimodal Data

19 0.55449778 129 nips-2012-Fast Variational Inference in the Conjugate Exponential Family

20 0.55392647 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration