nips nips2002 nips2002-20 knowledge-graph by maker-knowledge-mining

20 nips-2002-Adaptive Caching by Refetching


Source: pdf

Author: Robert B. Gramacy, Manfred K. Warmuth, Scott A. Brandt, Ismail Ari

Abstract: We are constructing caching policies that have 13-20% lower miss rates than the best of twelve baseline policies over a large variety of request streams. This represents an improvement of 49–63% over Least Recently Used, the most commonly implemented policy. We achieve this not by designing a specific new policy but by using on-line Machine Learning algorithms to dynamically shift between the standard policies based on their observed miss rates. A thorough experimental evaluation of our techniques is given, as well as a discussion of what makes caching an interesting on-line learning problem.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: We are constructing caching policies that have 13-20% lower miss rates than the best of twelve baseline policies over a large variety of request streams. [sent-6, score-1.367]

2 We achieve this not by designing a specific new policy but by using on-line Machine Learning algorithms to dynamically shift between the standard policies based on their observed miss rates. [sent-8, score-0.948]

3 A thorough experimental evaluation of our techniques is given, as well as a discussion of what makes caching an interesting on-line learning problem. [sent-9, score-0.221]

4 In file system caching, the secondary memory is a hard drive or a networked storage server while in web caching the secondary memory is the Internet. [sent-12, score-0.523]

5 The goal of caching is to keep within the smaller memory the data objects (files, web pages, etc.) that are likely to be requested again. [sent-13, score-0.362]

6 Since the future request stream is not generally known, heuristics, called caching policies, are used to decide which objects should be discarded as new objects are retained. [sent-15, score-0.582]

7 More precisely, if a requested object already resides in the cache then we call it a hit, corresponding to a low-latency data access. [sent-16, score-0.391]

8 Otherwise, we call it a miss, corresponding to a high-latency data access as the data must be fetched from the slower secondary memory into the faster cache memory. [sent-17, score-0.456]

9 In the case of a miss, room must be made in the cache memory for the new object. [sent-18, score-0.413]

10 To accomplish this a caching policy discards from the cache objects which it thinks will cause the fewest or least expensive future misses. [sent-19, score-1.025]

11 In this work we consider twelve baseline policies including seven common policies (RAND, FIFO, LIFO, LRU, MRU, LFU, and MFU), and five more recently developed and very successful policies (SIZE and GDS [CI97], GD* [JB00], GDSF and LFUDA [ACD 99]). [sent-20, score-0.809]
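
To make the baseline policies concrete, here is a minimal Python sketch (not from the paper; the request stream and capacity are hypothetical) of how one of them, LRU, and its miss accounting might be simulated:

    from collections import OrderedDict

    def simulate_lru(requests, capacity):
        """Replay a request stream under LRU and return the miss count."""
        cache = OrderedDict()              # object id -> None, in recency order
        misses = 0
        for obj in requests:
            if obj in cache:
                cache.move_to_end(obj)     # hit: mark as most recently used
            else:
                misses += 1                # miss: fetch from secondary memory
                if len(cache) >= capacity:
                    cache.popitem(last=False)   # evict the least recently used
                cache[obj] = None
        return misses

    # Hypothetical stream: a few hot objects plus one-off requests.
    print(simulate_lru([1, 2, 3, 1, 2, 4, 5, 1, 2, 6], capacity=3))

The other baselines differ mainly in which object the eviction step selects (FIFO by insertion order, SIZE by object size, and so on).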

12 Thus the difficult question is: In a given situation, which policy should govern the cache? [sent-23, score-0.384]

13 For example, the request stream from disk accesses on a PC is quite different from the request stream produced by web-proxy accesses via a browser, or that of a file server on a local network. [sent-24, score-0.614]

14 The relative performance of the twelve policies varies greatly depending on the application. [sent-25, score-0.338]

15 Furthermore, the characteristics of a single request stream can vary temporally for a fixed application. [sent-26, score-0.243]

16 For example, a file server can behave quite differently during the middle of the night while making tape archives in order to back up data, whereas during the day its purpose is to serve file requests to and from other machines and/or users. [sent-27, score-0.19]

17 Because of their differing decision criteria, different policies perform better given different workload characteristics. [sent-28, score-0.275]

18 The request streams become even more difficult to characterize when there is a hierarchy or a network of caches handling a variety of file-type requests. [sent-29, score-0.354]

19 In these cases, choosing a fixed policy for each cache in advance is doomed to be sub-optimal. [sent-30, score-0.722]

20–23 [Figure 1 plots. Legend: lru, fifo, mru, lifo, size, lfu, mfu, rand, gds, gdsf, lfuda, gd. Panels (c) and (d): the lowest miss rate policy switches between SIZE, GDS, GDSF, and GD*.] Figure 1: Miss rates (y axis) of a) the twelve fixed policies (calculated w.r.t. a window of 300 requests) over 30,000 requests (x axis), b) the same policies on a random permutation of the data set, c) and d) the policies with the lowest miss rates in the figures above. [sent-31, score-0.844] [sent-47, score-1.506] [sent-50, score-0.619] [sent-53, score-1.012]

24 The usual answer to the question of which policy to employ is either to select one that works well on average, or to select one that provides the best performance on some past workload that is believed to be representative. [sent-54, score-0.508]

25 First, the selection (and perhaps tuning) of the single policy to be used in any given situation is done by hand and may be both difficult and error-prone, especially in complex system architectures with unknown and/or time-varying workloads. [sent-56, score-0.361]

26 And second, the performance of the chosen policy with the best expected average case performance may in fact be worse than that achievable by another policy at any particular moment. [sent-57, score-0.797]

27 Figure 1 (a) shows the miss rate of the twelve policies described above on a representative portion of one of our data sets (described below in Section 3) and Figure 1 (b) shows the miss rate of the same policies on a random permutation of the request stream. [sent-58, score-0.883]

28 As can clearly be seen, the miss rates on the permuted data set are quite different from those of the original data set, and it is this difference that our algorithms aim to exploit. [sent-59, score-0.361]

29 Figures 1 (c) and (d) show which policy is best at each instant of time for the data segment and the permuted data segment. [sent-60, score-0.413]

30 It is clear from these (representative) figures that the best policy changes over time. [sent-61, score-0.388]

31 To avoid the perils associated with trying to hand-pick a single policy, one would like to be able to automatically and dynamically select the best policy for any given situation. [sent-62, score-0.417]

32 In other words, one wants a cache replacement policy which is “adaptive”. [sent-63, score-0.722]

33 BestFixed is the a posteriori selected policy with the lowest miss rate on the entire request stream for our twelve policies (the figure labels show BestFixed = SIZE for this trace). [sent-70, score-1.105]

34 BestShifting(K) considers all possible partitions of the request stream into at most K segments along with the best policy for each segment. [sent-71, score-0.655]

35 BestShifting(K) chooses the partition with the lowest total miss rate over the entire dataset and can be computed via dynamic programming. [sent-72, score-0.41]
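
The offline comparator admits a short dynamic program. Below is a hedged sketch, assuming (since the symbols were lost in extraction) that the segment bound is K and that miss[p][t] records whether policy p misses request t:

    def best_shifting(miss, K):
        """Lowest total miss count over all partitions of the stream into
        at most K segments, with one governing policy per segment."""
        P, n = len(miss), len(miss[0])
        # prefix[p][t] = cumulative misses of policy p on requests 0..t-1
        prefix = [[0] * (n + 1) for _ in range(P)]
        for p in range(P):
            for t in range(n):
                prefix[p][t + 1] = prefix[p][t] + miss[p][t]
        def seg(i, j):  # best single policy on the segment of requests i..j-1
            return min(prefix[p][j] - prefix[p][i] for p in range(P))
        INF = float("inf")
        dp = [[INF] * (n + 1) for _ in range(K + 1)]
        for k in range(K + 1):
            dp[k][0] = 0
        for k in range(1, K + 1):
            for j in range(1, n + 1):
                dp[k][j] = min(dp[k - 1][i] + seg(i, j) for i in range(j))
        return dp[K][n]

Under this formulation, BestFixed equals best_shifting(miss, 1), consistent with the next sentence.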

36 Notice that BestFixed = BestShifting(1), and that most of the advantage of shifting policies occurs with relatively few shifts (in roughly 300,000 requests). [sent-77, score-0.342]

37 We show that with no additional fetches, the master policy works about as well as BestFixed. [sent-84, score-0.568]

38 We define a refetch as a fetch of a previously seen object that was favored by the current policy but discarded from the real cache by a previously active policy. [sent-85, score-0.872]

39 In particular, when all required objects are refetched instantly, this policy has a 13-20% lower miss rate than BestFixed, and almost the same performance as BestShifting(K) for modest K. [sent-87, score-0.844]

40 For reference, when compared with LRU, this policy has a 49-63% lower miss rate. [sent-88, score-0.696]

41 Disregarding misses on objects never seen before (compulsory misses), the performance improvements are even greater. [sent-89, score-0.201]

42 A more detailed discussion of our results is given in Section 3. 2 The Master Policy: We seek to develop an on-line master policy that determines which of a set of baseline policies should govern the real cache at any time. [sent-92, score-1.266]

43 A virtual cache simulates the operation of a single baseline policy. [sent-95, score-0.58]

44 Each virtual cache records a few bytes of metadata about each object in its cache: ID, size, and calculated priority. [sent-96, score-0.561]

45 Object data is only kept in the real cache, making the cost of maintaining the virtual caches small. [Figure 3: Virtual caches embedded in the cache memory.] [sent-97, score-0.585]
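
As a hedged illustration of that metadata (field names are assumptions, not the paper's), a virtual-cache entry could be as small as:

    from dataclasses import dataclass

    @dataclass
    class VirtualEntry:
        """Per-object metadata kept by one virtual cache; the object's
        data itself lives only in the real cache."""
        obj_id: int
        size: int
        priority: float  # as calculated by this virtual cache's policy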

46 Via the virtual caches, the master policy can observe the miss rates of each policy on the actual request stream in order to determine their performance on the current workload. [sent-99, score-1.727]

47 To be fair, virtual caches reside in the memory space which could have been used to cache real objects, as is illustrated in Figure 3. [sent-100, score-0.807]

48 Thus, the space used by the real cache is reduced by the space occupied by the virtual caches. [sent-101, score-0.573]

49 We set the virtual size of each virtual cache equal to the size of the full cache. [sent-102, score-0.759]

50 The caches used for computing the comparators BestFixed and BestShifting(K) are based on caches of the full size. [sent-103, score-0.429]

51 A simple heuristic the master policy can use to choose which caching policy should control at any given time is to continuously monitor the number of misses incurred by each policy in a past window of, for example, 300 requests (depicted in Figure 1 (a)). [sent-104, score-1.842]

52 The master policy then gives control of the real cache to the policy with the least misses in this window (shown in Figure 1 (c)). [sent-105, score-1.485]

53 While this works well in practice, maintaining such a window for many fixed policies is expensive, further reducing the space for the real cache. [sent-106, score-0.3]
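
A hedged sketch of that window heuristic (the window length W=300 follows the text; the function and variable names are illustrative):

    from collections import deque

    def windowed_master(miss_events, policies, W=300):
        """After each request, yield the policy with the fewest misses in
        the last W requests; miss_events yields dicts {policy: 0 or 1}."""
        window = deque()
        totals = {p: 0 for p in policies}
        for event in miss_events:
            window.append(event)
            for p in policies:
                totals[p] += event[p]
            if len(window) > W:            # slide the window
                old = window.popleft()
                for p in policies:
                    totals[p] -= old[p]
            yield min(policies, key=lambda p: totals[p])

The deque is exactly the per-window bookkeeping the text calls expensive: W per-policy miss records must be retained at all times.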

54 A better master policy keeps just one weight for each policy (non-negative and summing to one) which represents an estimate of its current relative performance. [sent-108, score-0.955]

55 The master policy is always governed by the policy with the maximum weight.² [sent-109, score-0.929]

56 This technique is preferred to the window-based master policy because it uses much less memory, and because the parameters of the weight updates are easier to tune than the window size. [sent-112, score-0.653]

57 This also makes the resulting master policy more robust (not shown). [sent-113, score-0.568]

58 First, the weights of all policies that missed the new request are multiplied by a factor β and then renormalized. [sent-116, score-0.395]

59 Since the weights are renormalized, they remain unchanged if all policies miss the new request. [sent-118, score-0.558]

60 1 As an additional optimization, we record the id and size of each object only once, regardless of the number of virtual caches it appears in. [sent-120, score-0.411]

61 2 This can be sub-optimal in the worst case since it is always possible to construct a data stream where two policies switch back and forth after each request. [sent-121, score-0.323]

62 However, real request streams appear to be divided into segments that favor one of the twelve policies for a substantial number of requests (see Figure 1). [sent-122, score-0.686]

63 Figure 1(a) shows the current absolute performance of the policies in a rolling window (of 300 requests), whereas Figure 4 depicts relative performance and shows how the policies compete over time. [sent-124, score-0.529]

64 (Recall that the policy with the highest weight always controls the real cache). [sent-125, score-0.429]

65 There are a number of share updates [HW98, BW02] with various recovery properties. [Figure 4: Weight History for Individual Policies (FSUP).] [sent-126, score-0.273]

66 Bounds are proven in the expert framework for the combined loss and share updates. [sent-131, score-0.138]

67 Formally, for each trial $t$, the loss update is $w_{t+1,i} = w_{t,i}\,\beta^{\mathrm{miss}_t(i)} \,/\, \sum_j w_{t,j}\,\beta^{\mathrm{miss}_t(j)}$, where $\beta$ is a parameter in $[0,1)$ and $\mathrm{miss}_t(i)$ is 1 if the $t$-th object is missed by policy $i$ and 0 otherwise. [sent-137, score-1.419]

68 A small parameter β causes high weight to decay quickly if its corresponding policy starts incurring more misses than other policies with high weights. [sent-142, score-0.728]

69 The higher the share-update parameter, the more quickly past good policies will recover. [sent-143, score-0.267]
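
Putting the loss and share updates together, a minimal sketch; the parameter names beta and alpha and the share-to-uniform mixing form are assumptions in the spirit of the cited share updates [HW98], since the paper's exact symbols were lost in extraction:

    def update_weights(w, missed, beta=0.9, alpha=0.01):
        """One trial of the master policy's weight update (sketch).
        w: dict policy -> weight (non-negative, summing to one).
        missed: dict policy -> 0/1 for the current request."""
        # Loss update: multiply the weights of policies that missed by
        # beta (a parameter in [0,1)), then renormalize.
        scaled = {p: w[p] * (beta if missed[p] else 1.0) for p in w}
        Z = sum(scaled.values())
        if Z == 0.0:                 # degenerate: beta == 0 and all missed
            scaled, Z = dict(w), sum(w.values())
        w = {p: v / Z for p, v in scaled.items()}
        # Share update (assumed fixed-share-to-uniform form): mix a small
        # fraction alpha uniformly so past good policies recover quickly.
        n = len(w)
        return {p: (1 - alpha) * v + alpha / n for p, v in w.items()}

    def governing_policy(w):
        return max(w, key=w.get)     # the max-weight policy gets control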

70 Instantaneous Rollover: When space is needed to cache a new request, the master policy discards objects not present in the governing policy’s virtual cache.³ [sent-147, score-0.722]

71 This causes the content of the real cache to “roll over” to the content of the current governing virtual cache. [sent-148, score-0.687]

72 We call this demand rollover because objects in the governing virtual cache are refetched into the real cache on demand. [sent-149, score-1.282]
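
A hedged sketch of the demand-rollover eviction rule just described (the data-structure choices are illustrative):

    def make_room(needed, real_cache, governing_vc):
        """Free space for a new object; real_cache and governing_vc map
        object id -> size.  Objects the governing policy's virtual cache
        no longer holds are discarded first, so the real cache's contents
        gradually roll over to the governor's."""
        freed = 0
        for obj in list(real_cache):
            if freed >= needed:
                break
            if obj not in governing_vc:      # not favored by the governor
                freed += real_cache.pop(obj)
        # If still short, the caller falls back to the governing policy's
        # own eviction priorities for the remaining space.
        return freed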

73 While this master policy works almost as well as BestFixed, we were not satisfied and wanted to do as well as BestShifting(K) (for a reasonably large bound K on the number of segments). [sent-150, score-0.568]

74 We noticed that the content of the real cache lagged behind the content of the governing virtual cache and had more misses, and conjectured that "quicker" rollover strategies would improve overall performance. [sent-151, score-1.204]

75 Instantaneous rollover immediately refetches the objects in the new governing virtual cache that were not retained in the real cache. [sent-153, score-0.69]

76 By appropriate tuning of the update parameters, the number of instantaneous rollovers can be kept reasonably small and the miss rates of our master policy are almost as good as BestShifting(K) for K much larger than the actual number of shifts used on-line. [sent-155, score-1.062]

77 While this makes sense for defining a comparator, we now give more realistic rollover strategies that reduce the lag time. [sent-157, score-0.156]

78 Background Rollover: Because instantaneous rollover immediately refetches everything in the governing virtual cache that is not already in the real cache, it may cause a large number of refetches even when the number of policy switches is kept small. [sent-159, score-1.407]

79 If all refetches are counted as misses, then the miss rate of such a master policy is comparable to that of BestFixed. [sent-160, score-1.02]

80 However, from a user perspective, refetching is advantageous because of the latency advantage gained by having required objects in memory before they are needed. [sent-162, score-0.176]

81 To characterize the performance of background rollover without addressing these architectural details, the following background refetching strategies were examined: 1 refetch for every cache miss; 1 for every hit; 1 for every request; 2 for every request; 1 for every hit and 5 for every miss, etc. [sent-166, score-0.828]
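
One of these schedules ("1 for every hit and 5 for every miss") can be sketched as a simple credit system; the function below is illustrative, not the paper's implementation:

    def background_refetch(event, pending, real_cache):
        """event: 'hit' or 'miss' for the current request.  pending holds
        (obj, size) pairs that are in the governing virtual cache but not
        yet in the real cache (the rollover backlog)."""
        credits = {"hit": 1, "miss": 5}[event]
        while credits > 0 and pending:
            obj, size = pending.pop()
            real_cache[obj] = size   # refetched before it is next needed
            credits -= 1

Under such a budget, refetching proceeds in the background rather than all at once, trading rollover lag against refetch traffic.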

82 Each background technique gave fewer misses than BestFixed, approaching and nearly matching the performance obtained by the master policy using instantaneous rollover. [sent-167, score-0.825]

83 Of course, techniques which reduce the number of policy switches (by tuning the update parameters) also reduce the number of refetches. [sent-168, score-0.397]

84 Figure 5 compares the performance of each master policy with that of BestFixed and shows that the three master policies almost always outperform BestFixed. [sent-169, score-1.022]

85 Deviations from the baseline show how the performance of our on-line shifting policies differs in miss rate. [sent-179, score-0.713]

86 3 Data and Results: Figure 6 shows how the master policy with instantaneous rollover (labeled ’roll’) “tracks” the baseline policy with the lowest miss rate over the representative data segment used in previous figures. [sent-181, score-1.611]

87 Figure 7 shows the performance of our master policies with respect to BestFixed, BestShifting(K), and LRU. [sent-182, score-0.454]

88 It shows that demand rollover does slightly worse than BestFixed, while background 1 (1 refetch every request) and background 2 (1 refetch every hit and 5 every miss) do better than BestFixed and almost as well as instantaneous, which itself does almost as well as BestShifting. [sent-183, score-0.505]

89 All of the policies do significantly better than LRU. [sent-184, score-0.223]

90 Discounting the compulsory misses, our best policies have 1/3 fewer “real” misses than BestFixed and 1/2 the “real” misses of LRU. [sent-185, score-0.538]

91 The traces we used represent a variety of workloads including a personal workstation (Work-Week), a single user (User-Month), and a remote storage system with a large number of clients, filtered by LRU on the clients’ local caches (Server-Month-LRU). [sent-188, score-0.234]

92 For each data set, the table shows the number of requests, % of requests skipped (object size > cache size), number of compulsory misses of objects not previously seen, and the number of rollovers. [sent-189, score-0.75]

93 For each policy (including BestShifting(K)), the table shows miss rate, and % improvement over BestFixed (labeled 'BF') and LRU. [sent-190, score-0.696]

94 In each case all 12 virtual caches consumed on average less than 2% of the real cache space. [sent-191, score-0.755]

95 [Figure 6: Miss Rates under FSUP with Master. Legend: lru, fifo, mru, lifo, size, lfu, mfu, rand, gds, gdsf, lfuda, gd, roll.] [sent-194, score-0.875]

96 Figure 7: Online shifting policies against offline comparators and LRU for Work-Week dataset. [sent-263, score-0.321]

97 In this paper we use the weight updates of these algorithms for dynamically determining the best caching policy. [sent-268, score-0.327]

98 This application is more elaborate because we needed to actively gather performance information about the caching policies via virtual caches. [sent-269, score-0.638]

99 In future work we plan to do a more thorough study of the feasibility of background rollover by building actual systems. [sent-270, score-0.204]

100 GreedyDual* web caching algorithm: Exploiting the two sources of temporal locality in web request streams. [sent-320, score-0.453]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('policy', 0.361), ('cache', 0.361), ('miss', 0.335), ('policies', 0.223), ('caching', 0.221), ('lru', 0.221), ('bestshifting', 0.208), ('master', 0.207), ('bestfixed', 0.195), ('caches', 0.182), ('request', 0.172), ('virtual', 0.17), ('rollover', 0.156), ('requests', 0.134), ('misses', 0.118), ('bestf', 0.104), ('gds', 0.104), ('gdsf', 0.091), ('twelve', 0.091), ('refetch', 0.078), ('refetches', 0.078), ('stream', 0.071), ('instantaneous', 0.067), ('comparators', 0.065), ('refetching', 0.065), ('objects', 0.059), ('governing', 0.058), ('server', 0.056), ('gd', 0.055), ('compulsory', 0.052), ('fifo', 0.052), ('lfuda', 0.052), ('lifo', 0.052), ('mfu', 0.052), ('mru', 0.052), ('workload', 0.052), ('memory', 0.052), ('demand', 0.049), ('baseline', 0.049), ('background', 0.048), ('hit', 0.048), ('rand', 0.045), ('past', 0.044), ('warmuth', 0.043), ('secondary', 0.043), ('shifts', 0.043), ('real', 0.042), ('cbfh', 0.039), ('fsup', 0.039), ('lfu', 0.039), ('missrates', 0.039), ('rate', 0.039), ('le', 0.036), ('switches', 0.036), ('lowest', 0.036), ('window', 0.035), ('experts', 0.035), ('comparator', 0.034), ('helmbold', 0.034), ('herbster', 0.034), ('shifting', 0.033), ('ari', 0.031), ('roll', 0.031), ('expert', 0.031), ('object', 0.03), ('web', 0.03), ('switch', 0.029), ('scott', 0.029), ('dynamically', 0.029), ('size', 0.029), ('content', 0.028), ('bf', 0.027), ('best', 0.027), ('rates', 0.026), ('acd', 0.026), ('ahmed', 0.026), ('amer', 0.026), ('backgrnd', 0.026), ('bestshift', 0.026), ('brandt', 0.026), ('clients', 0.026), ('ethan', 0.026), ('gramacy', 0.026), ('refetched', 0.026), ('skipped', 0.026), ('workloads', 0.026), ('wwk', 0.026), ('weight', 0.026), ('disk', 0.026), ('storage', 0.026), ('bousquet', 0.025), ('instant', 0.025), ('performance', 0.024), ('segments', 0.024), ('updates', 0.024), ('update', 0.023), ('aag', 0.023), ('accesses', 0.023), ('discards', 0.023), ('govern', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999946 20 nips-2002-Adaptive Caching by Refetching

Author: Robert B. Gramacy, Manfred K. Warmuth, Scott A. Brandt, Ismail Ari

Abstract: We are constructing caching policies that have 13-20% lower miss rates than the best of twelve baseline policies over a large variety of request streams. This represents an improvement of 49–63% over Least Recently Used, the most commonly implemented policy. We achieve this not by designing a specific new policy but by using on-line Machine Learning algorithms to dynamically shift between the standard policies based on their observed miss rates. A thorough experimental evaluation of our techniques is given, as well as a discussion of what makes caching an interesting on-line learning problem.

2 0.27720809 3 nips-2002-A Convergent Form of Approximate Policy Iteration

Author: Theodore J. Perkins, Doina Precup

Abstract: We study a new, model-free form of approximate policy iteration which uses Sarsa updates with linear state-action value function approximation for policy evaluation, and a “policy improvement operator” to generate a new policy based on the learned state-action values. We prove that if the policy improvement operator produces ε-soft policies and is Lipschitz continuous in the action values, with a constant that is not too large, then the approximate policy iteration algorithm converges to a unique solution from any initial policy. To our knowledge, this is the first convergence result for any form of approximate policy iteration under similar computational-resource assumptions.

3 0.17727794 155 nips-2002-Nonparametric Representation of Policies and Value Functions: A Trajectory-Based Approach

Author: Christopher G. Atkeson, Jun Morimoto

Abstract: A longstanding goal of reinforcement learning is to develop nonparametric representations of policies and value functions that support rapid learning without suffering from interference or the curse of dimensionality. We have developed a trajectory-based approach, in which policies and value functions are represented nonparametrically along trajectories. These trajectories, policies, and value functions are updated as the value function becomes more accurate or as a model of the task is updated. We have applied this approach to periodic tasks such as hopping and walking, which required handling discount factors and discontinuities in the task dynamics, and using function approximation to represent value functions at discontinuities. We also describe extensions of the approach to make the policies more robust to modeling error and sensor noise.

4 0.13970491 13 nips-2002-A Note on the Representational Incompatibility of Function Approximation and Factored Dynamics

Author: Eric Allender, Sanjeev Arora, Michael Kearns, Cristopher Moore, Alexander Russell

Abstract: We establish a new hardness result that shows that the difficulty of planning in factored Markov decision processes is representational rather than just computational. More precisely, we give a fixed family of factored MDPs with linear rewards whose optimal policies and value functions simply cannot be represented succinctly in any standard parametric form. Previous hardness results indicated that computing good policies from the MDP parameters was difficult, but left open the possibility of succinct function approximation for any fixed factored MDP. Our result applies even to policies which yield a polynomially poor approximation to the optimal value, and highlights interesting connections with the complexity class of Arthur-Merlin games.

5 0.1358688 33 nips-2002-Approximate Linear Programming for Average-Cost Dynamic Programming

Author: Benjamin V. Roy, Daniela D. Farias

Abstract: This paper extends our earlier analysis on approximate linear programming as an approach to approximating the cost-to-go function in a discounted-cost dynamic program [6]. In this paper, we consider the average-cost criterion and a version of approximate linear programming that generates approximations to the optimal average cost and differential cost function. We demonstrate that a naive version of approximate linear programming prioritizes approximation of the optimal average cost and that this may not be well-aligned with the objective of deriving a policy with low average cost. For that, the algorithm should aim at producing a good approximation of the differential cost function. We propose a twophase variant of approximate linear programming that allows for external control of the relative accuracy of the approximation of the differential cost function over different portions of the state space via state-relevance weights. Performance bounds suggest that the new algorithm is compatible with the objective of optimizing performance and provide guidance on appropriate choices for state-relevance weights.

6 0.12206012 82 nips-2002-Exponential Family PCA for Belief Compression in POMDPs

7 0.12131394 134 nips-2002-Learning to Take Concurrent Actions

8 0.089003153 130 nips-2002-Learning in Zero-Sum Team Markov Games Using Factored Value Functions

9 0.08729165 159 nips-2002-Optimality of Reinforcement Learning Algorithms with Linear Function Approximation

10 0.078674562 123 nips-2002-Learning Attractor Landscapes for Learning Motor Primitives

11 0.068652876 61 nips-2002-Convergent Combinations of Reinforcement Learning with Linear Function Approximation

12 0.06850443 165 nips-2002-Ranking with Large Margin Principle: Two Approaches

13 0.066604428 175 nips-2002-Reinforcement Learning to Play an Optimal Nash Equilibrium in Team Markov Games

14 0.065543242 205 nips-2002-Value-Directed Compression of POMDPs

15 0.045686752 122 nips-2002-Learning About Multiple Objects in Images: Factorial Learning without Factorial Search

16 0.040558506 8 nips-2002-A Maximum Entropy Approach to Collaborative Filtering in Dynamic, Sparse, High-Dimensional Domains

17 0.037924968 148 nips-2002-Morton-Style Factorial Coding of Color in Primary Visual Cortex

18 0.036991853 78 nips-2002-Efficient Learning Equilibrium

19 0.03636647 86 nips-2002-Fast Sparse Gaussian Process Methods: The Informative Vector Machine

20 0.032285914 88 nips-2002-Feature Selection and Classification on Matrix Data: From Large Margins to Small Covering Numbers


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.109), (1, -0.009), (2, -0.302), (3, -0.08), (4, 0.019), (5, -0.112), (6, 0.1), (7, -0.099), (8, 0.061), (9, 0.036), (10, -0.053), (11, -0.01), (12, -0.083), (13, 0.025), (14, 0.033), (15, -0.035), (16, 0.033), (17, -0.106), (18, 0.166), (19, 0.032), (20, -0.107), (21, -0.064), (22, -0.136), (23, -0.086), (24, -0.122), (25, -0.043), (26, 0.039), (27, 0.096), (28, -0.135), (29, 0.026), (30, 0.014), (31, 0.013), (32, 0.02), (33, -0.038), (34, 0.006), (35, -0.038), (36, -0.075), (37, -0.029), (38, -0.069), (39, -0.026), (40, 0.004), (41, 0.091), (42, -0.062), (43, -0.021), (44, -0.024), (45, 0.033), (46, 0.013), (47, -0.005), (48, 0.068), (49, 0.046)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97656661 20 nips-2002-Adaptive Caching by Refetching

Author: Robert B. Gramacy, Manfred K. Warmuth, Scott A. Brandt, Ismail Ari

Abstract: We are constructing caching policies that have 13-20% lower miss rates than the best of twelve baseline policies over a large variety of request streams. This represents an improvement of 49–63% over Least Recently Used, the most commonly implemented policy. We achieve this not by designing a specific new policy but by using on-line Machine Learning algorithms to dynamically shift between the standard policies based on their observed miss rates. A thorough experimental evaluation of our techniques is given, as well as a discussion of what makes caching an interesting on-line learning problem.

2 0.79315931 3 nips-2002-A Convergent Form of Approximate Policy Iteration

Author: Theodore J. Perkins, Doina Precup

Abstract: We study a new, model-free form of approximate policy iteration which uses Sarsa updates with linear state-action value function approximation for policy evaluation, and a “policy improvement operator” to generate a new policy based on the learned state-action values. We prove that if the policy improvement operator produces ε-soft policies and is Lipschitz continuous in the action values, with a constant that is not too large, then the approximate policy iteration algorithm converges to a unique solution from any initial policy. To our knowledge, this is the first convergence result for any form of approximate policy iteration under similar computational-resource assumptions.

3 0.72608215 33 nips-2002-Approximate Linear Programming for Average-Cost Dynamic Programming

Author: Benjamin V. Roy, Daniela D. Farias

Abstract: This paper extends our earlier analysis on approximate linear programming as an approach to approximating the cost-to-go function in a discounted-cost dynamic program [6]. In this paper, we consider the average-cost criterion and a version of approximate linear programming that generates approximations to the optimal average cost and differential cost function. We demonstrate that a naive version of approximate linear programming prioritizes approximation of the optimal average cost and that this may not be well-aligned with the objective of deriving a policy with low average cost. For that, the algorithm should aim at producing a good approximation of the differential cost function. We propose a twophase variant of approximate linear programming that allows for external control of the relative accuracy of the approximation of the differential cost function over different portions of the state space via state-relevance weights. Performance bounds suggest that the new algorithm is compatible with the objective of optimizing performance and provide guidance on appropriate choices for state-relevance weights.

4 0.68873829 134 nips-2002-Learning to Take Concurrent Actions

Author: Khashayar Rohanimanesh, Sridhar Mahadevan

Abstract: We investigate a general semi-Markov Decision Process (SMDP) framework for modeling concurrent decision making, where agents learn optimal plans over concurrent temporally extended actions. We introduce three types of parallel termination schemes – all, any and continue – and theoretically and experimentally compare them. 1

5 0.6471082 13 nips-2002-A Note on the Representational Incompatibility of Function Approximation and Factored Dynamics

Author: Eric Allender, Sanjeev Arora, Michael Kearns, Cristopher Moore, Alexander Russell

Abstract: We establish a new hardness result that shows that the difficulty of planning in factored Markov decision processes is representational rather than just computational. More precisely, we give a fixed family of factored MDPs with linear rewards whose optimal policies and value functions simply cannot be represented succinctly in any standard parametric form. Previous hardness results indicated that computing good policies from the MDP parameters was difficult, but left open the possibility of succinct function approximation for any fixed factored MDP. Our result applies even to policies which yield a polynomially poor approximation to the optimal value, and highlights interesting connections with the complexity class of Arthur-Merlin games.

6 0.58209246 155 nips-2002-Nonparametric Representation of Policies and Value Functions: A Trajectory-Based Approach

7 0.57264155 130 nips-2002-Learning in Zero-Sum Team Markov Games Using Factored Value Functions

8 0.42957553 205 nips-2002-Value-Directed Compression of POMDPs

9 0.3827146 82 nips-2002-Exponential Family PCA for Belief Compression in POMDPs

10 0.32254404 123 nips-2002-Learning Attractor Landscapes for Learning Motor Primitives

11 0.28427941 165 nips-2002-Ranking with Large Margin Principle: Two Approaches

12 0.22919992 122 nips-2002-Learning About Multiple Objects in Images: Factorial Learning without Factorial Search

13 0.2139249 159 nips-2002-Optimality of Reinforcement Learning Algorithms with Linear Function Approximation

14 0.20350657 61 nips-2002-Convergent Combinations of Reinforcement Learning with Linear Function Approximation

15 0.19791332 185 nips-2002-Speeding up the Parti-Game Algorithm

16 0.19516878 175 nips-2002-Reinforcement Learning to Play an Optimal Nash Equilibrium in Team Markov Games

17 0.18354118 107 nips-2002-Identity Uncertainty and Citation Matching

18 0.17852661 164 nips-2002-Prediction of Protein Topologies Using Generalized IOHMMs and RNNs

19 0.16979702 131 nips-2002-Learning to Classify Galaxy Shapes Using the EM Algorithm

20 0.16577145 57 nips-2002-Concurrent Object Recognition and Segmentation by Graph Partitioning


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.369), (11, 0.013), (23, 0.048), (40, 0.05), (42, 0.05), (49, 0.018), (54, 0.073), (55, 0.022), (57, 0.017), (67, 0.01), (68, 0.017), (74, 0.066), (87, 0.019), (92, 0.026), (98, 0.071)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.7953195 20 nips-2002-Adaptive Caching by Refetching

Author: Robert B. Gramacy, Manfred K. Warmuth, Scott A. Brandt, Ismail Ari

Abstract: We are constructing caching policies that have 13-20% lower miss rates than the best of twelve baseline policies over a large variety of request streams. This represents an improvement of 49–63% over Least Recently Used, the most commonly implemented policy. We achieve this not by designing a specific new policy but by using on-line Machine Learning algorithms to dynamically shift between the standard policies based on their observed miss rates. A thorough experimental evaluation of our techniques is given, as well as a discussion of what makes caching an interesting on-line learning problem.

2 0.54379505 93 nips-2002-Forward-Decoding Kernel-Based Phone Recognition

Author: Shantanu Chakrabartty, Gert Cauwenberghs

Abstract: Forward decoding kernel machines (FDKM) combine large-margin classifiers with hidden Markov models (HMM) for maximum a posteriori (MAP) adaptive sequence estimation. State transitions in the sequence are conditioned on observed data using a kernel-based probability model trained with a recursive scheme that deals effectively with noisy and partially labeled data. Training over very large data sets is accomplished using a sparse probabilistic support vector machine (SVM) model based on quadratic entropy, and an on-line stochastic steepest descent algorithm. For speaker-independent continuous phone recognition, FDKM trained over 177 ,080 samples of the TlMIT database achieves 80.6% recognition accuracy over the full test set, without use of a prior phonetic language model.

3 0.36960158 1 nips-2002-"Name That Song!" A Probabilistic Approach to Querying on Music and Text

Author: Brochu Eric, Nando de Freitas

Abstract: We present a novel, flexible statistical approach for modelling music and text jointly. The approach is based on multi-modal mixture models and maximum a posteriori estimation using EM. The learned models can be used to browse databases with documents containing music and text, to search for music using queries consisting of music and text (lyrics and other contextual information), to annotate text documents with music, and to automatically recommend or identify similar songs.

4 0.3577981 100 nips-2002-Half-Lives of EigenFlows for Spectral Clustering

Author: Chakra Chennubhotla, Allan D. Jepson

Abstract: Using a Markov chain perspective of spectral clustering we present an algorithm to automatically find the number of stable clusters in a dataset. The Markov chain’s behaviour is characterized by the spectral properties of the matrix of transition probabilities, from which we derive eigenflows along with their halflives. An eigenflow describes the flow of probability mass due to the Markov chain, and it is characterized by its eigenvalue, or equivalently, by the halflife of its decay as the Markov chain is iterated. A ideal stable cluster is one with zero eigenflow and infinite half-life. The key insight in this paper is that bottlenecks between weakly coupled clusters can be identified by computing the sensitivity of the eigenflow’s halflife to variations in the edge weights. We propose a novel E IGEN C UTS algorithm to perform clustering that removes these identified bottlenecks in an iterative fashion.

5 0.34803081 3 nips-2002-A Convergent Form of Approximate Policy Iteration

Author: Theodore J. Perkins, Doina Precup

Abstract: We study a new, model-free form of approximate policy iteration which uses Sarsa updates with linear state-action value function approximation for policy evaluation, and a “policy improvement operator” to generate a new policy based on the learned state-action values. We prove that if the policy improvement operator produces ε-soft policies and is Lipschitz continuous in the action values, with a constant that is not too large, then the approximate policy iteration algorithm converges to a unique solution from any initial policy. To our knowledge, this is the first convergence result for any form of approximate policy iteration under similar computational-resource assumptions.

6 0.34450403 127 nips-2002-Learning Sparse Topographic Representations with Products of Student-t Distributions

7 0.34290111 52 nips-2002-Cluster Kernels for Semi-Supervised Learning

8 0.34270921 8 nips-2002-A Maximum Entropy Approach to Collaborative Filtering in Dynamic, Sparse, High-Dimensional Domains

9 0.34183192 68 nips-2002-Discriminative Densities from Maximum Contrast Estimation

10 0.34182471 132 nips-2002-Learning to Detect Natural Image Boundaries Using Brightness and Texture

11 0.34069264 21 nips-2002-Adaptive Classification by Variational Kalman Filtering

12 0.34038329 169 nips-2002-Real-Time Particle Filters

13 0.33941934 48 nips-2002-Categorization Under Complexity: A Unified MDL Account of Human Learning of Regular and Irregular Categories

14 0.33934334 135 nips-2002-Learning with Multiple Labels

15 0.33917102 124 nips-2002-Learning Graphical Models with Mercer Kernels

16 0.3389636 10 nips-2002-A Model for Learning Variance Components of Natural Images

17 0.3389194 141 nips-2002-Maximally Informative Dimensions: Analyzing Neural Responses to Natural Signals

18 0.33876413 204 nips-2002-VIBES: A Variational Inference Engine for Bayesian Networks

19 0.33863056 82 nips-2002-Exponential Family PCA for Belief Compression in POMDPs

20 0.33820164 163 nips-2002-Prediction and Semantic Association