nips nips2001 nips2001-186 knowledge-graph by maker-knowledge-mining

186 nips-2001-The Noisy Euclidean Traveling Salesman Problem and Learning


Source: pdf

Author: Mikio L. Braun, Joachim M. Buhmann

Abstract: We consider noisy Euclidean traveling salesman problems in the plane, which are random combinatorial problems with underlying structure. Gibbs sampling is used to compute average trajectories, which estimate the underlying structure common to all instances. This procedure requires identifying the exact relationship between permutations and tours. In a learning setting, the average trajectory is used as a model to construct solutions to new instances sampled from the same source. Experimental results show that the average trajectory can in fact estimate the underlying structure and that overfitting effects occur if the trajectory adapts too closely to a single instance. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 III, University of Bonn, Römerstraße 164, 53117 Bonn, Germany. Abstract We consider noisy Euclidean traveling salesman problems in the plane, which are random combinatorial problems with underlying structure. [sent-8, score-0.435]

2 Gibbs sampling is used to compute average trajectories, which estimate the underlying structure common to all instances. [sent-9, score-0.203]

3 This procedure requires identifying the exact relationship between permutations and tours. [sent-10, score-0.258]

4 In a learning setting, the average trajectory is used as a model to construct solutions to new instances sampled from the same source. [sent-11, score-0.639]

5 Experimental results show that the average trajectory can in fact estimate the underlying structure and that overfitting effects occur if the trajectory adapts too closely to a single instance. [sent-12, score-0.667]

6 1 Introduction The approach in combinatorial optimization is traditionally single-instance and worst-case-oriented. [sent-13, score-0.136]

7 This constitutes a completely different problem: given a set of similar instances, construct solutions which are good on average. [sent-16, score-0.176]

8 Since the instances share some information, it might be expected that this problem is simpler than solving all instances separately, even for NP-hard problems. [sent-18, score-0.332]

9 We will study the following example of a multiple-instance average-case problem, which is built from the Euclidean traveling salesman problem (TSP) in the plane. [sent-19, score-0.183]

10 At the beginning of each week, the salesman has a new set of appointments for the week, for which he has to plan the shortest round-trip. [sent-21, score-0.412]

11 The location of the appointments will not be completely random, because there are certain areas which have a higher probability of containing an appointment, for example cities or business districts within cities. [sent-22, score-0.267]

12 Instead of solving the planning problem each week from scratch, a clever salesman will try to exploit the underlying density and have a rough trip pre-planned, which he will only adapt from week to week. [sent-23, score-0.437]

13 Then, the locations of the appointments for each week are given as samples from the normally distributed random vectors (i ∈ {1, ... [sent-30, score-0.358]

14 ..., Xn) will be called a scenario, and the sampled appointment locations a (sampled) instance. [sent-36, score-0.149]

15 The task consists in finding the permutation π ∈ Sn which minimizes π ↦ d_{π(n)π(1)} + Σ_{i=1}^{n-1} d_{π(i)π(i+1)}, where dij := ||Xi − Xj||2 and Sn is the set of all bijective functions on the set {1, ... [sent-37, score-0.181]
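As a reading aid, a minimal sketch of this objective (NumPy and 0-based indexing assumed; `tour_length` is a hypothetical helper name, not code from the paper):

```python
import numpy as np

def tour_length(pi, X):
    """Length of the closed tour X[pi[0]] -> X[pi[1]] -> ... -> X[pi[-1]] -> X[pi[0]]."""
    P = X[np.asarray(pi)]                # appointment locations in visiting order
    diffs = P - np.roll(P, -1, axis=0)   # edge vectors, including the closing edge
    return float(np.linalg.norm(diffs, axis=1).sum())
```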

16 ..., In as input and outputs a number of solutions s1, ..., sn. It is then measured by the average performance (1/n) Σ_{k=1}^{n} C(sk, Ik), where C(s, I) denotes the cost of solution s on instance I. [sent-47, score-0.329]

17 ..., In, the algorithm has to construct a solution s' on a newly sampled instance I'. [sent-52, score-0.152]

18 ..., In are then the training data, E(C(s', I')) is the analogue of the expected risk or cost, and the set of solutions is identified with the hypothesis class in learning theory. [sent-58, score-0.241]

19 From this training instance, an average solution is constructed, represented by a closed curve in the plane. [sent-60, score-0.138]

20 This average trajectory is supposed to capture the essential structure of the underlying probability density, similar to the centroids in K-means clustering. [sent-61, score-0.406]

21 Then, the average trajectory is used as a seed for a simple heuristic which constructs solutions on newly drawn instances. [sent-62, score-0.543]

22 The average trajectories are computed by geometrically averaging tours which are drawn by a Gibbs sampler at finite temperature. [sent-63, score-0.715]

23 It turns out that the temperature acts as a scale or smoothing parameter. [sent-65, score-0.166]

24 It is based on a completely different algorithmic approach using Gibbs sampling and a general technique for averaging tours. [sent-68, score-0.092]

25 Furthermore, the goal is not to provide a heuristic for computing the best solution, but to extract the relevant statistics of the Gibbs distribution at finite temperatures to generate the average trajectory, which will be used to compute solutions on future instances. [sent-70, score-0.479]

26 Let M be a finite set and f: M → ℝ. The Gibbs distribution at temperature T ∈ ℝ+ is given by (m ∈ M) gT(m) := exp(−f(m)/T) / Σ_{m'∈M} exp(−f(m')/T). [sent-73, score-0.204]

27 For TSP, M = Sn and the elementary move φ is the Lin–Kernighan two-change [4], which consists in choosing two indices i, j at random and reversing the path between the appointments i and j. [sent-79, score-0.229]
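A sketch of such a sampler, assuming the two-change simply reverses the segment of the current permutation between two random positions and reusing the `tour_length` helper sketched above; acceptance follows the standard Metropolis rule for the Gibbs distribution, and the paper's annealing details are not reproduced:

```python
import numpy as np

def two_change(pi, rng):
    """Proposal: pick two positions at random and reverse the path between them."""
    i, j = sorted(rng.choice(len(pi), size=2, replace=False))
    new = pi.copy()
    new[i:j + 1] = new[i:j + 1][::-1]
    return new

def metropolis_tsp(X, T, n_steps, rng=None):
    """Draw permutations whose stationary distribution is g_T(pi) ~ exp(-tour_length(pi, X)/T)."""
    rng = np.random.default_rng() if rng is None else rng
    pi = rng.permutation(len(X))
    cost = tour_length(pi, X)
    samples = []
    for _ in range(n_steps):
        cand = two_change(pi, rng)
        cand_cost = tour_length(cand, X)
        # Metropolis acceptance for the Gibbs distribution at temperature T
        if cand_cost <= cost or rng.random() < np.exp((cost - cand_cost) / T):
            pi, cost = cand, cand_cost
        samples.append(pi.copy())
    return samples
```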

28 3 Averaging Tours Our goal is to compute the average trajectory, which should grasp the underlying structure common to all instances, with respect to the Gibbs measure at non-zero temperature T . [sent-81, score-0.369]

29 The Metropolis algorithm produces a sequence of permutations π1, π2, ... [sent-82, score-0.194]

30 Since permutations cannot be added, we cannot simply compute the empirical means of the πn. [sent-88, score-0.222]

31 Definition 1 (trajectory) The trajectory of π ∈ Sn given n points X1, ... [sent-90, score-0.203]

32 The set of all trajectories (for all sets of n points) is denoted by Tn (this is the set of all mappings from {1, ... [sent-97, score-0.267]

33 Addition of trajectories and multiplication with scalars can be defined pointwise. [sent-101, score-0.374]
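A guess at this representation (assuming Definition 1, only partially reproduced above, maps position i to the location of the i-th visited appointment; the helper name is hypothetical):

```python
import numpy as np

def trajectory(pi, X):
    """Trajectory of a permutation pi: the appointment locations in visiting order,
    i.e. the mapping i -> X[pi[i]], stored as an (n, 2) array."""
    return X[np.asarray(pi)]

# Pointwise addition and scalar multiplication are then plain array operations
# (gamma + eta, a * gamma); as the next sentences note, the naive pointwise sum
# depends on which equivalent permutation (start city, orientation) was used.
```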

34 Unfortunately, this does not yield the desired results, since the relation between permutations and tours is not one-to-one. [sent-103, score-0.405]

35 For example, the permutation obtained by starting the tour at a different city still corresponds to the same tour . [sent-104, score-0.678]

36 We therefore need to define the addition of trajectories in a way which is independent of the choice of permutation (and therefore trajectory) to represent the tour. [sent-105, score-0.479]

37 We will study the relationship between tours and permutations first in some detail, since we feel that the concepts introduced here might be generally useful for analyzing combinatorial optimization problems. [sent-106, score-0.634]

38 A subset t ⊆ E is called a tour iff |t| = n, for every v ∈ V there exist exactly two e1, e2 ∈ t such that v ∈ e1 and v ∈ e2, and (V, t) is connected. [sent-111, score-0.308]

39 Given a symmetric matrix (dij) of distances, the length of a tour t is defined by C(t) := Σ_{{i,j} ∈ t} dij. [sent-112, score-0.402]

40 The tour corresponding to a permutation π ∈ Sn is given by t(π) := {{π(1), π(n)}} ∪ ⋃_{i=1}^{n-1} {{π(i), π(i+1)}}. (4) [sent-113, score-0.403]

41 If t(π) = t for a permutation π and a tour t, we say that π represents t. [sent-114, score-0.403]

42 We call two permutations π, π' equivalent if they represent the same tour, and write π ∼ π'. [sent-115, score-0.469]

43 Note that the length of a permutation is fully determined by its equivalence class. [sent-121, score-0.243]
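A small sketch of equation (4) and the induced equivalence (pure Python; helper names are hypothetical):

```python
def tour_edges(pi):
    """Edge set t(pi) of the tour represented by pi, as in Eq. (4): the unordered
    pairs {pi(i), pi(i+1)} together with the closing edge {pi(0), pi(n-1)}."""
    n = len(pi)
    return frozenset(frozenset((int(pi[i]), int(pi[(i + 1) % n]))) for i in range(n))

def equivalent(pi1, pi2):
    """Two permutations are equivalent iff they represent the same tour."""
    return tour_edges(pi1) == tour_edges(pi2)

def tour_cost(pi, d):
    """C(t): sum of distances d[i][j] over the edges {i, j} of the tour, so equivalent
    permutations automatically receive the same length."""
    return sum(d[i][j] for i, j in (tuple(e) for e in tour_edges(pi)))
```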

44 We have to define the addition ⊕ of trajectories such that the sum is independent of the representation. [sent-127, score-0.351]

45 This means that for two tours t1, t2 such that t1 is represented by π1, π1' and t2 by π2, π2', it holds that f(π1) ⊕ f(π2) ∼ f(π1') ⊕ f(π2'). [sent-128, score-0.265]

46 We define a group action of Sn on itself by right translation (π, g ∈ Sn), as given in (5). [sent-131, score-0.2]

47 Note that any permutation in Sn can be mapped to another by an appropriate group action (namely π → π' by π'⁻¹π), [sent-134, score-0.272]

48 ..., such that the group action of Sn on itself suffices to study the equivalence classes of ∼. [sent-136, score-0.246]

49 Ht will be called the symmetry group of TSP(Sn) and it follows that [π] = Ht · π := {h · π | h ∈ Ht}. [sent-145, score-0.098]

50 The fundamental result is Theorem 1: Let t be the mapping which sends permutations to tours as defined in (4). [sent-167, score-0.48]

51 Then Ht = H, where Ht is the set of all t-invariant permutations and H is defined in (7). [sent-168, score-0.269]

52 We are going to prove that t-invariant permutations are completely defined by their values on 1 and 2. [sent-171, score-0.334]

53 ..., n}, σ^i(k) = σ^k(i) and therefore h = σ^{k-1} if h(i) = σ^{i-1}(k), and h = σ^{2n-k} if h(i) = σ^{-i+1}(k). □ Adding trajectories: We can now define equivalence for trajectories. [sent-181, score-0.386]

54 First define a group action of Sn on Tn analogously to (5): the action of h ∈ Ht on γ ∈ Tn is given by h · γ := γ ∘ h⁻¹. (8) [sent-182, score-0.288]

55 Before adding two trajectories we will first choose equivalent representations γ', η' which minimize d(γ', η'). [sent-188, score-0.267]

56 Because of the results presented so far, searching through all equivalent trajectories is computationally tractable. [sent-189, score-0.267]

57 It follows that it suffices to change the representation of only one argument, since d(h·γ, i·η) = d(γ, h⁻¹i·η). So the time complexity of one addition reduces to 2n computations of distances, which involve n subtractions each. [sent-193, score-0.136]

58 The normalizing action is defined by (γ, η ∈ Tn): n_{γ,η} := argmin_{n ∈ Ht} d(γ, n · η). (9) [sent-194, score-0.331]

59 Assuming that the normalizing action is unique¹, we can prove Theorem 2: Let γ, η be two trajectories, and n_{γ,η} the unique normalizing action as defined in (9). [sent-195, score-0.458]

60 We claim that the normalizing action satisfies n_{γ',η'} = g · n_{γ,η} · h⁻¹. [sent-200, score-0.253]

61 ... d(γ, n'h · η) = argmin_{n' ∈ Ht} d(γ, g⁻¹n'h · η) (11), by inserting g⁻¹ in parallel before both arguments in the last step. [sent-203, score-0.126]

62 Since the normalizing action is unique, it follows that for the n' realizing the minimum it holds that g⁻¹n'h = n_{γ,η} and therefore n' = n_{γ',η'} = g · n_{γ,η} · h⁻¹. [sent-204, score-0.232]

63 □ The sum of more than two trajectories can be defined by normalizing everything with respect to the first summand, so that empirical sums ⊕_{k=1}^{n} f(... [sent-206, score-0.46]
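To make the averaging concrete, here is a sketch under two assumptions that the excerpt only hints at: the symmetry group Ht consists of the 2n cyclic shifts and reversals of the visiting order (as Theorem 1 suggests), and d is the sum of pointwise Euclidean distances. All names are hypothetical:

```python
import numpy as np

def representatives(traj):
    """All 2n trajectories equivalent to traj: every cyclic shift of the visiting
    order, in both orientations (the assumed symmetry group Ht)."""
    n = len(traj)
    for oriented in (traj, traj[::-1]):
        for shift in range(n):
            yield np.roll(oriented, shift, axis=0)

def normalize(gamma, eta):
    """Normalizing action of Eq. (9): the representative of eta closest to gamma."""
    return min(representatives(eta),
               key=lambda c: float(np.linalg.norm(c - gamma, axis=1).sum()))

def average_trajectory(trajs):
    """Normalize every summand against the first one, then average pointwise."""
    ref = np.asarray(trajs[0], dtype=float)
    acc = ref.copy()
    for g in trajs[1:]:
        acc += normalize(ref, np.asarray(g, dtype=float))
    return acc / len(trajs)
```

Combined with the samplers sketched earlier, an average trajectory at temperature T would then be roughly average_trajectory([trajectory(pi, X) for pi in metropolis_tsp(X, T, n_steps)]).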

64 4 Inferring Solutions on New Instances: We transfer a trajectory to a new set of appointments X1, ... [sent-208, score-0.432]

65 ..., Xn by computing the relaxed tour using the following finite-horizon adaptation technique: first of all, passing times ti for all appointments are computed. [sent-211, score-0.66]

66 We extend the domain of a trajectory γ from {1, ... [sent-212, score-0.203]

67 Then we define ti such that γ(ti) is the earliest point with minimal distance between appointment Xi and the trajectory. [sent-216, score-0.188]

68 The permutation which sorts (ti)_{i=1}^{n} is the relaxed solution of γ to (Xi). [sent-218, score-0.235]
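A sketch of this relaxation step, discretizing the trajectory rather than working with the continuous extension (samples_per_edge and the helper name are assumptions; the window-based post-processing mentioned next is omitted):

```python
import numpy as np

def relaxed_solution(avg_traj, X, samples_per_edge=20):
    """Relaxed solution of an average trajectory for a new instance X: each appointment
    gets the passing time of the earliest closest point on the (discretized) closed
    curve, and sorting the passing times yields the permutation."""
    n = len(avg_traj)
    curve = []
    for k in range(n):                               # walk the closed piecewise-linear curve
        a, b = avg_traj[k], avg_traj[(k + 1) % n]
        for s in range(samples_per_edge):
            curve.append(a + (s / samples_per_edge) * (b - a))
    curve = np.asarray(curve)
    # distances of all appointments to all curve samples; argmin returns the earliest minimum
    dists = np.linalg.norm(X[:, None, :] - curve[None, :, :], axis=2)
    passing_times = dists.argmin(axis=1)
    return np.argsort(passing_times, kind="stable")
```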

69 r(i + w + 2) (index addition is modulo n) is replaced by the best alternative through the appointments ? [sent-225, score-0.257]

70 ¹Otherwise, perturb the locations of the appointments by infinitesimal changes. [sent-237, score-0.261]

71 Average trajectories for different temperatures are plotted in figures 1(a)-(c). [sent-243, score-0.431]

72 As the temperature decreases, the average trajectory converges to the trajectory of a single locally optimal tour. [sent-244, score-0.68]

73 The graphs demonstrate that the temperature T acts as a smoothing parameter. [sent-245, score-0.166]

74 The expected risk of an average trajectory was estimated by averaging the costs of the post-processed relaxed (PPR) solutions over 100 new instances (see figures 1(d)-(g)). [sent-246, score-0.507]

75 The costs of the best solutions are good approximations, within 5% of the average minimum as determined by careful simulated annealing. [sent-247, score-0.288]

76 In other words, average trajectories computed at temperatures which are too low start to overfit to noise present only in the instance for which they were computed. [sent-251, score-0.56]

77 So computation of the global optimum of a noisy combinatorial optimization problem might not be the right strategy, because the solutions might not reflect the underlying structure. [sent-252, score-0.432]

78 Averaging over many suboptimal solutions provides much better statistics. [sent-253, score-0.138]

79 The problem is nevertheless suited for the application of the heuristic provided by the empirical risk approximation (ERA) framework [1], which will be briefly sketched here. [sent-256, score-0.123]

80 Selecting a subset of solutions such that ℓ1-spheres of radius γ cover M results in the coarse-grained hypothesis set Mγ. [sent-259, score-0.174]

81 ERA then advocates either sampling from the γ-sphere around the empirical minimizer or averaging over these solutions. [sent-269, score-0.136]

82 Now it is well known that the Gibbs sampler is concentrated on solutions whose costs are below a certain threshold. [sent-270, score-0.217]

83 Now interpreting γ as energy, we can compute the stop temperature from the optimal γ. Using the well-known relation from statistical physics ∂S/∂E = T⁻¹ (entropy derivative with respect to energy), we can derive a lower bound on the optimal temperature depending on variance estimates of the specific scenario given. [sent-274, score-0.332]

84 The underlying structure of similar instances should be extracted and used in order to reduce the computational complexity of computing solutions to related instances. [sent-277, score-0.422]

85 Starting with the noisy Euclidean TSP, the construction of average tours is studied in this paper, which involves determining the exact relationship between permutations and tours, and identifying the intrinsic symmetries of the TSP. [sent-278, score-0.595]

86 We hope that this technique might prove to be useful for other applications in the field of averaging over solutions of combinatorial problems. [sent-279, score-0.339]

87 The average trajectories are able to capture the underlying structure common to all instances. [sent-280, score-0.47]

88 A heuristic for constructing solutions on new instances is proposed. [sent-281, score-0.353]

89 This phenomenon points at a deep connection between combinatorial optimization problems with noise and learning theory, which might be bidirectional. [sent-284, score-0.164]

90 On the one hand, we believe that noisy (in contrast to random) combinatorial optimization problems are dominant in reality. [sent-285, score-0.178]

91 Robust algorithms could be built by first estimating the undistorted structure and then using this structure as a guideline for constructing solutions for single instances. [sent-286, score-0.208]

92 An analogue approach to the travelling salesman problem using an elastic net method. [sent-299, score-0.281]

93 An effective heuristic algorithm for the traveling salesman problem. [sent-311, score-0.304]

94 Figure 1: (a) Average trajectories at different temperatures for n = 100 appointments on a circle with σ² = 0. [sent-356, score-0.657]

95 (b) Average trajectories at different temperatures, for multiple Gaussian sources, n = 50 and σ² = 0. [sent-358, score-0.267]

96 (c) The same for an instance with structure on two levels. [sent-360, score-0.088]

97 (d) Average tour length of the post-processed relaxed (PPR) solutions for the circle instance plotted in (a). [sent-361, score-0.656]

98 The average fits to noise in the data if the temperature is too low, leading to overfitting phenomena. [sent-363, score-0.332]

99 (e) The average trajectory with the smallest average length of its PPR solutions in (d). [sent-366, score-0.609]

100 (g) Lowest temperature trajectory with small average PPR solution length in (f). [sent-370, score-0.559]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('sn', 0.338), ('tour', 0.275), ('trajectories', 0.267), ('appointments', 0.229), ('tours', 0.211), ('trajectory', 0.203), ('permutations', 0.194), ('rj', 0.192), ('salesman', 0.183), ('tsp', 0.167), ('temperature', 0.166), ('instances', 0.152), ('gibbs', 0.144), ('solutions', 0.138), ('ppr', 0.132), ('temperatures', 0.132), ('permutation', 0.128), ('average', 0.108), ('era', 0.105), ('week', 0.097), ('combinatorial', 0.092), ('normalizing', 0.09), ('action', 0.088), ('tn', 0.083), ('appointment', 0.079), ('eh', 0.079), ('argmin', 0.078), ('metropolis', 0.078), ('eb', 0.077), ('relaxed', 0.077), ('defined', 0.075), ('heuristic', 0.063), ('elastic', 0.063), ('equivalence', 0.063), ('underlying', 0.06), ('traveling', 0.058), ('overfitting', 0.058), ('group', 0.056), ('define', 0.056), ('xl', 0.055), ('holds', 0.054), ('averaging', 0.054), ('instance', 0.053), ('ti', 0.053), ('bijective', 0.053), ('bonn', 0.053), ('ike', 0.053), ('kirkpatrick', 0.053), ('mem', 0.053), ('rjl', 0.053), ('temperaturet', 0.053), ('length', 0.052), ('carlo', 0.048), ('monte', 0.048), ('db', 0.048), ('gn', 0.046), ('buhmann', 0.046), ('optimization', 0.044), ('euclidean', 0.042), ('costs', 0.042), ('symmetry', 0.042), ('symmetries', 0.042), ('noisy', 0.042), ('durbin', 0.039), ('reality', 0.039), ('suffices', 0.039), ('gt', 0.039), ('sampled', 0.038), ('xi', 0.038), ('completely', 0.038), ('finite', 0.038), ('complexity', 0.037), ('hypotheses', 0.037), ('sampler', 0.037), ('hypothesis', 0.036), ('relationship', 0.035), ('chain', 0.035), ('analogue', 0.035), ('structure', 0.035), ('el', 0.033), ('cardinality', 0.033), ('risk', 0.032), ('distances', 0.032), ('locations', 0.032), ('plotted', 0.032), ('multiplication', 0.032), ('newly', 0.031), ('solution', 0.03), ('concepts', 0.03), ('circle', 0.029), ('identifying', 0.029), ('empirical', 0.028), ('might', 0.028), ('comments', 0.028), ('id', 0.028), ('addition', 0.028), ('exp', 0.027), ('let', 0.027), ('prove', 0.027), ('passing', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000013 186 nips-2001-The Noisy Euclidean Traveling Salesman Problem and Learning

Author: Mikio L. Braun, Joachim M. Buhmann

Abstract: We consider noisy Euclidean traveling salesman problems in the plane, which are random combinatorial problems with underlying structure. Gibbs sampling is used to compute average trajectories, which estimate the underlying structure common to all instances. This procedure requires identifying the exact relationship between permutations and tours. In a learning setting, the average trajectory is used as a model to construct solutions to new instances sampled from the same source. Experimental results show that the average trajectory can in fact estimate the underlying structure and that overfitting effects occur if the trajectory adapts too closely to a single instance. 1

2 0.13316442 119 nips-2001-Means, Correlations and Bounds

Author: Martijn Leisink, Bert Kappen

Abstract: The partition function for a Boltzmann machine can be bounded from above and below. We can use this to bound the means and the correlations. For networks with small weights, the values of these statistics can be restricted to non-trivial regions (i.e. a subset of [-1 , 1]). Experimental results show that reasonable bounding occurs for weight sizes where mean field expansions generally give good results. 1

3 0.10735416 187 nips-2001-The Steering Approach for Multi-Criteria Reinforcement Learning

Author: Shie Mannor, Nahum Shimkin

Abstract: We consider the problem of learning to attain multiple goals in a dynamic environment, which is initially unknown. In addition, the environment may contain arbitrarily varying elements related to actions of other agents or to non-stationary moves of Nature. This problem is modelled as a stochastic (Markov) game between the learning agent and an arbitrary player, with a vector-valued reward function. The objective of the learning agent is to have its long-term average reward vector belong to a given target set. We devise an algorithm for achieving this task, which is based on the theory of approachability for stochastic games. This algorithm combines, in an appropriate way, a finite set of standard, scalar-reward learning algorithms. Sufficient conditions are given for the convergence of the learning algorithm to a general target set. The specialization of these results to the single-controller Markov decision problem are discussed as well. 1

4 0.087837674 94 nips-2001-Incremental Learning and Selective Sampling via Parametric Optimization Framework for SVM

Author: Shai Fine, Katya Scheinberg

Abstract: We propose a framework based on a parametric quadratic programming (QP) technique to solve the support vector machine (SVM) training problem. This framework, can be specialized to obtain two SVM optimization methods. The first solves the fixed bias problem, while the second starts with an optimal solution for a fixed bias problem and adjusts the bias until the optimal value is found. The later method can be applied in conjunction with any other existing technique which obtains a fixed bias solution. Moreover, the second method can also be used independently to solve the complete SVM training problem. A combination of these two methods is more flexible than each individual method and, among other things, produces an incremental algorithm which exactly solve the 1-Norm Soft Margin SVM optimization problem. Applying Selective Sampling techniques may further boost convergence. 1

5 0.087048031 38 nips-2001-Asymptotic Universality for Learning Curves of Support Vector Machines

Author: Manfred Opper, Robert Urbanczik

Abstract: Using methods of Statistical Physics, we investigate the rOle of model complexity in learning with support vector machines (SVMs). We show the advantages of using SVMs with kernels of infinite complexity on noisy target rules, which, in contrast to common theoretical beliefs, are found to achieve optimal generalization error although the training error does not converge to the generalization error. Moreover, we find a universal asymptotics of the learning curves which only depend on the target rule but not on the SVM kernel. 1

6 0.07239531 31 nips-2001-Algorithmic Luckiness

7 0.068233207 183 nips-2001-The Infinite Hidden Markov Model

8 0.059547227 40 nips-2001-Batch Value Function Approximation via Support Vectors

9 0.058451153 95 nips-2001-Infinite Mixtures of Gaussian Process Experts

10 0.058063362 138 nips-2001-On the Generalization Ability of On-Line Learning Algorithms

11 0.058016967 99 nips-2001-Intransitive Likelihood-Ratio Classifiers

12 0.057176575 82 nips-2001-Generating velocity tuning by asymmetric recurrent connections

13 0.055227544 55 nips-2001-Convergence of Optimistic and Incremental Q-Learning

14 0.052406058 171 nips-2001-Spectral Relaxation for K-means Clustering

15 0.051924914 180 nips-2001-The Concave-Convex Procedure (CCCP)

16 0.051724385 13 nips-2001-A Natural Policy Gradient

17 0.05086809 143 nips-2001-PAC Generalization Bounds for Co-training

18 0.049639083 28 nips-2001-Adaptive Nearest Neighbor Classification Using Support Vector Machines

19 0.048721232 59 nips-2001-Direct value-approximation for factored MDPs

20 0.048663579 64 nips-2001-EM-DD: An Improved Multiple-Instance Learning Technique


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.177), (1, -0.017), (2, 0.075), (3, -0.032), (4, 0.038), (5, -0.061), (6, 0.03), (7, 0.024), (8, -0.034), (9, -0.014), (10, 0.004), (11, 0.07), (12, 0.016), (13, 0.01), (14, 0.055), (15, 0.124), (16, -0.0), (17, -0.02), (18, -0.009), (19, -0.075), (20, -0.159), (21, 0.039), (22, 0.024), (23, -0.029), (24, 0.003), (25, 0.058), (26, -0.088), (27, -0.063), (28, -0.039), (29, 0.04), (30, 0.031), (31, 0.008), (32, 0.028), (33, -0.031), (34, 0.052), (35, 0.0), (36, 0.042), (37, 0.031), (38, 0.208), (39, 0.115), (40, 0.1), (41, 0.032), (42, -0.075), (43, 0.047), (44, -0.01), (45, 0.019), (46, 0.057), (47, 0.087), (48, 0.317), (49, 0.004)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9529075 186 nips-2001-The Noisy Euclidean Traveling Salesman Problem and Learning

Author: Mikio L. Braun, Joachim M. Buhmann

Abstract: We consider noisy Euclidean traveling salesman problems in the plane, which are random combinatorial problems with underlying structure. Gibbs sampling is used to compute average trajectories, which estimate the underlying structure common to all instances. This procedure requires identifying the exact relationship between permutations and tours. In a learning setting, the average trajectory is used as a model to construct solutions to new instances sampled from the same source. Experimental results show that the average trajectory can in fact estimate the underlying structure and that overfitting effects occur if the trajectory adapts too closely to a single instance. 1

2 0.59961087 119 nips-2001-Means, Correlations and Bounds

Author: Martijn Leisink, Bert Kappen

Abstract: The partition function for a Boltzmann machine can be bounded from above and below. We can use this to bound the means and the correlations. For networks with small weights, the values of these statistics can be restricted to non-trivial regions (i.e. a subset of [-1 , 1]). Experimental results show that reasonable bounding occurs for weight sizes where mean field expansions generally give good results. 1

3 0.532157 94 nips-2001-Incremental Learning and Selective Sampling via Parametric Optimization Framework for SVM

Author: Shai Fine, Katya Scheinberg

Abstract: We propose a framework based on a parametric quadratic programming (QP) technique to solve the support vector machine (SVM) training problem. This framework, can be specialized to obtain two SVM optimization methods. The first solves the fixed bias problem, while the second starts with an optimal solution for a fixed bias problem and adjusts the bias until the optimal value is found. The later method can be applied in conjunction with any other existing technique which obtains a fixed bias solution. Moreover, the second method can also be used independently to solve the complete SVM training problem. A combination of these two methods is more flexible than each individual method and, among other things, produces an incremental algorithm which exactly solve the 1-Norm Soft Margin SVM optimization problem. Applying Selective Sampling techniques may further boost convergence. 1

4 0.45104346 31 nips-2001-Algorithmic Luckiness

Author: Ralf Herbrich, Robert C. Williamson

Abstract: In contrast to standard statistical learning theory which studies uniform bounds on the expected error we present a framework that exploits the specific learning algorithm used. Motivated by the luckiness framework [8] we are also able to exploit the serendipity of the training sample. The main difference to previous approaches lies in the complexity measure; rather than covering all hypotheses in a given hypothesis space it is only necessary to cover the functions which could have been learned using the fixed learning algorithm. We show how the resulting framework relates to the VC, luckiness and compression frameworks. Finally, we present an application of this framework to the maximum margin algorithm for linear classifiers which results in a bound that exploits both the margin and the distribution of the data in feature space. 1

5 0.40991628 30 nips-2001-Agglomerative Multivariate Information Bottleneck

Author: Noam Slonim, Nir Friedman, Naftali Tishby

Abstract: The information bottleneck method is an unsupervised model independent data organization technique. Given a joint distribution peA, B), this method constructs a new variable T that extracts partitions, or clusters, over the values of A that are informative about B. In a recent paper, we introduced a general principled framework for multivariate extensions of the information bottleneck method that allows us to consider multiple systems of data partitions that are inter-related. In this paper, we present a new family of simple agglomerative algorithms to construct such systems of inter-related clusters. We analyze the behavior of these algorithms and apply them to several real-life datasets.

6 0.39079645 1 nips-2001-(Not) Bounding the True Error

7 0.35677963 187 nips-2001-The Steering Approach for Multi-Criteria Reinforcement Learning

8 0.33613729 99 nips-2001-Intransitive Likelihood-Ratio Classifiers

9 0.33339304 110 nips-2001-Learning Hierarchical Structures with Linear Relational Embedding

10 0.32438892 93 nips-2001-Incremental A*

11 0.32175672 83 nips-2001-Geometrical Singularities in the Neuromanifold of Multilayer Perceptrons

12 0.31772465 149 nips-2001-Probabilistic Abstraction Hierarchies

13 0.31280062 6 nips-2001-A Bayesian Network for Real-Time Musical Accompaniment

14 0.30356619 21 nips-2001-A Variational Approach to Learning Curves

15 0.30350399 148 nips-2001-Predictive Representations of State

16 0.29813394 64 nips-2001-EM-DD: An Improved Multiple-Instance Learning Technique

17 0.29514211 194 nips-2001-Using Vocabulary Knowledge in Bayesian Multinomial Estimation

18 0.28937569 61 nips-2001-Distribution of Mutual Information

19 0.28323734 76 nips-2001-Fast Parameter Estimation Using Green's Functions

20 0.28165519 69 nips-2001-Escaping the Convex Hull with Extrapolated Vector Machines


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(14, 0.046), (17, 0.032), (19, 0.054), (27, 0.087), (30, 0.062), (38, 0.036), (59, 0.053), (72, 0.081), (78, 0.297), (79, 0.056), (83, 0.02), (91, 0.103)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.80336094 186 nips-2001-The Noisy Euclidean Traveling Salesman Problem and Learning

Author: Mikio L. Braun, Joachim M. Buhmann

Abstract: We consider noisy Euclidean traveling salesman problems in the plane, which are random combinatorial problems with underlying structure. Gibbs sampling is used to compute average trajectories, which estimate the underlying structure common to all instances. This procedure requires identifying the exact relationship between permutations and tours. In a learning setting, the average trajectory is used as a model to construct solutions to new instances sampled from the same source. Experimental results show that the average trajectory can in fact estimate the underlying structure and that overfitting effects occur if the trajectory adapts too closely to a single instance. 1

2 0.57219529 182 nips-2001-The Fidelity of Local Ordinal Encoding

Author: Javid Sadr, Sayan Mukherjee, Keith Thoresz, Pawan Sinha

Abstract: A key question in neuroscience is how to encode sensory stimuli such as images and sounds. Motivated by studies of response properties of neurons in the early cortical areas, we propose an encoding scheme that dispenses with absolute measures of signal intensity or contrast and uses, instead, only local ordinal measures. In this scheme, the structure of a signal is represented by a set of equalities and inequalities across adjacent regions. In this paper, we focus on characterizing the fidelity of this representation strategy. We develop a regularization approach for image reconstruction from ordinal measures and thereby demonstrate that the ordinal representation scheme can faithfully encode signal structure. We also present a neurally plausible implementation of this computation that uses only local update rules. The results highlight the robustness and generalization ability of local ordinal encodings for the task of pattern classification. 1

3 0.5346697 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior

Author: Mário Figueiredo

Abstract: In this paper we introduce a new sparseness inducing prior which does not involve any (hyper)parameters that need to be adjusted or estimated. Although other applications are possible, we focus here on supervised learning problems: regression and classification. Experiments with several publicly available benchmark data sets show that the proposed approach yields state-of-the-art performance. In particular, our method outperforms support vector machines and performs competitively with the best alternative techniques, both in terms of error rates and sparseness, although it involves no tuning or adjusting of sparsenesscontrolling hyper-parameters.

4 0.53270668 13 nips-2001-A Natural Policy Gradient

Author: Sham M. Kakade

Abstract: We provide a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space. Although gradient methods cannot make large changes in the values of the parameters, we show that the natural gradient is moving toward choosing a greedy optimal action rather than just a better action. These greedy optimal actions are those that would be chosen under one improvement step of policy iteration with approximate, compatible value functions, as defined by Sutton et al. [9]. We then show drastic performance improvements in simple MDPs and in the more challenging MDP of Tetris. 1

5 0.529459 95 nips-2001-Infinite Mixtures of Gaussian Process Experts

Author: Carl E. Rasmussen, Zoubin Ghahramani

Abstract: We present an extension to the Mixture of Experts (ME) model, where the individual experts are Gaussian Process (GP) regression models. Using an input-dependent adaptation of the Dirichlet Process, we implement a gating network for an infinite number of Experts. Inference in this model may be done efficiently using a Markov Chain relying on Gibbs sampling. The model allows the effective covariance function to vary with the inputs, and may handle large datasets – thus potentially overcoming two of the biggest hurdles with GP models. Simulations show the viability of this approach.

6 0.52639931 157 nips-2001-Rates of Convergence of Performance Gradient Estimates Using Function Approximation and Bias in Reinforcement Learning

7 0.52373761 131 nips-2001-Neural Implementation of Bayesian Inference in Population Codes

8 0.52195096 102 nips-2001-KLD-Sampling: Adaptive Particle Filters

9 0.52146554 155 nips-2001-Quantizing Density Estimators

10 0.52054596 132 nips-2001-Novel iteration schemes for the Cluster Variation Method

11 0.52011621 28 nips-2001-Adaptive Nearest Neighbor Classification Using Support Vector Machines

12 0.51936209 171 nips-2001-Spectral Relaxation for K-means Clustering

13 0.51899004 195 nips-2001-Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning

14 0.5187124 27 nips-2001-Activity Driven Adaptive Stochastic Resonance

15 0.51778305 1 nips-2001-(Not) Bounding the True Error

16 0.51711011 121 nips-2001-Model-Free Least-Squares Policy Iteration

17 0.51710474 36 nips-2001-Approximate Dynamic Programming via Linear Programming

18 0.51693708 46 nips-2001-Categorization by Learning and Combining Object Parts

19 0.51659644 56 nips-2001-Convolution Kernels for Natural Language

20 0.51629221 185 nips-2001-The Method of Quantum Clustering