nips nips2005 nips2005-191 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ofer Dekel, Shai Shalev-shwartz, Yoram Singer
Abstract: The Perceptron algorithm, despite its simplicity, often performs well on online classification tasks. The Perceptron becomes especially effective when it is used in conjunction with kernels. However, a common difficulty encountered when implementing kernel-based online algorithms is the amount of memory required to store the online hypothesis, which may grow unboundedly. In this paper we present and analyze the Forgetron algorithm for kernel-based online learning on a fixed memory budget. To our knowledge, this is the first online learning algorithm which, on one hand, maintains a strict limit on the number of examples it stores while, on the other hand, entertains a relative mistake bound. In addition to the formal results, we also present experiments with real datasets which underscore the merits of our approach.
Reference: text
sentIndex sentText sentNum sentScore
1 However, a common difficulty encountered when implementing kernel-based online algorithms is the amount of memory required to store the online hypothesis, which may grow unboundedly. [sent-6, score-0.536]
2 In this paper we present and analyze the Forgetron algorithm for kernel-based online learning on a fixed memory budget. [sent-7, score-0.32]
3 To our knowledge, this is the first online learning algorithm which, on one hand, maintains a strict limit on the number of examples it stores while, on the other hand, entertains a relative mistake bound. [sent-8, score-0.702]
4 This set of stored examples is the online equivalent of the support set of SVMs; however, in contrast to the support set, it continually changes as learning progresses. [sent-15, score-0.302]
5 In this paper, we call this set the active set, as it includes those examples that actively define the current classifier. [sent-16, score-0.238]
6 Typically, an example is added to the active set every time the online algorithm makes a prediction mistake, or when its confidence in a prediction is too low. [sent-17, score-0.684]
7 Naturally, since computing devices have bounded memory resources, there is the danger that an online algorithm would require more memory than is physically available. [sent-19, score-0.397]
8 This problem becomes especially evident in cases where the online algorithm is implemented as part of a specialized hardware system with a small memory, such as a mobile telephone or an autonomous robot. [sent-20, score-0.267]
9 Moreover, an excessively large active set can lead to unacceptably long running times, as the time-complexity of each online round scales linearly with the size of the active set. [sent-21, score-0.855]
10 Crammer, Kandola, and Singer [2] first addressed this problem by describing an online kernel-based modification of the Perceptron algorithm in which the active set does not exceed a predefined budget. [sent-22, score-0.456]
11 Their algorithm removes redundant examples from the active set so as to make the best use of the limited memory resource. [sent-23, score-0.349]
12 Weston, Bordes and Bottou [9] followed with their own online kernel machine on a budget. [sent-24, score-0.268]
13 In this paper we present the Forgetron algorithm for online kernel-based classification. [sent-26, score-0.267]
14 To the best of our knowledge, the Forgetron is the first online algorithm with a fixed memory budget which also entertains a formal worst-case mistake bound. [sent-27, score-0.994]
15 We name our algorithm the Forgetron since its update builds on that of the Perceptron and since it gradually forgets active examples as learning progresses. [sent-28, score-0.313]
16 In Sec. 2 we begin with a more formal presentation of our problem and discuss some difficulties in proving mistake bounds for kernel methods on a budget. [sent-31, score-0.42]
17 In Sec. 3 we present an algorithmic framework for online prediction with a predefined budget of active examples. [sent-33, score-0.744]
18 On round t the online algorithm observes an instance xt , which is drawn from some predefined instance domain X . [sent-40, score-0.66]
19 The predictions of the online algorithm are determined by a hypothesis which is stored in its internal memory and is updated from round to round. [sent-44, score-0.693]
20 Our focus in this paper is on margin based hypotheses, namely, ft is a function from X to R where sign(ft (xt )) constitutes the actual binary prediction and |ft (xt )| is the confidence in this prediction. [sent-46, score-0.466]
21 First, we can check whether the hypothesis makes a prediction mistake, namely determine whether y = sign(f (x)) or not. [sent-49, score-0.26]
22 Throughout this paper, we use M to denote the number of prediction mistakes made by an online algorithm on a sequence of examples (x1, y1), . . . , (xT, yT). [sent-50, score-0.56]
23 That is, the hypothesis used on round t takes the form ft(x) = Σ_{i∈It} αi K(xi, x), where It is a subset of {1, . . . , t−1}. [sent-72, score-0.652]
24 Here xi is the example observed by the algorithm on round i. [sent-75, score-0.285]
25 As stated above, It is called the active set, and we say that example xi is active on round t if i ∈ It . [sent-76, score-0.679]
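To make this representation concrete, here is a minimal sketch of evaluating such a kernel-expansion hypothesis. The Gaussian kernel and the names `gaussian_kernel` and `evaluate` are illustrative assumptions, not the paper's notation; any Mercer kernel with K(x, x) ≤ 1 fits the normalization assumed by the bounds below.

```python
import numpy as np

def gaussian_kernel(u, v, gamma=1.0):
    # Illustrative Mercer kernel with K(x, x) = 1, matching the
    # normalization K(xt, xt) <= 1 assumed by the theorems below.
    return np.exp(-gamma * np.linalg.norm(u - v) ** 2)

def evaluate(active, x, kernel=gaussian_kernel):
    # active holds (alpha_i, x_i) pairs for i in the active set I_t;
    # returns f_t(x) = sum over i in I_t of alpha_i * K(x_i, x).
    return sum(alpha * kernel(xi, x) for alpha, xi in active)
```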
26 Perhaps the most well known online algorithm for binary classification is the Perceptron [6]. [sent-77, score-0.298]
27 Stated in the form of a kernel method, the hypotheses generated by the Perceptron take the form ft(x) = Σ_{i∈It} yi K(xi, x). [sent-78, score-0.396]
28 Namely, the weight assigned to each active example is either +1 or −1, depending on the label of that example. [sent-79, score-0.244]
29 The Perceptron then updates its hypothesis only on rounds where a prediction mistake is made. [sent-81, score-0.615]
30 Concretely, on round t, if sign(ft(xt)) ≠ yt then the index t is inserted into the active set. [sent-82, score-0.832]
31 As a consequence, the size of the active set on round t equals the number of prediction mistakes made on previous rounds. [sent-83, score-0.669]
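This description translates into a short online loop; the following is a sketch under the same illustrative assumptions, reusing the hypothetical `evaluate` and `gaussian_kernel` helpers from the previous snippet.

```python
def kernel_perceptron(stream, kernel=gaussian_kernel):
    # stream yields (x_t, y_t) pairs with y_t in {-1, +1}.
    active = []      # (weight, example) pairs; each weight is a label
    mistakes = 0
    for x, y in stream:
        y_hat = 1 if evaluate(active, x, kernel) > 0 else -1
        if y_hat != y:             # update only on prediction mistakes
            active.append((y, x))  # insert index t into the active set
            mistakes += 1          # |I_t| always equals M so far
    return active, mistakes
```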
32 A relative mistake bound can be proven for the Perceptron algorithm. [sent-84, score-0.416]
33 The bound holds for any sequence of instance-label pairs, and compares the number of mistakes made by the Perceptron with the cumulative hinge-loss of any fixed hypothesis g ∈ HK , even one defined with prior knowledge of the sequence. [sent-85, score-0.378]
34 Let (x1, y1), . . . , (xT, yT) be a sequence of examples such that K(xt, xt) ≤ 1 for all t. [sent-90, score-0.239]
35 Then the number of prediction mistakes made by the Perceptron on this sequence is bounded by M ≤ ‖g‖² + 2 Σ_{t=1}^{T} ℓ̂t. [sent-92, score-0.268]
36 Although the Perceptron is guaranteed to be competitive with any fixed hypothesis g ∈ HK, the fact that its active set can grow without a bound poses a serious computational problem. [sent-93, score-0.416]
37 In fact, this problem is common to most kernel-based online methods that do not explicitly monitor the size of It . [sent-94, score-0.253]
38 As discussed above, our goal is to derive and analyze an online prediction algorithm which resolves these problems by enforcing a fixed bound on the size of the active set. [sent-95, score-0.628]
39 Formally, let B be a positive integer, which we refer to as the budget parameter. [sent-96, score-0.262]
40 We would like to devise an algorithm which enforces the constraint |It | ≤ B on every round t. [sent-97, score-0.364]
41 Furthermore, we would like to prove a relative mistake bound for this algorithm, analogous to the bound stated in Thm. 1. [sent-98, score-0.533]
42 We show this inherent limitation by presenting a simple counterexample which applies to any online algorithm which uses a prediction function of the form given in Eq. [sent-101, score-0.383]
43 In this example, we show a hypothesis g ∈ HK and an arbitrarily long sequence of examples such that the algorithm makes a prediction mistake on every single round whereas g suffers no loss at all. [sent-103, score-0.97]
44 Now for every t, ft is a linear combination of at most B vectors from X . [sent-106, score-0.337]
45 Since |X | = B + 1, there exists a vector xt ∈ X which is currently not in the active set. [sent-107, score-0.339]
46 Furthermore, xt is orthogonal to all of the active vectors and therefore ft (xt ) = 0. [sent-108, score-0.64]
47 Assume without loss of generality that the online algorithm we are using predicts yt to be −1 when ft(xt) = 0. [sent-109, score-0.664]
48 If on every round we were to present the online algorithm with the example (xt, +1) then the online algorithm would make a prediction mistake on every round. [sent-110, score-1.266]
49 We have found a sequence of examples and a fixed hypothesis (which is indeed defined by more than B vectors from X ) that attains a cumulative loss of zero on this sequence, while the number of mistakes made by the online algorithm equals the number of rounds. [sent-112, score-0.659]
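This counterexample is easy to check numerically: take the linear kernel and let X be the B + 1 standard basis vectors, so any hypothesis supported on at most B of them has zero margin on whichever vector is absent from the active set. The snippet below is an illustrative verification under those assumptions.

```python
import numpy as np

B = 3
X = np.eye(B + 1)        # B + 1 mutually orthonormal instances
# Any B of them in the active set, with arbitrary weights:
active = [(1.0, X[i]) for i in range(B)]

# The left-out vector is orthogonal to every active example,
# so the hypothesis value on it is exactly zero:
x_left_out = X[B]
f_value = sum(a * np.dot(xi, x_left_out) for a, xi in active)  # linear kernel
assert f_value == 0.0    # predicted label -1, true label +1: a mistake
```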
50 Formally, we wish to devise an online algorithm which is competitive with every hypothesis g ∈ HK for which ‖g‖ ≤ U, for some constant U. [sent-117, score-0.487]
51 Our counterexample indicates that we cannot prove a relative mistake bound with U set to at least √(B+1), since that was the norm of g in our counterexample. [sent-118, score-0.482]
52 In this paper we come close to this upper bound by proving that our algorithms can compete with any hypothesis with a norm bounded by (1/4)√((B+1)/log(B+1)). [sent-119, score-0.317]
53 Recall that whenever the Perceptron makes a prediction mistake, it updates its hypothesis by adding the element t to It . [sent-121, score-0.259]
54 Thus, on any given round, the size of its active set equals the number of prediction mistakes it has made so far. [sent-122, score-0.445]
55 This implies that the Perceptron may violate the budget constraint |It | ≤ B. [sent-123, score-0.262]
56 We can solve this problem by removing an example from the active set whenever its size exceeds B. [sent-124, score-0.263]
57 One simple strategy is to remove the oldest example in the active set whenever |It | > B. [sent-125, score-0.298]
58 Let t be a round on which the Perceptron makes a prediction mistake. [sent-126, score-0.331]
59 Otherwise, we apply a removal step by finding the oldest example in the active set, rt = min I′t, and setting It+1 = I′t \ {rt}. [sent-131, score-0.546]
60 The resulting algorithm is a simple modification of the kernel Perceptron, which conforms with a fixed budget constraint. [sent-132, score-0.312]
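A sketch of this remove-oldest variant follows, again under the illustrative helpers introduced earlier:

```python
def remove_oldest_perceptron(stream, B, kernel=gaussian_kernel):
    # Kernel Perceptron forced to respect |I_t| <= B by discarding the
    # oldest active example whenever the budget would be exceeded.
    active = []   # kept in insertion order, so active[0] is the oldest
    for x, y in stream:
        y_hat = 1 if evaluate(active, x, kernel) > 0 else -1
        if y_hat != y:
            active.append((y, x))     # standard Perceptron insertion
            if len(active) > B:
                active.pop(0)         # removal step: drop r_t = min I'_t
    return active
```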
61 While we are unable to prove a mistake bound for this algorithm, it is nonetheless an important milestone on the path to an algorithm with a fixed budget and a formal mistake bound. [sent-133, score-1.091]
62 The removal of the oldest active example from It may significantly change the hypothesis and affect its accuracy. [sent-134, score-0.519]
63 By controlling the weight of the oldest active example, we can guarantee that the removal step will not significantly affect the accuracy of our predictions. [sent-136, score-0.456]
64 More formally, we redefine our hypothesis to be ft = Σ_{i∈It} σi,t yi K(xi, ·), where each σi,t is a weight in (0, 1]. [sent-137, score-0.457]
65 On round t, if a prediction mistake occurs, a three-step update is performed. [sent-141, score-0.734]
66 The first step is the standard Perceptron update, namely, the index t is inserted into the active set and the weight σt,t is set to 1. [sent-142, score-0.275]
67 Let I′t denote the active set which results from this update, and let f′t denote the resulting hypothesis, f′t(x) = ft(x) + yt K(xt, x). [sent-143, score-0.609]
68 The second step of the update is a shrinking step in which we scale f′t by a coefficient φt ∈ (0, 1]. [sent-144, score-0.309]
69 If the shrinking coefficients φt are sufficiently small, then the example weights σi,t decrease rapidly with t, and in particular the weight of the oldest active example can be made arbitrarily small. [sent-152, score-0.529]
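One simple way to realize such decaying weights, sketched below, is to multiply every stored weight by φt on each mistake round, so that σi,t is the product of all shrinking coefficients applied since example i was inserted; treating φ as a fixed constant here is a simplifying assumption.

```python
def insert_and_shrink(active, x, y, phi):
    # Step 1: standard Perceptron insertion with initial weight 1.
    active.append([1.0, y, x])   # entries are [sigma_i, y_i, x_i]
    # Step 2: shrinking - scaling the whole hypothesis by phi in (0, 1]
    # multiplies every sigma_{i,t}, so older examples end up with
    # geometrically smaller weights than newer ones.
    for entry in active:
        entry[0] *= phi
    return active
```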
70 Alas, aggressively shrinking the online hypothesis on every update might itself degrade its performance, and therefore φt should not be set too small. [sent-154, score-0.991]
71 To formalize this tradeoff, we begin with the mistake bound in Thm. 1. [sent-156, score-0.416]
72 We then investigate how it is affected by the shrinking and removal steps. [sent-157, score-0.319]
73 Let J denote the set of rounds on which the Forgetron makes a prediction mistake and define the function Ψ(σ, φ, µ) = (σφ)² + 2σφ(1 − φµ). [sent-159, score-0.51]
74 On this round, example rt is removed from the active set. [sent-161, score-0.338]
75 Let µt = yrt f′t(xrt) be the signed margin attained by f′t on the active example being removed. [sent-162, score-0.238]
76 Lemma 1 below states that removing example rt from the active set on round t increases the mistake bound by Ψt . [sent-164, score-0.975]
77 In addition, it is clear from the definition of Ψt that µt also plays a key role in determining whether xrt can be safely removed from the active set. [sent-166, score-0.254]
78 We note in passing that [2] used a heuristic criterion similar to µt to dynamically choose which active example to remove on each online round. [sent-167, score-0.42]
79 Turning to the shrinking step, for every t ∈ J we define Φt = 1 if ‖ft+1‖ ≥ U; Φt = φt if ‖f′t‖ ≤ U ∧ ‖ft+1‖ < U; and Φt = φt‖f′t‖/U if ‖f′t‖ > U ∧ ‖ft+1‖ < U. [sent-168, score-0.236]
80 Lemma 1 below also states that applying the shrinking step on round t increases the mistake bound by U² log(1/Φt). [sent-169, score-0.875]
81 Note that if ‖ft+1‖ ≥ U then Φt = 1 and the shrinking step on round t has no effect on our mistake bound. [sent-170, score-0.81]
82 Intuitively, this is due to the fact that, in this case, the shrinking step does not make the norm of ft+1 smaller than the norm of our competitor, g. [sent-171, score-0.305]
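Both damage terms can be computed directly from the definitions above; in the sketch below, `norm_f_prime` and `norm_f_next` stand for ‖f′t‖ and ‖ft+1‖, and the function names are assumptions of this illustration.

```python
import math

def psi(sigma, phi, mu):
    # Psi(sigma, phi, mu) = (sigma*phi)^2 + 2*sigma*phi*(1 - phi*mu),
    # the increase in the mistake bound caused by one removal step.
    return (sigma * phi) ** 2 + 2 * sigma * phi * (1 - phi * mu)

def shrink_cost(phi, norm_f_prime, norm_f_next, U):
    # Returns U^2 * log(1/Phi_t), the cost of one shrinking step,
    # with Phi_t given by the three-case definition in the text.
    if norm_f_next >= U:
        big_phi = 1.0                    # shrinking costs nothing here
    elif norm_f_prime <= U:
        big_phi = phi
    else:
        big_phi = phi * norm_f_prime / U
    return U ** 2 * math.log(1.0 / big_phi)
```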
83 Let (x1, y1), . . . , (xT, yT) be a sequence of examples such that K(xt, xt) ≤ 1 for all t, and assume that this sequence is presented to the Forgetron with a budget constraint B. [sent-176, score-0.541]
84 The first term in the bound of Lemma 1 is identical to the mistake bound of the standard Perceptron, given in Thm. 1. [sent-179, score-0.481]
85 The second term is the consequence of the removal and shrinking steps. [sent-181, score-0.319]
86 If we set the shrinking coefficients in such a way that the second term is at most M/2, then the bound in Lemma 1 reduces to M ≤ ‖g‖² + 2 Σt ℓ̂t + M/2. [sent-182, score-0.265]
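Spelling out this reduction as a one-line rearrangement:

```latex
M \le \|g\|^2 + 2\sum_{t}\hat{\ell}_t + \tfrac{M}{2}
\;\Longrightarrow\;
\tfrac{M}{2} \le \|g\|^2 + 2\sum_{t}\hat{\ell}_t
\;\Longrightarrow\;
M \le 2\|g\|^2 + 4\sum_{t}\hat{\ell}_t ,
```

which matches the form of the Forgetron mistake bound stated below.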
87 In the next section, we define the specific mechanism used by the Forgetron algorithm to choose the shrinking coefficients φt . [sent-188, score-0.236]
88 Then, we conclude our analysis by arguing that this choice satisfies the sufficient conditions stated in Lemma 2, and obtain a mistake bound as described above. [sent-189, score-0.468]
89 In words, Jt is the set of rounds on which the algorithm made a mistake up until round t, and Mt is the size of this set. [sent-193, score-0.712]
90 Let i denote a round on which the algorithm makes a prediction mistake and on which an example must be removed from the active set. [sent-201, score-0.937]
91 To prove a mistake bound it suffices to show that the two conditions stated in Lemma 2 hold. [sent-210, score-0.468]
92 INPUT: Mercer kernel K(·,·); budget parameter B > 0. INITIALIZE: I1 = ∅; f1 ≡ 0; Q1 = 0; M0 = 0. For t = 1, 2, . . . [sent-214, score-0.276]
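Putting the three-step update together, the following schematic sketch mirrors the loop initialized above, reusing the illustrative `gaussian_kernel` from the earlier snippets. The rule that derives φt from the bookkeeping quantities Qt and Mt is the paper's; it is replaced here by a fixed placeholder coefficient, so this illustrates the control flow rather than the analyzed algorithm.

```python
def forgetron(stream, B, phi=0.9, kernel=gaussian_kernel):
    # Schematic Forgetron loop (phi fixed as a placeholder; the paper
    # chooses phi_t adaptively from Q_t and M_t on every mistake round).
    active = []                          # entries: [sigma_i, y_i, x_i]
    for x, y in stream:
        f = sum(s * yi * kernel(xi, x) for s, yi, xi in active)
        if (1 if f > 0 else -1) != y:    # mistake round
            active.append([1.0, y, x])   # step 1: Perceptron insertion
            for entry in active:         # step 2: shrinking by phi_t
                entry[0] *= phi
            if len(active) > B:          # step 3: remove the oldest, r_t
                active.pop(0)
    return active
```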
93 Let (x1, y1), . . . , (xT, yT) be a sequence of examples such that K(xt, xt) ≤ 1 for all t. [sent-222, score-0.239]
94 Then the number of prediction mistakes made by the Forgetron on this sequence is at most M ≤ 2‖g‖² + 4 Σ_{t=1}^{T} ℓ̂t. 5 Experiments and Discussion In this section we present preliminary experimental results which demonstrate the merits of the Forgetron algorithm. [sent-226, score-0.27]
95 When the CKS algorithm exceeds its budget, it removes the active example whose margin would be the largest after the removal. [sent-228, score-0.296]
96 For each budget value, we ran the two algorithms on all 126 binary problems and averaged the results. [sent-232, score-0.291]
97 Figure 2: The error of different budget algorithms as a function of the budget size B on the census-income (adult) dataset (left) and on the MNIST dataset (right). [sent-249, score-0.031]
98 In this paper we described the Forgetron, a kernel-based online learning algorithm with a fixed memory budget. [sent-259, score-0.356]
99 We further argued that no algorithm with a budget of B active examples can be competitive, on every input sequence, with every hypothesis whose norm is √(B+1). [sent-261, score-0.782]
100 The analysis presented in this paper can be used to derive a family of online algorithms of which the Forgetron is only one special case. [sent-263, score-0.252]
wordName wordTfidf (topN-words)
[('forgetron', 0.462), ('mistake', 0.351), ('ft', 0.301), ('perceptron', 0.266), ('budget', 0.239), ('online', 0.231), ('round', 0.224), ('shrinking', 0.2), ('active', 0.189), ('xt', 0.15), ('hypothesis', 0.127), ('rt', 0.119), ('removal', 0.119), ('cks', 0.106), ('yt', 0.096), ('lemma', 0.093), ('hk', 0.092), ('mistakes', 0.092), ('mt', 0.086), ('prediction', 0.085), ('oldest', 0.084), ('qt', 0.07), ('bound', 0.065), ('jt', 0.062), ('hypotheses', 0.058), ('yf', 0.053), ('memory', 0.053), ('rounds', 0.052), ('stated', 0.052), ('margin', 0.049), ('examples', 0.049), ('formal', 0.049), ('sequence', 0.04), ('qi', 0.04), ('mercer', 0.039), ('update', 0.039), ('classi', 0.038), ('kernel', 0.037), ('every', 0.036), ('algorithm', 0.036), ('sign', 0.036), ('entertains', 0.035), ('initializes', 0.035), ('xrt', 0.035), ('mnist', 0.035), ('step', 0.035), ('norm', 0.035), ('competitive', 0.035), ('crammer', 0.032), ('ei', 0.031), ('binary', 0.031), ('innerproduct', 0.031), ('fth', 0.031), ('bordes', 0.031), ('dekel', 0.031), ('rb', 0.031), ('counterexample', 0.031), ('dataset', 0.031), ('equals', 0.03), ('prede', 0.03), ('removed', 0.03), ('weight', 0.029), ('cation', 0.029), ('mi', 0.027), ('removing', 0.027), ('made', 0.027), ('cumulative', 0.027), ('label', 0.026), ('claims', 0.026), ('kandola', 0.026), ('merits', 0.026), ('abbreviate', 0.026), ('namely', 0.026), ('xi', 0.025), ('whenever', 0.025), ('compete', 0.025), ('bounded', 0.024), ('enforces', 0.023), ('constraint', 0.023), ('datasets', 0.023), ('coef', 0.023), ('let', 0.023), ('ready', 0.022), ('devise', 0.022), ('adult', 0.022), ('removes', 0.022), ('labels', 0.022), ('stored', 0.022), ('formally', 0.022), ('makes', 0.022), ('size', 0.022), ('inserted', 0.022), ('cients', 0.021), ('algorithms', 0.021), ('zj', 0.021), ('weston', 0.02), ('proving', 0.02), ('ji', 0.02), ('observes', 0.019), ('hebrew', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 191 nips-2005-The Forgetron: A Kernel-Based Perceptron on a Fixed Budget
Author: Ofer Dekel, Shai Shalev-shwartz, Yoram Singer
Abstract: The Perceptron algorithm, despite its simplicity, often performs well on online classification tasks. The Perceptron becomes especially effective when it is used in conjunction with kernels. However, a common difficulty encountered when implementing kernel-based online algorithms is the amount of memory required to store the online hypothesis, which may grow unboundedly. In this paper we present and analyze the Forgetron algorithm for kernel-based online learning on a fixed memory budget. To our knowledge, this is the first online learning algorithm which, on one hand, maintains a strict limit on the number of examples it stores while, on the other hand, entertains a relative mistake bound. In addition to the formal results, we also present experiments with real datasets which underscore the merits of our approach.
2 0.16877653 76 nips-2005-From Batch to Transductive Online Learning
Author: Sham Kakade, Adam Tauman Kalai
Abstract: It is well-known that everything that is learnable in the difficult online setting, where an arbitrary sequences of examples must be labeled one at a time, is also learnable in the batch setting, where examples are drawn independently from a distribution. We show a result in the opposite direction. We give an efficient conversion algorithm from batch to online that is transductive: it uses future unlabeled data. This demonstrates the equivalence between what is properly and efficiently learnable in a batch model and a transductive online model.
3 0.16353661 54 nips-2005-Data-Driven Online to Batch Conversions
Author: Ofer Dekel, Yoram Singer
Abstract: Online learning algorithms are typically fast, memory efficient, and simple to implement. However, many common learning problems fit more naturally in the batch learning setting. The power of online learning algorithms can be exploited in batch settings by using online-to-batch conversions techniques which build a new batch algorithm from an existing online algorithm. We first give a unified overview of three existing online-to-batch conversion techniques which do not use training data in the conversion process. We then build upon these data-independent conversions to derive and analyze data-driven conversions. Our conversions find hypotheses with a small risk by explicitly minimizing datadependent generalization bounds. We experimentally demonstrate the usefulness of our approach and in particular show that the data-driven conversions consistently outperform the data-independent conversions.
4 0.13363706 41 nips-2005-Coarse sample complexity bounds for active learning
Author: Sanjoy Dasgupta
Abstract: We characterize the sample complexity of active learning problems in terms of a parameter which takes into account the distribution over the input space, the specific target hypothesis, and the desired accuracy.
5 0.1140236 50 nips-2005-Convex Neural Networks
Author: Yoshua Bengio, Nicolas L. Roux, Pascal Vincent, Olivier Delalleau, Patrice Marcotte
Abstract: Convexity has recently received a lot of attention in the machine learning community, and the lack of convexity has been seen as a major disadvantage of many learning algorithms, such as multi-layer artificial neural networks. We show that training multi-layer neural networks in which the number of hidden units is learned can be viewed as a convex optimization problem. This problem involves an infinite number of variables, but can be solved by incrementally inserting a hidden unit at a time, each time finding a linear classifier that minimizes a weighted sum of errors.
6 0.097612403 95 nips-2005-Improved risk tail bounds for on-line algorithms
7 0.096195348 74 nips-2005-Faster Rates in Regression via Active Learning
8 0.084962428 60 nips-2005-Dynamic Social Network Analysis using Latent Space Models
9 0.077364117 45 nips-2005-Conditional Visual Tracking in Kernel Space
10 0.067450434 148 nips-2005-Online Discovery and Learning of Predictive State Representations
11 0.066877626 19 nips-2005-Active Learning for Misspecified Models
12 0.064535365 144 nips-2005-Off-policy Learning with Options and Recognizers
13 0.064490855 80 nips-2005-Gaussian Process Dynamical Models
14 0.063996404 117 nips-2005-Learning from Data of Variable Quality
15 0.061581515 160 nips-2005-Query by Committee Made Real
16 0.0596736 57 nips-2005-Distance Metric Learning for Large Margin Nearest Neighbor Classification
17 0.057250481 85 nips-2005-Generalization to Unseen Cases
18 0.056946043 123 nips-2005-Maximum Margin Semi-Supervised Learning for Structured Variables
19 0.055682819 104 nips-2005-Laplacian Score for Feature Selection
20 0.054538801 27 nips-2005-Analysis of Spectral Kernel Design based Semi-supervised Learning
topicId topicWeight
[(0, 0.172), (1, 0.054), (2, 0.014), (3, -0.082), (4, 0.127), (5, 0.106), (6, 0.068), (7, 0.086), (8, -0.065), (9, 0.13), (10, -0.146), (11, 0.278), (12, -0.011), (13, -0.127), (14, 0.021), (15, -0.158), (16, -0.093), (17, 0.001), (18, -0.049), (19, 0.005), (20, -0.045), (21, 0.046), (22, 0.058), (23, 0.005), (24, 0.124), (25, -0.022), (26, 0.002), (27, 0.003), (28, 0.03), (29, -0.057), (30, -0.037), (31, 0.01), (32, -0.0), (33, -0.059), (34, 0.018), (35, -0.05), (36, 0.093), (37, 0.023), (38, -0.05), (39, -0.044), (40, -0.016), (41, 0.001), (42, -0.045), (43, -0.029), (44, 0.102), (45, 0.069), (46, 0.058), (47, 0.007), (48, -0.015), (49, 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.96121657 191 nips-2005-The Forgetron: A Kernel-Based Perceptron on a Fixed Budget
Author: Ofer Dekel, Shai Shalev-shwartz, Yoram Singer
Abstract: The Perceptron algorithm, despite its simplicity, often performs well on online classification tasks. The Perceptron becomes especially effective when it is used in conjunction with kernels. However, a common difficulty encountered when implementing kernel-based online algorithms is the amount of memory required to store the online hypothesis, which may grow unboundedly. In this paper we present and analyze the Forgetron algorithm for kernel-based online learning on a fixed memory budget. To our knowledge, this is the first online learning algorithm which, on one hand, maintains a strict limit on the number of examples it stores while, on the other hand, entertains a relative mistake bound. In addition to the formal results, we also present experiments with real datasets which underscore the merits of our approach.
2 0.82807404 54 nips-2005-Data-Driven Online to Batch Conversions
Author: Ofer Dekel, Yoram Singer
Abstract: Online learning algorithms are typically fast, memory efficient, and simple to implement. However, many common learning problems fit more naturally in the batch learning setting. The power of online learning algorithms can be exploited in batch settings by using online-to-batch conversions techniques which build a new batch algorithm from an existing online algorithm. We first give a unified overview of three existing online-to-batch conversion techniques which do not use training data in the conversion process. We then build upon these data-independent conversions to derive and analyze data-driven conversions. Our conversions find hypotheses with a small risk by explicitly minimizing datadependent generalization bounds. We experimentally demonstrate the usefulness of our approach and in particular show that the data-driven conversions consistently outperform the data-independent conversions.
3 0.76878083 76 nips-2005-From Batch to Transductive Online Learning
Author: Sham Kakade, Adam Tauman Kalai
Abstract: It is well-known that everything that is learnable in the difficult online setting, where an arbitrary sequences of examples must be labeled one at a time, is also learnable in the batch setting, where examples are drawn independently from a distribution. We show a result in the opposite direction. We give an efficient conversion algorithm from batch to online that is transductive: it uses future unlabeled data. This demonstrates the equivalence between what is properly and efficiently learnable in a batch model and a transductive online model.
4 0.64908302 41 nips-2005-Coarse sample complexity bounds for active learning
Author: Sanjoy Dasgupta
Abstract: We characterize the sample complexity of active learning problems in terms of a parameter which takes into account the distribution over the input space, the specific target hypothesis, and the desired accuracy.
5 0.56003928 95 nips-2005-Improved risk tail bounds for on-line algorithms
Author: Nicolò Cesa-bianchi, Claudio Gentile
Abstract: We prove the strongest known bound for the risk of hypotheses selected from the ensemble generated by running a learning algorithm incrementally on the training data. Our result is based on proof techniques that are remarkably different from the standard risk analysis based on uniform convergence arguments.
6 0.4257606 160 nips-2005-Query by Committee Made Real
7 0.39478493 74 nips-2005-Faster Rates in Regression via Active Learning
8 0.37909776 117 nips-2005-Learning from Data of Variable Quality
9 0.37090024 45 nips-2005-Conditional Visual Tracking in Kernel Space
10 0.36307901 148 nips-2005-Online Discovery and Learning of Predictive State Representations
11 0.36180335 50 nips-2005-Convex Neural Networks
12 0.34840599 19 nips-2005-Active Learning for Misspecified Models
13 0.34624273 85 nips-2005-Generalization to Unseen Cases
14 0.34532076 68 nips-2005-Factorial Switching Kalman Filters for Condition Monitoring in Neonatal Intensive Care
15 0.33479187 123 nips-2005-Maximum Margin Semi-Supervised Learning for Structured Variables
16 0.31902632 60 nips-2005-Dynamic Social Network Analysis using Latent Space Models
17 0.31541139 72 nips-2005-Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation
18 0.31367835 81 nips-2005-Gaussian Processes for Multiuser Detection in CDMA receivers
19 0.29869324 144 nips-2005-Off-policy Learning with Options and Recognizers
20 0.28810847 12 nips-2005-A PAC-Bayes approach to the Set Covering Machine
topicId topicWeight
[(3, 0.094), (10, 0.048), (13, 0.277), (27, 0.03), (31, 0.026), (34, 0.104), (41, 0.01), (55, 0.038), (69, 0.047), (73, 0.037), (88, 0.097), (91, 0.076)]
simIndex simValue paperId paperTitle
same-paper 1 0.80553997 191 nips-2005-The Forgetron: A Kernel-Based Perceptron on a Fixed Budget
Author: Ofer Dekel, Shai Shalev-shwartz, Yoram Singer
Abstract: The Perceptron algorithm, despite its simplicity, often performs well on online classification tasks. The Perceptron becomes especially effective when it is used in conjunction with kernels. However, a common difficulty encountered when implementing kernel-based online algorithms is the amount of memory required to store the online hypothesis, which may grow unboundedly. In this paper we present and analyze the Forgetron algorithm for kernel-based online learning on a fixed memory budget. To our knowledge, this is the first online learning algorithm which, on one hand, maintains a strict limit on the number of examples it stores while, on the other hand, entertains a relative mistake bound. In addition to the formal results, we also present experiments with real datasets which underscore the merits of our approach.
2 0.58137351 112 nips-2005-Learning Minimum Volume Sets
Author: Clayton Scott, Robert Nowak
Abstract: Given a probability measure P and a reference measure µ, one is often interested in the minimum µ-measure set with P -measure at least α. Minimum volume sets of this type summarize the regions of greatest probability mass of P , and are useful for detecting anomalies and constructing confidence regions. This paper addresses the problem of estimating minimum volume sets based on independent samples distributed according to P . Other than these samples, no other information is available regarding P , but the reference measure µ is assumed to be known. We introduce rules for estimating minimum volume sets that parallel the empirical risk minimization and structural risk minimization principles in classification. As in classification, we show that the performances of our estimators are controlled by the rate of uniform convergence of empirical to true probabilities over the class from which the estimator is drawn. Thus we obtain finite sample size performance bounds in terms of VC dimension and related quantities. We also demonstrate strong universal consistency and an oracle inequality. Estimators based on histograms and dyadic partitions illustrate the proposed rules.
3 0.57422048 74 nips-2005-Faster Rates in Regression via Active Learning
Author: Rebecca Willett, Robert Nowak, Rui M. Castro
Abstract: This paper presents a rigorous statistical analysis characterizing regimes in which active learning significantly outperforms classical passive learning. Active learning algorithms are able to make queries or select sample locations in an online fashion, depending on the results of the previous queries. In some regimes, this extra flexibility leads to significantly faster rates of error decay than those possible in classical passive learning settings. The nature of these regimes is explored by studying fundamental performance limits of active and passive learning in two illustrative nonparametric function classes. In addition to examining the theoretical potential of active learning, this paper describes a practical algorithm capable of exploiting the extra flexibility of the active setting and provably improving upon the classical passive techniques. Our active learning theory and methods show promise in a number of applications, including field estimation using wireless sensor networks and fault line detection.
4 0.57348007 49 nips-2005-Convergence and Consistency of Regularized Boosting Algorithms with Stationary B-Mixing Observations
Author: Aurelie C. Lozano, Sanjeev R. Kulkarni, Robert E. Schapire
Abstract: We study the statistical convergence and consistency of regularized Boosting methods, where the samples are not independent and identically distributed (i.i.d.) but come from empirical processes of stationary β-mixing sequences. Utilizing a technique that constructs a sequence of independent blocks close in distribution to the original samples, we prove the consistency of the composite classifiers resulting from a regularization achieved by restricting the 1-norm of the base classifiers’ weights. When compared to the i.i.d. case, the nature of sampling manifests in the consistency result only through generalization of the original condition on the growth of the regularization parameter.
5 0.57278442 177 nips-2005-Size Regularized Cut for Data Clustering
Author: Yixin Chen, Ya Zhang, Xiang Ji
Abstract: We present a novel spectral clustering method that enables users to incorporate prior knowledge of the size of clusters into the clustering process. The cost function, which is named size regularized cut (SRcut), is defined as the sum of the inter-cluster similarity and a regularization term measuring the relative size of two clusters. Finding a partition of the data set to minimize SRcut is proved to be NP-complete. An approximation algorithm is proposed to solve a relaxed version of the optimization problem as an eigenvalue problem. Evaluations over different data sets demonstrate that the method is not sensitive to outliers and performs better than normalized cut.
6 0.57218236 50 nips-2005-Convex Neural Networks
7 0.56966704 41 nips-2005-Coarse sample complexity bounds for active learning
8 0.56843513 137 nips-2005-Non-Gaussian Component Analysis: a Semi-parametric Framework for Linear Dimension Reduction
9 0.56769174 160 nips-2005-Query by Committee Made Real
10 0.56766355 92 nips-2005-Hyperparameter and Kernel Learning for Graph Based Semi-Supervised Classification
11 0.56581533 32 nips-2005-Augmented Rescorla-Wagner and Maximum Likelihood Estimation
12 0.56467783 47 nips-2005-Consistency of one-class SVM and related algorithms
13 0.56450796 43 nips-2005-Comparing the Effects of Different Weight Distributions on Finding Sparse Representations
14 0.56361109 24 nips-2005-An Approximate Inference Approach for the PCA Reconstruction Error
15 0.56283087 58 nips-2005-Divergences, surrogate loss functions and experimental design
16 0.56270909 54 nips-2005-Data-Driven Online to Batch Conversions
17 0.5625639 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models
18 0.56051183 201 nips-2005-Variational Bayesian Stochastic Complexity of Mixture Models
19 0.56013018 132 nips-2005-Nearest Neighbor Based Feature Selection for Regression and its Application to Neural Activity
20 0.5591405 66 nips-2005-Estimation of Intrinsic Dimensionality Using High-Rate Vector Quantization