
63 nips-2009-DUOL: A Double Updating Approach for Online Learning


Source: pdf

Author: Peilin Zhao, Steven C. Hoi, Rong Jin

Abstract: In most online learning algorithms, the weights assigned to the misclassified examples (or support vectors) remain unchanged during the entire learning process. This is clearly insufficient since when a new misclassified example is added to the pool of support vectors, we generally expect it to affect the weights for the existing support vectors. In this paper, we propose a new online learning method, termed Double Updating Online Learning, or DUOL for short. Instead of only assigning a fixed weight to the misclassified example received in current trial, the proposed online learning algorithm also tries to update the weight for one of the existing support vectors. We show that the mistake bound can be significantly improved by the proposed online learning method. Encouraging experimental results show that the proposed technique is in general considerably more effective than the state-of-the-art online learning algorithms. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 In most online learning algorithms, the weights assigned to the misclassified examples (or support vectors) remain unchanged during the entire learning process. [sent-19, score-0.384]

2 This is clearly insufficient since when a new misclassified example is added to the pool of support vectors, we generally expect it to affect the weights for the existing support vectors. [sent-20, score-0.261]

3 In this paper, we propose a new online learning method, termed Double Updating Online Learning, or DUOL for short. [sent-21, score-0.239]

4 Instead of only assigning a fixed weight to the misclassified example received in current trial, the proposed online learning algorithm also tries to update the weight for one of the existing support vectors. [sent-22, score-0.483]

5 We show that the mistake bound can be significantly improved by the proposed online learning method. [sent-23, score-0.378]

6 Encouraging experimental results show that the proposed technique is in general considerably more effective than the state-of-the-art online learning algorithms. [sent-24, score-0.27]

7 Most online learning algorithms work by assigning a fixed weight to a new example when it is misclassified. [sent-28, score-0.303]

8 This is clearly insufficient because when a new example is added to the pool of support vectors, we expect it to affect the weights assigned to the existing support vectors received in previous trials. [sent-30, score-0.364]

9 Although several online algorithms are capable of updating the example weights as the learning process proceeds, most of them are designed for purposes other than improving the classification accuracy and reducing the mistake bound. [sent-31, score-0.507]

10 , 2005), online learning algorithms are proposed to adjust the example weights in order to fit within the constraint of a fixed number of support vectors; in (Cesa-Bianchi & Gentile, 2006), example weights are adjusted to track drifting concepts. [sent-35, score-0.469]

11 In this paper, we propose a new formulation for online learning that aims to dynamically update the example weights in order to improve the classification accuracy as well as the mistake bound. [sent-36, score-0.428]

12 Instead of only assigning a weight to the misclassified example that is received in current trial, the proposed online learning algorithm also updates the weight for one of the existing support vectors. [sent-37, score-0.481]

13 The key question in the proposed online learning approach is which one of the existing support vectors should be selected for weight updating. [sent-40, score-0.465]

14 To this end, we employ an analysis for double updating online learning that is based on the recent work of online convex programming by incremental dual ascent (Shalev-Shwartz & Singer, 2006). [sent-41, score-0.685]

15 Our analysis shows that under certain conditions, the proposed online learning algorithm can significantly reduce the mistake bound of the existing online algorithms. [sent-42, score-0.635]

16 This result is further verified empirically by extensive experiments and comparison to the state-of-the-art algorithms for online learning. [sent-43, score-0.267]

17 Section 2 reviews the related work for online learning. [sent-45, score-0.239]

18 Section 3 presents the proposed “double updating” approach to online learning. [sent-46, score-0.252]

19 One of the most well-known online approaches is the Perceptron algorithm (Rosenblatt, 1958; Freund & Schapire, 1999), which updates the learning function by adding a new example with a constant weight into the current set of support vectors when it is misclassified. [sent-54, score-0.452]
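For concreteness, the single-update behavior described above can be sketched as follows. This is an illustrative Python sketch, not the authors' code; the RBF kernel and the constant weight of 1.0 are assumptions of this sketch.

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian RBF kernel; the choice of kernel is an assumption here."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

class KernelPerceptron:
    """Kernel Perceptron: a misclassified example joins the support set
    with a constant weight, and existing weights are never revisited."""

    def __init__(self, kernel=rbf_kernel):
        self.kernel = kernel
        self.sv = []      # support vectors: (x_i, y_i) pairs
        self.alpha = []   # their weights (fixed once assigned)

    def score(self, x):
        # f(x) = sum_i alpha_i * y_i * kappa(x, x_i)
        return sum(a * y * self.kernel(x, xi)
                   for a, (xi, y) in zip(self.alpha, self.sv))

    def update(self, x, y):
        """Process one trial; return 1 on a mistake, 0 otherwise."""
        if y * self.score(x) <= 0:   # mistake (or zero margin)
            self.sv.append((x, y))
            self.alpha.append(1.0)   # constant weight, never changed later
            return 1
        return 0
```

Note how `update` never revisits `self.alpha` for earlier support vectors; DUOL's contribution is precisely to relax that restriction.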

20 Recently a number of online learning algorithms have been developed based on the criterion of maximum margin (Crammer & Singer, 2003; Gentile, 2001; Kivinen et al. [sent-55, score-0.314]

21 Empirical studies showed that the maximum margin based online learning algorithms are generally more effective than the Perceptron algorithm. [sent-62, score-0.287]

22 However, despite these differences, most online learning algorithms only update the weight of the newly added support vector, and keep the weights of the existing support vectors unchanged. [sent-63, score-0.663]

23 This constraint could significantly limit the effect of online learning. [sent-64, score-0.239]

24 Besides the studies on regular online learning, several algorithms have been proposed for online learning with a fixed budget. [sent-65, score-0.519]

25 In these studies, the total number of support vectors is required to be bounded either by a theoretical bound or by a manually fixed budget. [sent-66, score-0.187]

26 Example algorithms for fixed budget online learning include (Weston & Bordes, 2005; Crammer et al. [sent-67, score-0.323]

27 The key idea of these algorithms is to dynamically update the weights of the existing support vectors as a new support vector is added, and the support vector with the least weight will be discarded when the number of support vectors exceeds the budget. [sent-71, score-0.669]

28 The idea of discarding support vectors is also used in studies (Kivinen et al. [sent-72, score-0.186]

29 , 2008), a new “projection” approach is proposed for online learning that ensures the number of support vectors is bounded. [sent-76, score-0.411]

30 Besides, in (Cesa-Bianchi & Gentile, 2006), an online learning algorithm is proposed to handle the drifting concept, in which the weights of the existing support vectors are reduced whenever a new support vector is added. [sent-77, score-0.584]

31 Although these online learning algorithms are capable of dynamically adjusting the weights of support vectors, they are designed either to fit within a budget on the number of support vectors or to handle drifting concepts, not to improve the classification accuracy and the mistake bound. [sent-78, score-0.762]

32 The proposed online learning algorithm is closely related to the recent work of online convex programming by incremental dual ascent (Shalev-Shwartz & Singer, 2006). [sent-79, score-0.523]

33 Although the idea of simultaneously updating the weights of multiple support vectors was mentioned in (Shalev-Shwartz & Singer, 2006), no efficient updating algorithm was explicitly proposed. [sent-80, score-0.381]

34 As will be shown later, the online algorithm proposed in this work shares the same computational cost as conventional online learning algorithms, despite the need to update the weights of two support vectors. [sent-81, score-0.753]

35 Motivation: We consider an online learning trial t with an incoming example that is misclassified. [sent-83, score-0.262]

36 Denote by α = (α1 , . . . , αn ) ∈ [0, C]^n the weights assigned to the support vectors in D, where C is a predefined constant. [sent-93, score-0.208]

37 The resulting classifier, denoted by f (x), is expressed as f (x) = Σ_{i=1}^{n} αi yi κ(x, xi ) (1). Let (xa , ya ) be the misclassified example received in trial t, i.e., ya f (xa ) < 0. [sent-94, score-0.405]

38 Let (xa , ya ) be an example misclassified by the current classifier f (x) = Σ_{i=1}^{n} αi yi κ(x, xi ), i.e., ya f (xa ) < 0. [sent-99, score-0.357]

39 Let f ′(x) = f (x) + β ya κ(x, xa ) be the updated classifier with β > 0. [sent-102, score-0.213]

40 There exists at least one support vector xi ∈ D such that yi f ′(xi ) < yi f (xi ). [sent-103, score-0.268]

41 It follows from the fact that: ∃xi ∈ D, yi ya κ(xi , xa ) < 0 when ya f (xa ) < 0. [sent-105, score-0.764]

42 In the case when ya f (xa ) ≤ −γ, it is easy to verify that there exists some support vector (xb , yb ) that satisfies β ya yb κ(xa , xb ) ≤ −γ/n; meanwhile, it can be shown that when the classification confidence of (xb , yb ) is less than γ/n, i.e., [sent-107, score-1.565]

43 yb f (xb ) ≤ γ/n, such a support vector will be misclassified after the classifier is updated with the example (xa , ya ). [sent-109, score-0.676]

44 In order to alleviate this problem, we propose to update the weight for the existing support vector whose classification confidence is significantly affected by the new misclassified example. [sent-110, score-0.17]

45 In particular, we consider a support vector (xb , yb ) ∈ D for weight updating if it satisfies the following two conditions: • yb f (xb ) ≤ 0, i.e., [sent-111, score-0.858]

46 support vector (xb , yb ) is misclassified by the current classifier f (x); • κ(xb , xa ) ya yb ≤ −ρ, where ρ ≥ 0 is a predefined threshold, i.e., [sent-113, score-0.927]

47 support vector (xb , yb ) “conflicts” with the new misclassified example (xa , ya ). [sent-115, score-0.662]
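A minimal sketch of this two-condition test follows. The helper name, the representation of the support set as (x_i, y_i, cached score y_i f(x_i)) triples, and the most-negative-conflict tie-break are assumptions of this sketch, not the paper's pseudocode.

```python
def find_auxiliary(sv, x_a, y_a, kernel, rho=0.0):
    """Return the index of a support vector satisfying both conditions:
    (I)  its cached score y_b * f(x_b) <= 0 (it is misclassified), and
    (II) kernel(x_b, x_a) * y_a * y_b <= -rho (it conflicts with (x_a, y_a)).
    Among candidates, pick the strongest conflict; return None if none qualify.
    """
    best_idx, best_conflict = None, -rho
    for i, (x_b, y_b, score_b) in enumerate(sv):
        if score_b > 0:                    # fails condition (I)
            continue
        conflict = kernel(x_b, x_a) * y_a * y_b
        if conflict <= best_conflict:      # passes condition (II), strongest so far
            best_idx, best_conflict = i, conflict
    return best_idx
```

Because the scores y_i f(x_i) are cached, the scan costs one kernel evaluation per support vector.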

48 We refer to a support vector satisfying the above conditions as an auxiliary example. [sent-116, score-0.158]

49 It is clear that by adding the misclassified example (xa , ya ) to classifier f (x) with weight β, the classification score of (xb , yb ) will be reduced by at least βρ, which could lead to the misclassification of the auxiliary example (xb , yb ). [sent-117, score-0.995]

50 To avoid such a mistake, we propose to update the weights for both (xa , ya ) and (xb , yb ) simultaneously. [sent-118, score-0.62]

51 In the next section, we show the details of the double updating algorithm for online learning, and the analysis for mistake bound. [sent-119, score-0.526]

52 Our analysis follows closely the previous work on the relationship between online learning and the dual formulation of SVM (Shalev-Shwartz & Singer, 2006), in which the online learning is interpreted as an efficient updating rule for maximizing the objective function in the dual form of SVM. [sent-120, score-0.636]

53 If an online learning algorithm A is designed to ensure that every ∆t is bounded from below by a positive constant ∆, then the number of mistakes made by A when trained over a sequence of trials (x1 , y1 ), . [sent-122, score-0.35]
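The logic behind this statement can be sketched as a standard primal-dual argument in the notation above; treat the following as an outline under weak duality, not the paper's exact theorem.

```latex
% If every mistake trial increases the dual objective by at least \Delta > 0,
% and the dual is upper-bounded by the primal optimum (weak duality), then
% the number of mistakes M satisfies
M \cdot \Delta \;\le\; \sum_{t \,\in\, \text{mistake trials}} \Delta_t
\;\le\; \min_{f \in \mathcal{H}_\kappa}
  \left( \tfrac{1}{2}\|f\|_{\mathcal{H}_\kappa}^2
       + C \sum_{t=1}^{T} \ell\big(y_t f(x_t)\big) \right),
% and hence
M \;\le\; \frac{1}{\Delta} \min_{f \in \mathcal{H}_\kappa}
  \left( \tfrac{1}{2}\|f\|_{\mathcal{H}_\kappa}^2
       + C \sum_{t=1}^{T} \ell\big(y_t f(x_t)\big) \right).
```

This is why a larger per-trial improvement ∆ directly yields a smaller mistake bound.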

54 In our analysis, we will show that ∆, which is referred to as the bounding constant for the improvement in the objective function, could be significantly improved when updating the weights for both the newly misclassified example and the auxiliary example. [sent-126, score-0.208]

55 For the remaining part of this paper, we denote by (xb , yb ) an auxiliary example that satisfies the two conditions specified before. [sent-127, score-0.378]

56 We use α̂ = (α1 , . . . , αn−1 ) ∈ R^{n−1} to denote the weights assigned to all the support vectors in D except (xb , yb ). [sent-131, score-0.524]

57 Similarly, we denote by ŷ = (y1 , . . . , yn−1 ) ∈ {−1, +1}^{n−1} the class labels assigned to all the examples in D except for (xb , yb ). [sent-135, score-0.331]

58 We define sa = κ(xa , xa ), sb = κ(xb , xb ), sab = κ(xa , xb ), wab = ya yb sab . [sent-136, score-1.503]

59 According to the assumption of the auxiliary example, we have wab = sab ya yb ≤ −ρ. [sent-137, score-0.725]

60 Finally, we denote by γb the weight for the auxiliary example (xb , yb ) that is used in the current classifier f (x), and by γa′ and γb′ the updated weights for (xa , ya ) and (xb , yb ), respectively. [sent-138, score-1.028]

61 Double Updating Online Learning: Recall that an auxiliary example (xb , yb ) should satisfy two conditions: (I) yb f (xb ) ≤ 0, and (II) wab ≤ −ρ. [sent-141, score-0.759]

62 In addition, the new example (xa , ya ) received in the current iteration t is misclassified, i.e., ya f (xa ) < 0. [sent-142, score-0.275]

63 Following the framework of dual formulation for online learning, the following lemma shows how to compute ∆t , i.e., [sent-145, score-0.271]

64 the improvement in the objective function of the dual SVM obtained by adjusting the weights for (xa , ya ) and (xb , yb ). [sent-147, score-0.649]

65 It is straightforward to verify that the dual function of min_{ft} (1/2) ‖ft‖²_{Hκ} + C Σ_{i=1}^{t} ℓ(yi ft (xi )) (6), denoted by Dt (γ1 , . . . , γt ), is

66 Dt (γ1 , . . . , γt ) = Σ_{i=1}^{t} γi − Σ_{i=1}^{t} γi yi ft (xi ) + (1/2) ‖ft‖²_{Hκ} (7), where 0 ≤ γi ≤ C, i = 1, . . . , t,

67 and ft (·) = Σ_{i=1}^{t} γi yi κ(·, xi ) is the current classifier.

68 Note that this constraint does not come directly from the box constraint that the weight for example (xb , yb ) is in the range [0, C], i. [sent-173, score-0.352]

69 g(∆γb ) = ∆γb (1 − yb ft−1 (xb ) − wab γa ) − (sb /2) ∆γb². Since wab ≤ −ρ and yb ft−1 (xb ) ≤ 0, it is clear that ∆γb ≥ 0 when maximizing g(∆γb ), which results in the constraint ∆γb ≥ 0. [sent-178, score-0.802]
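The maximization of g(∆γb) above is a one-dimensional concave quadratic under a box constraint, so it admits a clipped closed-form solution. A hedged sketch follows; the coefficient c mirrors the displayed form of g, and the bounds lo, hi stand in for the box constraint on the updated weight (e.g. [0, C − γb]), both assumptions of this sketch.

```python
def clipped_quad_max(c, s, lo, hi):
    """Maximize g(d) = c*d - (s/2)*d**2 over d in [lo, hi], assuming s > 0.

    The unconstrained maximizer is d* = c/s (from g'(d) = c - s*d = 0);
    since g is concave, clipping d* to [lo, hi] gives the constrained optimum.
    """
    return min(max(c / s, lo), hi)
```

With c = 1 − yb f_{t−1}(xb) − wab γa and s = sb, the conditions wab ≤ −ρ and yb f_{t−1}(xb) ≤ 0 make c positive, so the returned step is nonnegative, matching the constraint ∆γb ≥ 0 argued above.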

70 Assume C > γb + 1/(1 − ρ) for the selected auxiliary example (xb , yb ). [sent-181, score-0.378]

71 We have the following bound for ∆ when updating the weights for the new example (xa , ya ) and the auxiliary example (xb , yb ): ∆ ≥ 1/2 + (1/2) min{(1 + ρ)², (C − γ)²}. Proof. [sent-188, score-0.783]

72 The final remaining question is how to identify the auxiliary example (xb , yb ) efficiently, which requires efficiently updating the classification score yi f (xi ) for all the support vectors. [sent-191, score-0.648]

73 This updating procedure ensures that the computational cost of double updating online learning is O(n), where n is the number of support vectors, similar to that of the kernel online learning algorithm. [sent-194, score-0.867]
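The O(n) bookkeeping mentioned above can be sketched as follows. This is illustrative: storing the support set as (x_i, y_i, cached y_i f(x_i)) triples is an assumption of this sketch. After the weights of (x_a, y_a) and (x_b, y_b) change by d_a and d_b, each cached score shifts by a term costing two kernel evaluations.

```python
def refresh_scores(sv, x_a, y_a, d_a, x_b, y_b, d_b, kernel):
    """Update cached scores y_i * f(x_i) in place after a double update.

    f changes by d_a * y_a * kappa(., x_a) + d_b * y_b * kappa(., x_b),
    so each of the n cached scores is adjusted with O(1) work: O(n) total.
    """
    for i, (x_i, y_i, score_i) in enumerate(sv):
        delta_f = d_a * y_a * kernel(x_i, x_a) + d_b * y_b * kernel(x_i, x_b)
        sv[i] = (x_i, y_i, score_i + y_i * delta_f)
```

Maintaining these scores is also what makes the auxiliary-example search itself O(n) per trial.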

74 We denote by Ms the number of mistakes made when we performed a single update without finding an appropriate auxiliary example. [sent-205, score-0.179]

75 Experimental Testbed and Setup: We now evaluate the empirical performance of the proposed double updating online learning (DUOL) algorithm. [sent-214, score-0.427]

76 Finally, as an ideal yardstick, we also implement a full online SVM algorithm (“Online-SVM”) (Shalev-Shwartz & Singer, 2006), which updates all the support vectors in each trial, and is thus computationally extremely intensive as will be revealed in our study. [sent-222, score-0.416]

77 We evaluate the online learning performance by measuring the mistake rate, i.e., [sent-234, score-0.351]

78 the ratio of the number of mistakes made by the online learning algorithm to the total number of examples received for prediction. [sent-236, score-0.361]

79 In addition, to examine the sparsity of the resulting classifiers, we also evaluate the number of support vectors produced by each online learning algorithm. [sent-237, score-0.398]

80 Figures 2 to 6 show the mistake rates of all the compared online learning algorithms over trials. [sent-243, score-0.379]

81 We observe that Online-SVM yields considerably better performance than the other online learning algorithms on the “german”, “splice”, “spambase”, and “MITFace” datasets; however, this comes at the price of extremely high computational cost. [sent-244, score-0.285]

82 For most cases, the running time of Online-SVM is two orders of magnitude, sometimes three, higher than that of the other online learning algorithms, making it far less practical. [sent-245, score-0.239]

83 For the remaining part of this section, we restrict our discussion to the other six baseline online learning algorithms. [sent-257, score-0.258]

84 We also notice that the agg-ROMMA and the two PA algorithms consume considerably larger numbers of support vectors than the other three algorithms. [sent-261, score-0.205]

85 Second, compared with all six competing algorithms, we observe that DUOL achieves significantly smaller mistake rates than the other single-updating algorithms in all cases. [sent-264, score-0.159]

86 This shows that the proposed double updating approach is effective in improving the online prediction performance. [sent-265, score-0.427]

87 By examining the sparsity of resulting classifiers, we observed that DUOL results in sparser classifiers than the three aggressive online learning algorithms, and denser classifiers than the three non-aggressive algorithms. [sent-266, score-0.266]

88 Third, according to the results of running time, we observe that DUOL is overall efficient compared to the state-of-the-art online learning algorithms. [sent-267, score-0.239]

89 Conclusions: This paper presented a novel “double updating” approach to online learning named “DUOL”, which not only updates the weight of the newly added support vector, but also adjusts the weight of one existing support vector that seriously conflicts with the new support vector. [sent-512, score-0.668]

90 We show that the mistake bound for an online classification task can be significantly reduced by the proposed DUOL algorithm. [sent-513, score-0.378]

91 Future work will address issues of multi-class double updating online learning. [sent-516, score-0.414]

The evaluation figures each contain three panels: (a) online average rate of mistakes, (b) online average number of support vectors, and (c) average time cost (log10 t); the compared algorithms are Perceptron, ROMMA, agg-ROMMA, ALMA (0.9), PA-I, PA-II, Online-SVM, and DUOL. Figure 2: Evaluation on the german dataset. Figure 5: Evaluation on the a7a dataset.

100 The forgetron: A kernel-based perceptron on a fixed budget. [sent-655, score-0.198]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('romma', 0.421), ('duol', 0.4), ('yb', 0.316), ('xb', 0.271), ('ya', 0.25), ('online', 0.239), ('ft', 0.227), ('xa', 0.199), ('perceptron', 0.198), ('pa', 0.161), ('misclassi', 0.158), ('alma', 0.158), ('agg', 0.138), ('mistake', 0.112), ('mistakes', 0.097), ('support', 0.096), ('updating', 0.094), ('crammer', 0.086), ('double', 0.081), ('yi', 0.065), ('wab', 0.065), ('gentile', 0.063), ('vectors', 0.063), ('auxiliary', 0.062), ('yt', 0.057), ('rosenblatt', 0.053), ('singer', 0.053), ('kivinen', 0.051), ('xt', 0.047), ('md', 0.045), ('xi', 0.042), ('wmin', 0.042), ('dekel', 0.04), ('sb', 0.04), ('svm', 0.039), ('classi', 0.039), ('spambase', 0.037), ('weight', 0.036), ('ii', 0.035), ('weights', 0.034), ('splice', 0.034), ('st', 0.032), ('mitface', 0.032), ('orabona', 0.032), ('sab', 0.032), ('dual', 0.032), ('budget', 0.029), ('algorithms', 0.028), ('et', 0.027), ('aggressive', 0.027), ('sa', 0.027), ('drifting', 0.025), ('average', 0.025), ('german', 0.025), ('received', 0.025), ('cost', 0.024), ('trial', 0.023), ('dynamically', 0.023), ('er', 0.022), ('freund', 0.022), ('almap', 0.021), ('cavallanti', 0.021), ('forgetron', 0.021), ('dt', 0.021), ('schapire', 0.021), ('margin', 0.02), ('end', 0.02), ('evaluation', 0.02), ('samples', 0.02), ('update', 0.02), ('six', 0.019), ('bordes', 0.018), ('williamson', 0.018), ('fink', 0.018), ('existing', 0.018), ('considerably', 0.018), ('updates', 0.018), ('adjusting', 0.017), ('cheng', 0.017), ('fti', 0.017), ('nanyang', 0.017), ('added', 0.017), ('newly', 0.016), ('keshet', 0.016), ('icts', 0.016), ('ms', 0.015), ('rate', 0.015), ('singapore', 0.015), ('assigned', 0.015), ('score', 0.015), ('prede', 0.015), ('cation', 0.014), ('bound', 0.014), ('updated', 0.014), ('li', 0.014), ('bounded', 0.014), ('jmlr', 0.014), ('conventional', 0.014), ('min', 0.013), ('proposed', 0.013)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999982 63 nips-2009-DUOL: A Double Updating Approach for Online Learning

Author: Peilin Zhao, Steven C. Hoi, Rong Jin

Abstract: In most online learning algorithms, the weights assigned to the misclassified examples (or support vectors) remain unchanged during the entire learning process. This is clearly insufficient since when a new misclassified example is added to the pool of support vectors, we generally expect it to affect the weights for the existing support vectors. In this paper, we propose a new online learning method, termed Double Updating Online Learning, or DUOL for short. Instead of only assigning a fixed weight to the misclassified example received in current trial, the proposed online learning algorithm also tries to update the weight for one of the existing support vectors. We show that the mistake bound can be significantly improved by the proposed online learning method. Encouraging experimental results show that the proposed technique is in general considerably more effective than the state-of-the-art online learning algorithms. 1

2 0.24156162 126 nips-2009-Learning Bregman Distance Functions and Its Application for Semi-Supervised Clustering

Author: Lei Wu, Rong Jin, Steven C. Hoi, Jianke Zhu, Nenghai Yu

Abstract: Learning distance functions with side information plays a key role in many machine learning and data mining applications. Conventional approaches often assume a Mahalanobis distance function. These approaches are limited in two aspects: (i) they are computationally expensive (even infeasible) for high dimensional data because the size of the metric is in the square of dimensionality; (ii) they assume a fixed metric for the entire input space and therefore are unable to handle heterogeneous data. In this paper, we propose a novel scheme that learns nonlinear Bregman distance functions from side information using a nonparametric approach that is similar to support vector machines. The proposed scheme avoids the assumption of fixed metric by implicitly deriving a local distance from the Hessian matrix of a convex function that is used to generate the Bregman distance function. We also present an efficient learning algorithm for the proposed scheme for distance function learning. The extensive experiments with semi-supervised clustering show the proposed technique (i) outperforms the state-of-the-art approaches for distance function learning, and (ii) is computationally efficient for high dimensional data. 1

3 0.14159453 45 nips-2009-Beyond Convexity: Online Submodular Minimization

Author: Elad Hazan, Satyen Kale

Abstract: We consider an online decision problem over a discrete space in which the loss function is submodular. We give algorithms which are computationally efficient and are Hannan-consistent in both the full information and bandit settings. 1

4 0.12931842 220 nips-2009-Slow Learners are Fast

Author: Martin Zinkevich, John Langford, Alex J. Smola

Abstract: Online learning algorithms have impressive convergence properties when it comes to risk minimization and convex games on very large problems. However, they are inherently sequential in their design which prevents them from taking advantage of modern multi-core architectures. In this paper we prove that online learning with delayed updates converges well, thereby facilitating parallel online learning. 1

5 0.098143362 27 nips-2009-Adaptive Regularization of Weight Vectors

Author: Koby Crammer, Alex Kulesza, Mark Dredze

Abstract: We present AROW, a new online learning algorithm that combines several useful properties: large margin training, confidence weighting, and the capacity to handle non-separable data. AROW performs adaptive regularization of the prediction function upon seeing each new instance, allowing it to perform especially well in the presence of label noise. We derive a mistake bound, similar in form to the second order perceptron bound, that does not assume separability. We also relate our algorithm to recent confidence-weighted online learning techniques and show empirically that AROW achieves state-of-the-art performance and notable robustness in the case of non-separable data. 1

6 0.097651124 178 nips-2009-On Stochastic and Worst-case Models for Investing

7 0.089099653 181 nips-2009-Online Learning of Assignments

8 0.079750635 202 nips-2009-Regularized Distance Metric Learning:Theory and Algorithm

9 0.068175226 18 nips-2009-A Stochastic approximation method for inference in probabilistic graphical models

10 0.060300998 22 nips-2009-Accelerated Gradient Methods for Stochastic Optimization and Online Learning

11 0.05524103 19 nips-2009-A joint maximum-entropy model for binary neural population patterns and continuous signals

12 0.050531089 160 nips-2009-Multiple Incremental Decremental Learning of Support Vector Machines

13 0.049934901 73 nips-2009-Dual Averaging Method for Regularized Stochastic Learning and Online Optimization

14 0.047088973 122 nips-2009-Label Selection on Graphs

15 0.045672636 177 nips-2009-On Learning Rotations

16 0.045531813 76 nips-2009-Efficient Learning using Forward-Backward Splitting

17 0.043038793 118 nips-2009-Kernel Choice and Classifiability for RKHS Embeddings of Probability Distributions

18 0.041799951 129 nips-2009-Learning a Small Mixture of Trees

19 0.041443236 72 nips-2009-Distribution Matching for Transduction

20 0.034990471 47 nips-2009-Boosting with Spatial Regularization


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.113), (1, 0.117), (2, 0.057), (3, -0.045), (4, 0.139), (5, 0.116), (6, -0.078), (7, 0.101), (8, -0.054), (9, 0.073), (10, -0.012), (11, -0.01), (12, 0.075), (13, 0.042), (14, -0.024), (15, -0.02), (16, 0.071), (17, -0.037), (18, 0.025), (19, -0.032), (20, -0.196), (21, -0.134), (22, -0.038), (23, -0.088), (24, 0.022), (25, 0.103), (26, -0.081), (27, 0.121), (28, -0.004), (29, -0.135), (30, -0.026), (31, -0.071), (32, -0.105), (33, 0.017), (34, 0.171), (35, 0.019), (36, 0.112), (37, -0.024), (38, 0.045), (39, -0.027), (40, 0.13), (41, 0.036), (42, 0.108), (43, -0.029), (44, 0.043), (45, 0.071), (46, 0.069), (47, 0.013), (48, -0.108), (49, 0.009)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94751871 63 nips-2009-DUOL: A Double Updating Approach for Online Learning

Author: Peilin Zhao, Steven C. Hoi, Rong Jin

Abstract: In most online learning algorithms, the weights assigned to the misclassified examples (or support vectors) remain unchanged during the entire learning process. This is clearly insufficient since when a new misclassified example is added to the pool of support vectors, we generally expect it to affect the weights for the existing support vectors. In this paper, we propose a new online learning method, termed Double Updating Online Learning, or DUOL for short. Instead of only assigning a fixed weight to the misclassified example received in current trial, the proposed online learning algorithm also tries to update the weight for one of the existing support vectors. We show that the mistake bound can be significantly improved by the proposed online learning method. Encouraging experimental results show that the proposed technique is in general considerably more effective than the state-of-the-art online learning algorithms. 1

2 0.69256943 126 nips-2009-Learning Bregman Distance Functions and Its Application for Semi-Supervised Clustering

Author: Lei Wu, Rong Jin, Steven C. Hoi, Jianke Zhu, Nenghai Yu

Abstract: Learning distance functions with side information plays a key role in many machine learning and data mining applications. Conventional approaches often assume a Mahalanobis distance function. These approaches are limited in two aspects: (i) they are computationally expensive (even infeasible) for high dimensional data because the size of the metric is in the square of dimensionality; (ii) they assume a fixed metric for the entire input space and therefore are unable to handle heterogeneous data. In this paper, we propose a novel scheme that learns nonlinear Bregman distance functions from side information using a nonparametric approach that is similar to support vector machines. The proposed scheme avoids the assumption of fixed metric by implicitly deriving a local distance from the Hessian matrix of a convex function that is used to generate the Bregman distance function. We also present an efficient learning algorithm for the proposed scheme for distance function learning. The extensive experiments with semi-supervised clustering show the proposed technique (i) outperforms the state-of-the-art approaches for distance function learning, and (ii) is computationally efficient for high dimensional data. 1

3 0.46368381 220 nips-2009-Slow Learners are Fast

Author: Martin Zinkevich, John Langford, Alex J. Smola

Abstract: Online learning algorithms have impressive convergence properties when it comes to risk minimization and convex games on very large problems. However, they are inherently sequential in their design which prevents them from taking advantage of modern multi-core architectures. In this paper we prove that online learning with delayed updates converges well, thereby facilitating parallel online learning. 1

4 0.44424552 45 nips-2009-Beyond Convexity: Online Submodular Minimization

Author: Elad Hazan, Satyen Kale

Abstract: We consider an online decision problem over a discrete space in which the loss function is submodular. We give algorithms which are computationally efficient and are Hannan-consistent in both the full information and bandit settings. 1

5 0.43859333 27 nips-2009-Adaptive Regularization of Weight Vectors

Author: Koby Crammer, Alex Kulesza, Mark Dredze

Abstract: We present AROW, a new online learning algorithm that combines several useful properties: large margin training, confidence weighting, and the capacity to handle non-separable data. AROW performs adaptive regularization of the prediction function upon seeing each new instance, allowing it to perform especially well in the presence of label noise. We derive a mistake bound, similar in form to the second order perceptron bound, that does not assume separability. We also relate our algorithm to recent confidence-weighted online learning techniques and show empirically that AROW achieves state-of-the-art performance and notable robustness in the case of non-separable data. 1

6 0.4325861 74 nips-2009-Efficient Bregman Range Search

7 0.40304199 160 nips-2009-Multiple Incremental Decremental Learning of Support Vector Machines

8 0.37625554 202 nips-2009-Regularized Distance Metric Learning:Theory and Algorithm

9 0.36899799 234 nips-2009-Streaming k-means approximation

10 0.34640813 178 nips-2009-On Stochastic and Worst-case Models for Investing

11 0.33616179 181 nips-2009-Online Learning of Assignments

12 0.3350656 73 nips-2009-Dual Averaging Method for Regularized Stochastic Learning and Online Optimization

13 0.32815263 18 nips-2009-A Stochastic approximation method for inference in probabilistic graphical models

14 0.32238427 235 nips-2009-Structural inference affects depth perception in the context of potential occlusion

15 0.30405182 177 nips-2009-On Learning Rotations

16 0.29785058 76 nips-2009-Efficient Learning using Forward-Backward Splitting

17 0.28457937 159 nips-2009-Multi-Step Dyna Planning for Policy Evaluation and Control

18 0.27867651 24 nips-2009-Adapting to the Shifting Intent of Search Queries

19 0.27768639 72 nips-2009-Distribution Matching for Transduction

20 0.27447984 22 nips-2009-Accelerated Gradient Methods for Stochastic Optimization and Online Learning


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(24, 0.081), (25, 0.054), (27, 0.445), (35, 0.023), (36, 0.105), (39, 0.02), (58, 0.042), (71, 0.032), (86, 0.044), (91, 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.72974747 63 nips-2009-DUOL: A Double Updating Approach for Online Learning

Author: Peilin Zhao, Steven C. Hoi, Rong Jin

Abstract: In most online learning algorithms, the weights assigned to the misclassified examples (or support vectors) remain unchanged during the entire learning process. This is clearly insufficient since when a new misclassified example is added to the pool of support vectors, we generally expect it to affect the weights for the existing support vectors. In this paper, we propose a new online learning method, termed Double Updating Online Learning, or DUOL for short. Instead of only assigning a fixed weight to the misclassified example received in the current trial, the proposed online learning algorithm also tries to update the weight for one of the existing support vectors. We show that the mistake bound can be significantly improved by the proposed online learning method. Encouraging experimental results show that the proposed technique is in general considerably more effective than the state-of-the-art online learning algorithms.
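The double-updating idea above can be illustrated with a small kernel-perceptron-style sketch. This is not DUOL's actual update rule (the paper derives both weights by solving a two-variable optimization and uses a thresholded selection criterion); here the auxiliary weight is simply increased by a fixed step, and the conflict measure y·y_i·k(x, x_i) < 0 stands in for the paper's criterion:

```python
import math

def rbf(u, v, gamma=0.5):
    """RBF kernel between two feature tuples."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

class DoubleUpdatePerceptron:
    """Kernel perceptron with a simplified double update (illustrative)."""
    def __init__(self, kernel=rbf, aux_step=1.0):
        self.sv = []              # list of (x, y, alpha) support vectors
        self.kernel = kernel
        self.aux_step = aux_step  # fixed auxiliary step (DUOL solves for it)

    def score(self, x):
        return sum(a * y * self.kernel(xi, x) for xi, y, a in self.sv)

    def fit_one(self, x, y):
        if y * self.score(x) > 0:
            return                # correctly classified: no update
        # Find an existing SV that is still misclassified and conflicts
        # most strongly with (x, y), i.e. y * y_i * k(x, x_i) < 0.
        best, best_conflict = None, 0.0
        for i, (xi, yi, _) in enumerate(self.sv):
            if yi * self.score(xi) <= 0:
                c = y * yi * self.kernel(x, xi)
                if c < best_conflict:
                    best, best_conflict = i, c
        self.sv.append((x, y, 1.0))          # first update: add the new SV
        if best is not None:                 # second update: boost an old SV
            xi, yi, a = self.sv[best]
            self.sv[best] = (xi, yi, a + self.aux_step)
```

A single-update perceptron would stop after appending the new support vector; the extra weight adjustment on a conflicting existing support vector is what the "double updating" in the title refers to.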

2 0.63191688 171 nips-2009-Nonparametric Bayesian Models for Unsupervised Event Coreference Resolution

Author: Cosmin Bejan, Matthew Titsworth, Andrew Hickl, Sanda Harabagiu

Abstract: We present a sequence of unsupervised, nonparametric Bayesian models for clustering complex linguistic objects. In this approach, we consider a potentially infinite number of features and categorical outcomes. We evaluated these models for the task of within- and cross-document event coreference on two corpora. All the models we investigated show significant improvements when compared against an existing baseline for this task.

3 0.51486909 205 nips-2009-Rethinking LDA: Why Priors Matter

Author: Andrew McCallum, David M. Mimno, Hanna M. Wallach

Abstract: Implementations of topic models typically use symmetric Dirichlet priors with fixed concentration parameters, with the implicit assumption that such “smoothing parameters” have little practical effect. In this paper, we explore several classes of structured priors for topic models. We find that an asymmetric Dirichlet prior over the document–topic distributions has substantial advantages over a symmetric prior, while an asymmetric prior over the topic–word distributions provides no real benefit. Approximation of this prior structure through simple, efficient hyperparameter optimization steps is sufficient to achieve these performance gains. The prior structure we advocate substantially increases the robustness of topic models to variations in the number of topics and to the highly skewed word frequency distributions common in natural language. Since this prior structure can be implemented using efficient algorithms that add negligible cost beyond standard inference techniques, we recommend it as a new standard for topic modeling.
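The difference between symmetric and asymmetric document–topic priors can be seen in a tiny simulation. The prior values below are illustrative, not taken from the paper; the point is only that an asymmetric Dirichlet concentrates expected topic mass on a few topics while a symmetric one spreads it uniformly:

```python
import random

random.seed(0)

def sample_dirichlet(alpha):
    """Sample from Dirichlet(alpha) via normalized Gamma draws."""
    g = [random.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]

n_topics, n_docs = 8, 2000
alpha_sym = [0.1] * n_topics
# Illustrative asymmetric prior (values are assumptions, not the paper's):
# a couple of topics get larger pseudo-counts and behave like corpus-wide
# "background" topics.
alpha_asym = [1.0, 0.5, 0.1, 0.1, 0.05, 0.05, 0.05, 0.05]

def mean_theta(alpha):
    totals = [0.0] * n_topics
    for _ in range(n_docs):
        for k, t in enumerate(sample_dirichlet(alpha)):
            totals[k] += t
    return [t / n_docs for t in totals]

mean_sym = mean_theta(alpha_sym)    # roughly uniform: each topic near 1/8
mean_asym = mean_theta(alpha_asym)  # topic 0 gets about 1.0/1.9 of the mass
```

Under a Dirichlet prior the expected share of topic k is alpha_k / sum(alpha), which is why the asymmetric prior can absorb high-frequency background words into a few topics, the robustness effect the abstract describes.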

4 0.37712944 27 nips-2009-Adaptive Regularization of Weight Vectors

Author: Koby Crammer, Alex Kulesza, Mark Dredze

Abstract: We present AROW, a new online learning algorithm that combines several useful properties: large margin training, confidence weighting, and the capacity to handle non-separable data. AROW performs adaptive regularization of the prediction function upon seeing each new instance, allowing it to perform especially well in the presence of label noise. We derive a mistake bound, similar in form to the second-order perceptron bound, that does not assume separability. We also relate our algorithm to recent confidence-weighted online learning techniques and show empirically that AROW achieves state-of-the-art performance and notable robustness in the case of non-separable data.
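The "adaptive regularization" in AROW amounts to maintaining a mean weight vector and a confidence (covariance) estimate per weight, updating both on rounds with positive hinge loss. The sketch below follows the published update form but restricts the confidence matrix to its diagonal, a common simplification; `r` is AROW's regularization parameter:

```python
class DiagonalAROW:
    """Diagonal-confidence sketch of the AROW update (illustrative)."""
    def __init__(self, dim, r=1.0):
        self.mu = [0.0] * dim       # mean weight vector
        self.sigma = [1.0] * dim    # per-weight variance (confidence)
        self.r = r                  # regularization parameter

    def margin(self, x):
        return sum(m * xi for m, xi in zip(self.mu, x))

    def fit_one(self, x, y):
        m = self.margin(x)
        if y * m >= 1.0:            # margin large enough: no update
            return
        # beta = 1 / (x^T Sigma x + r); alpha = hinge loss * beta
        conf = sum(s * xi * xi for s, xi in zip(self.sigma, x))
        beta = 1.0 / (conf + self.r)
        alpha = max(0.0, 1.0 - y * m) * beta
        for i, xi in enumerate(x):
            # mu <- mu + alpha * y * Sigma x ; Sigma shrinks along x
            self.mu[i] += alpha * y * self.sigma[i] * xi
            self.sigma[i] -= beta * (self.sigma[i] * xi) ** 2
```

Because the variance only shrinks along directions that have been observed, frequently seen features receive smaller and smaller updates, which is what gives AROW its robustness to label noise.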

5 0.36910528 104 nips-2009-Group Sparse Coding

Author: Samy Bengio, Fernando Pereira, Yoram Singer, Dennis Strelow

Abstract: Bag-of-words document representations are often used in text, image and video processing. While it is relatively easy to determine a suitable word dictionary for text documents, there is no simple mapping from raw images or videos to dictionary terms. The classical approach builds a dictionary using vector quantization over a large set of useful visual descriptors extracted from a training set, and uses a nearest-neighbor algorithm to count the number of occurrences of each dictionary word in documents to be encoded. More robust approaches have been proposed recently that represent each visual descriptor as a sparse weighted combination of dictionary words. While favoring a sparse representation at the level of visual descriptors, however, those methods do not ensure that images have sparse representations. In this work, we use mixed-norm regularization to achieve sparsity at the image level as well as a small overall dictionary. This approach can also be used to encourage using the same dictionary words for all the images in a class, providing a discriminative signal in the construction of image representations. Experimental results on a benchmark image classification dataset show that when compact image or dictionary representations are needed for computational efficiency, the proposed approach yields better mean average precision in classification.
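Mixed-norm (l1/l2, group-lasso-style) regularization of the kind mentioned above is typically enforced through a group soft-thresholding step, the proximal operator of the group norm, which shrinks each group's l2 norm and zeroes out entire groups. A minimal sketch; the grouping and threshold value are illustrative, not the paper's setup:

```python
import math

def group_soft_threshold(v, groups, lam):
    """Prox of lam * sum_g ||v_g||_2: shrink each group's l2 norm by lam,
    zeroing out any group whose norm falls below lam."""
    out = list(v)
    for g in groups:
        norm = math.sqrt(sum(v[i] ** 2 for i in g))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in g:
            out[i] = scale * v[i]
    return out

# Two groups of coefficients; the weak group is removed entirely, which
# is how group sparsity prunes whole dictionary words at once.
v = [3.0, 4.0, 0.3, 0.4]
groups = [(0, 1), (2, 3)]
w = group_soft_threshold(v, groups, lam=1.0)
# first group: norm 5 -> scaled by 0.8; second group: norm 0.5 < 1 -> zeroed
```

Zeroing whole groups rather than individual coefficients is what yields sparsity at the image level (few active dictionary words per image) instead of only at the descriptor level.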

6 0.33591416 166 nips-2009-Noisy Generalized Binary Search

7 0.32913318 239 nips-2009-Submodularity Cuts and Applications

8 0.32780081 45 nips-2009-Beyond Convexity: Online Submodular Minimization

9 0.32775772 129 nips-2009-Learning a Small Mixture of Trees

10 0.32639983 122 nips-2009-Label Selection on Graphs

11 0.32586047 180 nips-2009-On the Convergence of the Concave-Convex Procedure

12 0.32510361 72 nips-2009-Distribution Matching for Transduction

13 0.32451111 118 nips-2009-Kernel Choice and Classifiability for RKHS Embeddings of Probability Distributions

14 0.32447106 169 nips-2009-Nonlinear Learning using Local Coordinate Coding

15 0.32257661 71 nips-2009-Distribution-Calibrated Hierarchical Classification

16 0.32246754 20 nips-2009-A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers

17 0.32197627 220 nips-2009-Slow Learners are Fast

18 0.32195309 193 nips-2009-Potential-Based Agnostic Boosting

19 0.32171121 207 nips-2009-Robust Nonparametric Regression with Metric-Space Valued Output

20 0.32047731 15 nips-2009-A Rate Distortion Approach for Semi-Supervised Conditional Random Fields