jmlr jmlr2012 jmlr2012-36 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Aharon Ben-Tal, Sahely Bhadra, Chiranjib Bhattacharyya, Arkadi Nemirovski
Abstract: In this paper we study the problem of designing SVM classifiers when the kernel matrix, K, is affected by uncertainty. Specifically, K is modeled as a positive affine combination of given positive semidefinite kernels, with the coefficients ranging in a norm-bounded uncertainty set. We treat the problem using the Robust Optimization methodology. This reduces the uncertain SVM problem to a deterministic conic quadratic problem which can be solved in principle by a polynomial time Interior Point (IP) algorithm. However, for large-scale classification problems, IP methods become intractable and one has to resort to first-order gradient type methods. The strategy we use here is to reformulate the robust counterpart of the uncertain SVM problem as a saddle point problem and employ a special gradient scheme which works directly on the convex-concave saddle function. The algorithm is a simplified version of a general scheme due to Juditski and Nemirovski (2011). It achieves an O(1/T^2) reduction of the initial error after T iterations. A comprehensive empirical study on both synthetic data and real-world protein structure data sets shows that the proposed formulations achieve the desired robustness, and the saddle point based algorithm outperforms the IP method significantly. Keywords: robust optimization, uncertain classification, kernel functions
Reference: text
sentIndex sentText sentNum sentScore
1 The strategy we use here is to reformulate the robust counterpart of the uncertain SVM problem as a saddle point problem and employ a special gradient scheme which works directly on the convex-concave saddle function. [sent-18, score-0.446]
2 A comprehensive empirical study on both synthetic data and real-world protein structure data sets shows that the proposed formulations achieve the desired robustness, and the saddle point based algorithm outperforms the IP method significantly. [sent-21, score-0.354]
3 Keywords: robust optimization, uncertain classification, kernel functions 1. [sent-22, score-0.241]
4 , 2011) at designing robust classifiers have been limited to the case of linear classification where the uncertainty is specified over an explicitly stated feature map. [sent-45, score-0.23]
5 Consider the problem of automated protein structure classification, an important problem of Computational Biology, where no such feature map is available. [sent-46, score-0.173]
6 Protein structures are specified by a set of 3D coordinates, and it is possible to design kernel functions for protein structures based on the coordinates (Qiu et al. [sent-47, score-0.292]
7 , 2010) initiated a study of designing robust classifiers when the entries of the kernel matrix are independently distributed random variables (a somewhat problematic assumption). [sent-52, score-0.185]
8 The uncertainty in the kernel matrix K is modeled by a bounded convex set, which encompasses several possible realizations of K. [sent-59, score-0.175]
9 This new approach results first in a robust counterpart of the uncertain SVM which can be cast as a Conic Quadratic (CQ) problem. [sent-60, score-0.176]
10 Our main contribution here is to reformulate the robust counterpart as a saddle point problem. [sent-63, score-0.224]
11 Due to favorable conditions satisfied by the saddle function, one can in principle refer to a gradient-based general scheme introduced in Juditski and Nemirovski (2011) for solving such saddle point problems. [sent-64, score-0.270]
12 Experimental results performed on synthetic data, as well as real-world protein structure data sets, show that the saddle-point based algorithm outperforms the IP method considerably. [sent-66, score-0.173]
13 To motivate the paper we start with a brief discussion on issues underlying protein structure classification and kernel based classifiers in Section 2. [sent-69, score-0.238]
14 In Section 5 we present the saddle point algorithm and discuss its application to the minimax problem. [sent-74, score-0.135]
15 Motivation: Uncertain Kernels and Automated Protein Structure Classification. Classification of protein structures into various classes like families, superfamilies, etc., remains an important research challenge in computational biology (see Holm and Sander, 1996, for an introduction). [sent-88, score-0.282]
16 Usually a protein structure is specified by the positions of alpha carbon (Cα ) atoms. [sent-92, score-0.173]
17 A formal description of Cα atoms and protein structures is beyond the scope of the paper and we refer the reader to Branden and Tooze (1999) for an introduction. [sent-93, score-0.266]
18 In the sequel we will denote a protein structure by a set P = {ci ∈ R3 | i = 1, . . . , s}, (3) where each Cα atom is determined by its spatial coordinates ci = (ci1, ci2, ci3) obtained by X-ray crystallography. [sent-94, score-0.173]
19 Biologists often determine the similarity between a pair of structures by first computing an alignment and then measuring the quality of the alignment by root mean square deviation (RMSD). [sent-99, score-0.174]
20 All such procedures implicitly assume that the protein structures are specified exactly, that is, the location of the atoms constituting the structure is known precisely. [sent-105, score-0.266]
21 For a protein structure P, the resolution information r specifies the error in each coordinate. [sent-107, score-0.213]
22 More formally, the position of the ith atom in a protein structure P (see (3)) could be anywhere in the uncertainty box {c : ‖c − c̄i‖∞ ≤ r} (4) around the nominal value c̄i. [sent-108, score-0.283]
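The uncertainty box in (4) is easy to simulate. The sketch below assumes nothing beyond NumPy and uses made-up coordinates and resolution values; it draws random structures from the ℓ∞ box around the nominal Cα coordinates, which is the kind of sampling that produces the kernel value ranges reported further down.

```python
import numpy as np

def sample_from_uncertainty_box(nominal_coords, r, n_samples=10, seed=0):
    """Draw structures uniformly from the box {c : ||c - c_bar||_inf <= r}
    around each nominal C-alpha coordinate (a sketch of the set in (4))."""
    rng = np.random.default_rng(seed)
    nominal_coords = np.asarray(nominal_coords, dtype=float)  # shape (s, 3)
    # Independent uniform perturbation of at most r in each coordinate.
    noise = rng.uniform(-r, r, size=(n_samples,) + nominal_coords.shape)
    return nominal_coords[None, :, :] + noise

# Illustrative nominal structure with 4 atoms and a resolution of 2.0 angstroms.
P_bar = np.array([[0.0, 0.0, 0.0], [1.5, 0.2, 0.1],
                  [3.1, 0.4, 0.3], [4.4, 0.9, 0.2]])
realizations = sample_from_uncertainty_box(P_bar, r=2.0, n_samples=5)
print(realizations.shape)  # (5, 4, 3)
```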
23 Figure 1: (a) Pictorial presentation of Cα atoms of protein d1vsra1 (top) and d1gefa1 (bottom). [sent-120, score-0.212]
24 We refer to P̄ = {c̄i | i = 1, . . . , n} as the nominal structure and U(P) as the uncertainty set associated with it. [sent-126, score-0.338]
25 The structural alignment between P and P′ in the presence of uncertainty sets U(P) and U(P′) is not defined anymore. [sent-128, score-0.170]
26 Even when r is small, the alignment scores between two nominal structures P̄ and P̄′ can differ significantly from the alignment scores between an arbitrary R and R′ where R ∈ U(P), R′ ∈ U(P′). [sent-129, score-0.348]
27 This difference in alignment scores leads to uncertain kernel values. [sent-130, score-0.212]
28 For example, consider two proteins, d1vsra1 (denote it by P) and d1gefa1 (denote it by P′), belonging to the protein superfamily Restriction endonuclease-like. [sent-131, score-0.173]
29 If one ignores the uncertainty one obtains a kernel value of 1. [sent-138, score-0.175]
30 On structures randomly sampled from the corresponding uncertainty box (4), we observe that the kernel value ranged from 1. [sent-141, score-0.175]
31 In superfamily Restriction endonuclease-like, more than 60% of the kernel values, computed between any pair of nominal protein structures, lie between Kmin and Kmax . [sent-145, score-0.466]
32 This demonstrates that accounting for resolution information leads to considerable uncertainty in kernel values. [sent-149, score-0.215]
33 (2010), the uncertainty is modeled by independent noise in each of the entries of the kernel matrix K. [sent-157, score-0.175]
34 To constitute a valid characterization of the uncertainty set, the constraint (6) needs to be modified as follows: Prob(α⊤Y(K + Z)Yα ≤ t) ≥ 1 − ε, K + Z ∈ Sn+. The formulation (5) solves a relaxed version of the above problem by ignoring the psd requirement. [sent-180, score-0.167]
35 Thirdly, the assumption that the entries of Z are independently distributed is extremely unrealistic; often the uncertainty in the entries is due to uncertainty in the observations, hence K(xi, xj) is seldom independent of K(xi, xl) for distinct i, j, l. [sent-182, score-0.220]
36 Affine Uncertainty Set Model for Uncertain Kernel Matrices In this section we introduce an uncertainty set over psd matrices and study the resultant robust SVM problem using an RO approach. [sent-187, score-0.258]
37 In the RO framework the information related to Ψ is modelled as a geometric uncertainty set E ⊂ Rk, and the family of problems (8) is replaced by its robust counterpart: r∗ = min_x max_{Ψ∈E} f(x, Ψ) subject to gi(x, Ψ) ≤ 0 ∀Ψ ∈ E, i = 1, . . . (9) [sent-194, score-0.199]
38 A general representation of E is as follows: E = {Ψ = Ψ̄ + ∑_{i=1}^{L} ηi Ψi : ‖η‖ ≤ ρ}, where Ψ̄ is the nominal value of the uncertain vector Ψ, the vectors Ψi are possible scenarios of it, and η is a perturbation vector. [sent-202, score-0.315]
39 We assume that if z and z′ are noisy observations with uncertainty sets U(z) and U(z′) and nominal values znom and z′nom respectively, then K̄(z, z′) = K(znom, z′nom) defines a kernel function and will be called the nominal kernel. [sent-212, score-0.631]
40 The difference between the actual and the nominal kernel is expressed by a linear combination of L known kernel functions, Kl, l = 1, . . . , L. [sent-213, score-0.358]
41 We impose the uncertainty set (10) on all examples of interest, which immediately leads to the following model of uncertainty on the kernel matrix corresponding to the training set: E(κ) = {K = K̄ + ∑_{l=1}^{L} ηl Kl : ‖η‖p ≤ κ, ηl ≥ 0, l = 1, . . . , L}. [sent-227, score-0.285]
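A minimal sketch of drawing a member of E(κ) is shown below. It assumes the nominal kernel K̄ and the base kernels Kl are supplied as NumPy arrays; the function name and the way η is rescaled into the norm ball are illustrative choices, not the paper's implementation.

```python
import numpy as np

def sample_uncertain_kernel(K_bar, base_kernels, kappa, p=2, seed=0):
    """Return one member of E(kappa): K = K_bar + sum_l eta_l * K_l
    with eta >= 0 and ||eta||_p <= kappa (a sketch, not the paper's code)."""
    rng = np.random.default_rng(seed)
    eta = rng.random(len(base_kernels))          # nonnegative coefficients
    norm = np.linalg.norm(eta, ord=p)
    if norm > kappa:                             # rescale into the norm ball
        eta *= kappa / norm
    K = K_bar + sum(e * Kl for e, Kl in zip(eta, base_kernels))
    return K, eta

# Toy example: a nominal kernel plus two psd base kernels.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
K_bar = A @ A.T
base = [np.eye(5), 0.1 * np.ones((5, 5))]
K, eta = sample_uncertain_kernel(K_bar, base, kappa=0.5)
print(np.min(np.linalg.eigvalsh(K)) >= -1e-9)   # psd, since every piece is psd
```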
42 As any K ∈ E (κ) is always positive semi-definite, the set E (κ) defines a valid model for describing uncertainty in psd matrices. [sent-232, score-0.14]
43 In a later subsection we will discuss the relevance of this setup to the protein structure classification problem. [sent-233, score-0.173]
44 3 Evaluation of Base Kernels. Recall that in the protein structure classification problem each observation is specified by a triple (P̄, U(P), y), where P̄ (see (3)) is the nominal structure, U(P) (see (4)) is the uncertainty set specified by the resolution, and y is the label. [sent-266, score-0.551]
45 To closely parallel the protein structure classification setting we consider the following setup. [sent-268, score-0.173]
46 Consider a set indexed by i = 1, . . . , m, where an observation xi is not directly specified; instead a nominal value x̄i and an uncertainty set Ui are given. [sent-272, score-0.338]
47 An Algorithm for a Special Class of Convex-Concave Saddle Point Problems. In this section we describe a novel algorithm, which is essentially a special case of an algorithm presented in Juditski and Nemirovski (2011), for a class of convex-concave saddle point problems. [sent-292, score-0.135]
48 We are interested in solving a saddle point problem SadVal = min_{x∈X} max_{y∈Y} φ(x, y), (14) where φ(·, ·) satisfies the following assumptions. [sent-313, score-0.135]
49 We have min_{x∈X} φ(x) = SadVal = max_{y∈Y} φ(y), and the set of saddle points of φ on X × Y is X∗ × {y∗}, where X∗ = ArgminX φ and y∗ = ArgmaxY φ. [sent-327, score-0.135]
50 Finally, we denote by εsad(z), z ∈ Z, the natural saddle point proximity measure: εsad(x, y) = φ(x) − φ(y) = [φ(x) − min_X φ] + [max_Y φ − φ(y)]. [sent-328, score-0.135]
51 2 Fixed Step-size per Stage (FSS) Algorithm for Convex-Concave Saddle Point Problem. The MPb algorithm presented in Juditski and Nemirovski (2011) is an extremely fast algorithm for convex-concave saddle point problems. [sent-330, score-0.135]
52 1 FSS Algorithm. We begin by introducing some notation: let G(x, y) = (Gx(x, y), Gy(x, y)), with Gx(x, y) := ∂φ(x, y)/∂x = a(y) and Gy(x, y) := −∂φ(x, y)/∂y, mapping Z := X × Y into itself, be the monotone operator associated with the saddle point problem (14). [sent-339, score-0.135]
53 we have at our disposal a positive Rs and a point ȳs ∈ Y such that ‖ȳs − y∗‖Y ≤ Rs/2. [sent-348, score-0.674]
54 Initialization: We set z1,s = (xω , ys ) = argminZ ωs (·). [sent-351, score-0.337]
55 When t = Ns, we define the approximate solution to (14) built in the course of s stages as (x^s, y^s) = (1/Ns) ∑_{t=1}^{Ns} w_{t,s}, (18) set ȳs+1 = y^s, Rs+1 = Rs/2, and pass to stage s + 1. [sent-357, score-0.706]
56 Then, for every s, (Is) takes place, and εsad(x^s, y^s) ≤ θ R0^2 2^{−2s−5}, (19) while the total number Ms = ∑_{i=0}^{s} Ni of steps of the algorithm needed to build (x^s, y^s) admits the bound (20) below, expressed in terms of Lxy, Lyy, ΩX, ΩY and θ. [sent-364, score-0.674]
57 For s ≤ s∗ we have εsad(x^s, y^s) ≤ θ R0^2 2^{−2(s+2)}, while for s > s∗ we have εsad(x^s, y^s) ≤ O(1) Lxy^2 ΩX ΩY / (θ Ms^2). (20) Proof: see Appendix A. [sent-366, score-0.674]
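To make the stage structure of this result concrete, here is a simplified sketch on a toy saddle problem. Plain projected extragradient steps stand in for the prox mappings of the actual FSS/MPb scheme, and the step sizes, stage lengths and radii are placeholders rather than the constants prescribed by the theorem; only the restart pattern (run a stage, average the iterates, halve Rs) mirrors the description above.

```python
import numpy as np

def proj_ball(v, center, radius):
    """Euclidean projection of v onto the ball with the given center and radius."""
    d = v - center
    n = np.linalg.norm(d)
    return v if n <= radius else center + radius * d / n

def fss_stages_sketch(A, theta, R0, n_stages=5, steps=300, lr=0.05):
    """Illustrative multi-stage scheme for the toy saddle problem
    phi(x, y) = x^T A y - (theta/2) ||y||^2, with x in the unit ball and
    y in a ball of radius R around the current center y_bar.  Projected
    extragradient steps stand in for the paper's prox mappings."""
    n, m = A.shape
    x, y_bar, R = np.zeros(n), np.zeros(m), R0
    for _ in range(n_stages):
        y = y_bar.copy()
        x_sum, y_sum = np.zeros(n), np.zeros(m)
        for _ in range(steps):
            # Predictor step at the current point ...
            gx, gy = A @ y, A.T @ x - theta * y
            xh = proj_ball(x - lr * gx, np.zeros(n), 1.0)
            yh = proj_ball(y + lr * gy, y_bar, R)
            # ... corrector step using gradients at the predicted point.
            gxh, gyh = A @ yh, A.T @ xh - theta * yh
            x = proj_ball(x - lr * gxh, np.zeros(n), 1.0)
            y = proj_ball(y + lr * gyh, y_bar, R)
            x_sum += x
            y_sum += y
        x, y_bar = x_sum / steps, y_sum / steps   # stage averages, cf. (18)
        R /= 2.0                                  # halve the radius for the next stage
    return x, y_bar

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))
x_hat, y_hat = fss_stages_sketch(A, theta=1.0, R0=2.0)
print(x_hat.shape, y_hat.shape)
```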
58 In order to be consistent with the notation of the FSS procedure we define the following map, where the left-hand side denotes quantities involving formulation (12) and the right-hand side corresponds to the saddle point procedure detailed in the previous section. [sent-375, score-0.162]
59 However, the situation still allows one to achieve an O(1/T^2) convergence rate by applying the FSS algorithm to the saddle point reformulation of (24). [sent-412, score-0.135]
60 The prototype algorithm MP solves the saddle point problem (24) with convex-concave and smooth (with Lipschitz continuous gradient) φ at the rate O(1/T), with the hidden factor in O(·) depending on the distance from the starting point to the solution set of (5. [sent-414, score-0.135]
61 Now, when φ is strongly concave in y, the above convergence implies qualified convergence of the y-components yt of approximate solutions to the y-component y∗ of the saddle point of φ, so that eventually we know that ‖yt − y∗‖Y is, say, twice smaller than (an a priori upper bound R on) ‖y1 − y∗‖Y. [sent-422, score-0.238]
62 1 Prediction Rules. For each observation xt, the values K(xt, xi) are only approximately known and lie in an uncertainty set (10). [sent-436, score-0.150]
63 In other words the classifier is robust to uncertainty in the value of K. [sent-440, score-0.199]
64 The simplest case would be to use K(xt, xi) = K̄(xt, xi), which, when used in conjunction with (1), gives the following labelling rule: yt^pr = sign(∑_i yi αi K̄(xt, xi) + b), (25) which, in the sequel, will be referred to as the nominal rule. [sent-441, score-0.294]
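The nominal rule (25) amounts to a standard SVM decision function evaluated with the nominal kernel. A sketch follows; it assumes the dual variables αi, the training labels and the nominal kernel values for a test point are already available, and the variable names are illustrative.

```python
import numpy as np

def nominal_rule(alpha, y_train, k_bar_t, b):
    """Nominal labelling rule (25): sign(sum_i y_i alpha_i K_bar(x_t, x_i) + b).
    k_bar_t holds the nominal kernel values K_bar(x_t, x_i) for one test point."""
    return int(np.sign(np.dot(y_train * alpha, k_bar_t) + b))

# Toy usage with made-up dual variables and kernel values.
alpha = np.array([0.3, 0.0, 0.7])
y_train = np.array([1, -1, -1])
k_bar_t = np.array([0.9, 0.1, 0.4])
print(nominal_rule(alpha, y_train, k_bar_t, b=0.05))
```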
65 One option for arriving at a label would be to take the majority vote with the above kernel function: yt^pr = sign(∑_{s=1}^{R} yt^s), where yt^s = sign(∑_i αi yi K^s(xt, xi) + b). (26) [sent-448, score-0.237]
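A sketch of the majority vote rule (26) is given below. It assumes R realizations of the kernel row K^s(xt, ·) have already been drawn from the uncertainty set (for example with a sampler like the one sketched earlier), and the names are again illustrative.

```python
import numpy as np

def majority_vote_rule(alpha, y_train, k_samples_t, b):
    """Majority vote rule (26): each sampled kernel realization casts a vote
    y_t^s = sign(sum_i alpha_i y_i K^s(x_t, x_i) + b); the final label is the
    sign of the sum of the votes.  k_samples_t has shape (R, n_train)."""
    votes = np.sign(k_samples_t @ (alpha * y_train) + b)
    return int(np.sign(votes.sum()))

# Toy usage: 3 sampled kernel rows for one test point, 3 training points.
alpha = np.array([0.3, 0.0, 0.7])
y_train = np.array([1, -1, -1])
k_samples_t = np.array([[0.9, 0.1, 0.4],
                        [0.8, 0.2, 0.5],
                        [1.0, 0.1, 0.3]])
print(majority_vote_rule(alpha, y_train, k_samples_t, b=0.05))
```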
66 Once we have defined these two prediction rules, namely the majority vote and the nominal rule, it is important to devise measures for evaluating the resultant classifiers. [sent-449, score-0.289]
67 For the nominal classifier (25) the usual 0/1 loss works well, and we define NominalErr (NE) = (1/ntst) ∑_{t=1}^{ntst} 1(ȳt^pr ≠ yt). [sent-456, score-0.255]
68 Similarly, for the majority vote based classifier (26), we define MajorityErr (ME) = (1/ntst) ∑_{t=1}^{ntst} 1(yt^pr ≠ yt), (27) where yt is the true label for xt. [sent-457, score-0.212]
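Both measures are plain empirical 0/1 error rates over the test set; a small helper, assuming predicted labels produced by the two rules above, might look as follows.

```python
import numpy as np

def zero_one_error(y_pred, y_true):
    """Fraction of test points whose predicted label differs from the true
    label; used for both NominalErr (NE) and MajorityErr (ME) in (27)."""
    y_pred, y_true = np.asarray(y_pred), np.asarray(y_true)
    return float(np.mean(y_pred != y_true))

y_true = [1, -1, 1, 1]
print(zero_one_error([1, -1, -1, 1], y_true))  # 0.25
```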
69 The Nominal − SVM formulation is the usual SVM formulation (2) with the nominal kernel. [sent-471, score-0.282]
70 Additionally, we have also empirically tested them on protein structure data. [sent-494, score-0.173]
71 We study the problem of robust classification when the kernel values are not available but are governed by (10). [sent-518, score-0.154]
72 We are given {(Pi, ri, yi) | i = 1, . . . , n}, where Pi is the nominal structure described in (3) with label yi. [sent-548, score-0.257]
73 Incorporation of resolution information ri leads to uncertainty sets U(Pi ) (see (4)). [sent-549, score-0.15]
74 (2007), and assuming that the resultant uncertainty in kernel values obeys (10), the kernel functions K̄, Kl are computed by the procedure outlined in Section 4. [sent-551, score-0.269]
75 However, we still provide a comparison to the robust formulations described in this paper for the sake of completeness. [sent-554, score-0.135]
76 When κ = 0 there is no uncertainty, and as κ increases the uncertainty becomes more pronounced. [sent-571, score-0.220]
77 The utility of robust formulations would become clear as κ is increased. [sent-572, score-0.135]
78 This shows that non-robust classifiers, for example SVM, are unable to handle uncertainty compared to the proposed robust classifiers. [sent-593, score-0.199]
79 This experiment demonstrates that in the presence of uncertainty the performance of extremely accurate classifiers suffers drastically, but the proposed robust formulations fare much better in handling uncertainty. [sent-626, score-0.245]
80 4 Verification of Convergence of the FSS Algorithm. In this section we experimentally verify that the proposed saddle point based algorithm has an O(1/M^2) convergence rate (see (20)). [sent-629, score-0.135]
81 This shows that, to build a robust classifier with a medium-scale data set (even more than 1000 examples), the saddle point based algorithm is much more effective than a Conic Quadratic Program based formulation. [sent-656, score-0.224]
82 All three formulations are more robust than Nominal-SVM. [sent-659, score-0.135]
83 The experimental methodology follows a “one-vs-one” classification setting with all 15 classes of protein structures. [sent-680, score-0.173]
84 Let D = {(Pi, ri, yi)} be a protein structure data set, where Pi is the set of coordinates of the ith protein structure obtained from the Astral database, ri is the corresponding resolution information obtained from the PDB, and yi is the class label. [sent-683, score-0.444]
85 One can create a set of uncertain kernels, where K(p, p′) is a kernel function computed between two protein structures p ∈ Qi and p′ ∈ Qj. [sent-689, score-0.379]
86 These kernels are purely based on protein structure (specifically, the positions of the Cα atoms). [sent-698, score-0.207]
87 Table 2 and Table 3 report results for RSVM, USSVMMN and Nominal-SVM (SVM with kernels based on the nominal protein structures reported in PDB files) using both standard and robust error measures defined in Section 6. [sent-703, score-0.524]
88 Conclusion. We studied the problem of designing robust classifiers when the kernel matrices are uncertain. [sent-722, score-0.185]
89 We adapt the general-purpose algorithm of Nemirovski (2004) for solving saddle point problems to this problem. [sent-878, score-0.135]
90 Empirical results show that USSVMSOCP is indeed a robust alternative under uncertainty in the kernel matrices, on both synthetic and real-world data sets. [sent-887, score-0.264]
91 for Z compatible with the norm ‖·‖, and (a) argminZ ωs(·) = (xω, ȳs) ∈ Zs and maxZs ωs(·) − minZs ωs(·) ≤ 1, (b) ∀ z, z′ ∈ Z : ‖G(z) − G(z′)‖∗ ≤ Ls ‖z − z′‖. [sent-902, score-0.337]
92 Indeed, assuming the opposite, let Ys = {y ∈ Y : ‖y − ȳs‖Y ≤ Rs}, and let ȳ = argmax_{y∈Ys} φ(x^s, y), so that φs(x^s) = φ(x^s, ȳ). [sent-948, score-0.337]
93 Since φ(x^s) := max_{y∈Y} φ(x^s, y) > φs(x^s) := max_{y∈Ys} φ(x^s, y) and Ys is cut off Y by the inequality ‖y − ȳs‖Y ≤ Rs, we have ‖ȳ − ȳs‖Y = Rs, while by (Is) we have ‖y∗ − ȳs‖Y ≤ Rs/2, whence, in particular, y∗ ∈ Ys and ‖y∗ − ȳ‖Y ≥ Rs/2. [sent-949, score-1.348]
94 Since φ is strongly concave with respect to ‖·‖Y with modulus θ and attains its maximum over y ∈ Y at y∗, we have φ(y∗) − φ(y^s) ≥ θ‖y∗ − y^s‖Y^2, which combines with (36) to imply that ‖y∗ − y^s‖Y ≤ Rs/4 = Rs+1/2; this is nothing but (Is+1). [sent-960, score-0.674]
95 It is easy to verify that min(−max_{i+}(yi + q̄i), min_{i−}(yi + q̄i) − C) ≤ ν ≤ max(C − min_{i+}(yi + q̄i), max_{i−}(yi + q̄i)). [sent-970, score-0.176]
96 Kernel Functions for Protein Structures. Experiments on protein structures have been conducted with the Weighted Pairwise Distance Substructure Kernel described in Bhattacharya et al. [sent-975, score-0.227]
97 A substructure Nia consists of the l spatially nearest residues to the ath residue of protein Pi. [sent-980, score-0.247]
98 The substructure kernel between two substructures Nia and Njb is defined as Kpds(Nia, Njb) = ∑_{π∈Π(l)} exp(−‖dia − π(djb)‖^2 / σ^2), where dia denotes the set of pairwise distances between all possible pairs of residues in Nia. [sent-981, score-0.139]
99 Finally, the kernel function between two protein structures is defined as K(Pi, Pj) = ∑_{a,b=1}^{ni} ∑_{c,d=1}^{nj} Kpds(Nia, Njc) Kpds(Nib, Njd) Knorm(ia, ib, jc, jd), where Knorm(ia, ib, jc, jd) = exp(−(‖cia − cib‖ − ‖cjc − cjd‖)^2 / σ^2). [sent-987, score-0.396]
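The sketch below illustrates how the substructure kernel Kpds can be evaluated for very small substructures. Treating the pairwise distances as an l × l matrix and letting the permutation act by reordering rows and columns is our own illustrative reading of the formula; the normalising factor Knorm is omitted, and the names are not from the paper's code.

```python
import itertools
import numpy as np

def pairwise_dists(coords):
    """Pairwise distance matrix of an l x 3 array of residue coordinates."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def kpds(coords_a, coords_b, sigma=1.0):
    """Substructure kernel sketch: sum over permutations pi of
    exp(-||D_a - pi(D_b)||^2 / sigma^2), where pi reorders the residues of
    the second substructure.  Only practical for very small l."""
    Da, Db = pairwise_dists(coords_a), pairwise_dists(coords_b)
    l = Da.shape[0]
    total = 0.0
    for pi in itertools.permutations(range(l)):
        Db_perm = Db[np.ix_(pi, pi)]
        total += np.exp(-np.sum((Da - Db_perm) ** 2) / sigma ** 2)
    return total

# Toy example with substructures of l = 3 residues each.
rng = np.random.default_rng(0)
Na, Nb = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
print(kpds(Na, Nb, sigma=2.0))
```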
100 Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. [sent-1122, score-0.135]
wordName wordTfidf (topN-words)
[('ys', 0.337), ('ussvmmn', 0.33), ('ussvmsocp', 0.321), ('fss', 0.266), ('nominal', 0.228), ('rsvm', 0.211), ('bhadra', 0.183), ('protein', 0.173), ('lxy', 0.156), ('ns', 0.149), ('emirovski', 0.147), ('hadra', 0.147), ('hattacharyya', 0.147), ('ncertain', 0.137), ('saddle', 0.135), ('xs', 0.119), ('rs', 0.114), ('uncertainty', 0.11), ('bhattacharya', 0.101), ('atrices', 0.098), ('lyy', 0.094), ('vzs', 0.092), ('robust', 0.089), ('ernel', 0.087), ('uncertain', 0.087), ('ro', 0.082), ('ussvm', 0.082), ('socp', 0.078), ('kl', 0.075), ('svm', 0.068), ('kernel', 0.065), ('sadval', 0.064), ('sad', 0.063), ('en', 0.062), ('alignment', 0.06), ('proxz', 0.055), ('superfamilies', 0.055), ('bhattacharyya', 0.055), ('zs', 0.055), ('gy', 0.054), ('structures', 0.054), ('formulations', 0.046), ('gx', 0.046), ('nemirovski', 0.046), ('errormeasure', 0.046), ('juditski', 0.046), ('nia', 0.046), ('robusterr', 0.046), ('ky', 0.046), ('qi', 0.044), ('ms', 0.043), ('robustness', 0.042), ('xt', 0.04), ('resolution', 0.04), ('atoms', 0.039), ('yt', 0.039), ('sn', 0.039), ('classi', 0.038), ('mpb', 0.037), ('ntst', 0.037), ('residues', 0.037), ('substructure', 0.037), ('ussv', 0.037), ('ytpr', 0.037), ('yts', 0.037), ('ql', 0.035), ('kernels', 0.034), ('qiu', 0.033), ('stage', 0.032), ('majority', 0.032), ('holm', 0.031), ('scop', 0.031), ('argminx', 0.031), ('modulus', 0.031), ('ip', 0.031), ('designing', 0.031), ('vs', 0.031), ('psd', 0.03), ('du', 0.029), ('chance', 0.029), ('resultant', 0.029), ('db', 0.029), ('ls', 0.029), ('yi', 0.029), ('zl', 0.028), ('aharon', 0.027), ('argminy', 0.027), ('kpds', 0.027), ('rmsd', 0.027), ('sahely', 0.027), ('sedumi', 0.027), ('shindyalov', 0.027), ('tst', 0.027), ('formulation', 0.027), ('al', 0.027), ('pi', 0.027), ('jc', 0.026), ('dg', 0.026), ('beta', 0.026), ('concave', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 36 jmlr-2012-Efficient Methods for Robust Classification Under Uncertainty in Kernel Matrices
Author: Aharon Ben-Tal, Sahely Bhadra, Chiranjib Bhattacharyya, Arkadi Nemirovski
Abstract: In this paper we study the problem of designing SVM classifiers when the kernel matrix, K, is affected by uncertainty. Specifically K is modeled as a positive affine combination of given positive semi definite kernels, with the coefficients ranging in a norm-bounded uncertainty set. We treat the problem using the Robust Optimization methodology. This reduces the uncertain SVM problem into a deterministic conic quadratic problem which can be solved in principle by a polynomial time Interior Point (IP) algorithm. However, for large-scale classification problems, IP methods become intractable and one has to resort to first-order gradient type methods. The strategy we use here is to reformulate the robust counterpart of the uncertain SVM problem as a saddle point problem and employ a special gradient scheme which works directly on the convex-concave saddle function. The algorithm is a simplified version of a general scheme due to Juditski and Nemirovski (2011). It achieves an O(1/T 2 ) reduction of the initial error after T iterations. A comprehensive empirical study on both synthetic data and real-world protein structure data sets show that the proposed formulations achieve the desired robustness, and the saddle point based algorithm outperforms the IP method significantly. Keywords: robust optimization, uncertain classification, kernel functions
2 0.087345466 100 jmlr-2012-Robust Kernel Density Estimation
Author: JooSeuk Kim, Clayton D. Scott
Abstract: We propose a method for nonparametric density estimation that exhibits robustness to contamination of the training sample. This method achieves robustness by combining a traditional kernel density estimator (KDE) with ideas from classical M-estimation. We interpret the KDE based on a positive semi-definite kernel as a sample mean in the associated reproducing kernel Hilbert space. Since the sample mean is sensitive to outliers, we estimate it robustly via M-estimation, yielding a robust kernel density estimator (RKDE). An RKDE can be computed efficiently via a kernelized iteratively re-weighted least squares (IRWLS) algorithm. Necessary and sufficient conditions are given for kernelized IRWLS to converge to the global minimizer of the M-estimator objective function. The robustness of the RKDE is demonstrated with a representer theorem, the influence function, and experimental results for density estimation and anomaly detection. Keywords: outlier, reproducing kernel Hilbert space, kernel trick, influence function, M-estimation
3 0.068058684 77 jmlr-2012-Non-Sparse Multiple Kernel Fisher Discriminant Analysis
Author: Fei Yan, Josef Kittler, Krystian Mikolajczyk, Atif Tahir
Abstract: Sparsity-inducing multiple kernel Fisher discriminant analysis (MK-FDA) has been studied in the literature. Building on recent advances in non-sparse multiple kernel learning (MKL), we propose a non-sparse version of MK-FDA, which imposes a general ℓ p norm regularisation on the kernel weights. We formulate the associated optimisation problem as a semi-infinite program (SIP), and adapt an iterative wrapper algorithm to solve it. We then discuss, in light of latest advances in MKL optimisation techniques, several reformulations and optimisation strategies that can potentially lead to significant improvements in the efficiency and scalability of MK-FDA. We carry out extensive experiments on six datasets from various application areas, and compare closely the performance of ℓ p MK-FDA, fixed norm MK-FDA, and several variants of SVM-based MKL (MK-SVM). Our results demonstrate that ℓ p MK-FDA improves upon sparse MK-FDA in many practical situations. The results also show that on image categorisation problems, ℓ p MK-FDA tends to outperform its SVM counterpart. Finally, we also discuss the connection between (MK-)FDA and (MK-)SVM, under the unified framework of regularised kernel machines. Keywords: multiple kernel learning, kernel fisher discriminant analysis, regularised least squares, support vector machines
4 0.058510631 16 jmlr-2012-Algorithms for Learning Kernels Based on Centered Alignment
Author: Corinna Cortes, Mehryar Mohri, Afshin Rostamizadeh
Abstract: This paper presents new and effective algorithms for learning kernels. In particular, as shown by our empirical results, these algorithms consistently outperform the so-called uniform combination solution that has proven to be difficult to improve upon in the past, as well as other algorithms for learning kernels based on convex combinations of base kernels in both classification and regression. Our algorithms are based on the notion of centered alignment which is used as a similarity measure between kernels or kernel matrices. We present a number of novel algorithmic, theoretical, and empirical results for learning kernels based on our notion of centered alignment. In particular, we describe efficient algorithms for learning a maximum alignment kernel by showing that the problem can be reduced to a simple QP and discuss a one-stage algorithm for learning both a kernel and a hypothesis based on that kernel using an alignment-based regularization. Our theoretical results include a novel concentration bound for centered alignment between kernel matrices, the proof of the existence of effective predictors for kernels with high alignment, both for classification and for regression, and the proof of stability-based generalization bounds for a broad family of algorithms for learning kernels based on centered alignment. We also report the results of experiments with our centered alignment-based algorithms in both classification and regression. Keywords: kernel methods, learning kernels, feature selection
5 0.058060091 97 jmlr-2012-Regularization Techniques for Learning with Matrices
Author: Sham M. Kakade, Shai Shalev-Shwartz, Ambuj Tewari
Abstract: There is growing body of learning problems for which it is natural to organize the parameters into a matrix. As a result, it becomes easy to impose sophisticated prior knowledge by appropriately regularizing the parameters under some matrix norm. This work describes and analyzes a systematic method for constructing such matrix-based regularization techniques. In particular, we focus on how the underlying statistical properties of a given problem can help us decide which regularization function is appropriate. Our methodology is based on a known duality phenomenon: a function is strongly convex with respect to some norm if and only if its conjugate function is strongly smooth with respect to the dual norm. This result has already been found to be a key component in deriving and analyzing several learning algorithms. We demonstrate the potential of this framework by deriving novel generalization and regret bounds for multi-task learning, multi-class learning, and multiple kernel learning. Keywords: regularization, strong convexity, regret bounds, generalization bounds, multi-task learning, multi-class learning, multiple kernel learning
6 0.052938741 74 jmlr-2012-Multi Kernel Learning with Online-Batch Optimization
7 0.050252274 33 jmlr-2012-Distance Metric Learning with Eigenvalue Optimization
8 0.044974253 87 jmlr-2012-PAC-Bayes Bounds with Data Dependent Priors
9 0.043592285 89 jmlr-2012-Pairwise Support Vector Machines and their Application to Large Scale Problems
10 0.043049067 66 jmlr-2012-Metric and Kernel Learning Using a Linear Transformation
11 0.040071648 91 jmlr-2012-Plug-in Approach to Active Learning
12 0.039204802 115 jmlr-2012-Trading Regret for Efficiency: Online Convex Optimization with Long Term Constraints
13 0.038370278 12 jmlr-2012-Active Clustering of Biological Sequences
14 0.037921909 26 jmlr-2012-Coherence Functions with Applications in Large-Margin Classification Methods
15 0.037044987 86 jmlr-2012-Optimistic Bayesian Sampling in Contextual-Bandit Problems
16 0.036125083 28 jmlr-2012-Confidence-Weighted Linear Classification for Text Categorization
17 0.03566888 67 jmlr-2012-Minimax-Optimal Rates For Sparse Additive Models Over Kernel Classes Via Convex Programming
18 0.033322789 44 jmlr-2012-Feature Selection via Dependence Maximization
19 0.032825068 80 jmlr-2012-On Ranking and Generalization Bounds
20 0.031420924 39 jmlr-2012-Estimation and Selection via Absolute Penalized Convex Minimization And Its Multistage Adaptive Applications
topicId topicWeight
[(0, -0.168), (1, -0.032), (2, 0.048), (3, 0.102), (4, 0.056), (5, -0.003), (6, -0.055), (7, 0.016), (8, -0.02), (9, -0.002), (10, 0.013), (11, -0.06), (12, -0.041), (13, -0.008), (14, 0.047), (15, 0.102), (16, 0.051), (17, -0.05), (18, 0.047), (19, -0.002), (20, -0.007), (21, 0.044), (22, 0.073), (23, -0.015), (24, 0.044), (25, -0.055), (26, 0.016), (27, 0.012), (28, 0.068), (29, 0.236), (30, 0.176), (31, 0.175), (32, 0.214), (33, -0.211), (34, -0.069), (35, -0.079), (36, -0.211), (37, 0.015), (38, 0.025), (39, -0.277), (40, -0.176), (41, -0.162), (42, 0.06), (43, 0.109), (44, -0.094), (45, 0.09), (46, -0.081), (47, 0.125), (48, -0.112), (49, 0.043)]
simIndex simValue paperId paperTitle
same-paper 1 0.91524953 36 jmlr-2012-Efficient Methods for Robust Classification Under Uncertainty in Kernel Matrices
Author: Aharon Ben-Tal, Sahely Bhadra, Chiranjib Bhattacharyya, Arkadi Nemirovski
Abstract: In this paper we study the problem of designing SVM classifiers when the kernel matrix, K, is affected by uncertainty. Specifically K is modeled as a positive affine combination of given positive semi definite kernels, with the coefficients ranging in a norm-bounded uncertainty set. We treat the problem using the Robust Optimization methodology. This reduces the uncertain SVM problem into a deterministic conic quadratic problem which can be solved in principle by a polynomial time Interior Point (IP) algorithm. However, for large-scale classification problems, IP methods become intractable and one has to resort to first-order gradient type methods. The strategy we use here is to reformulate the robust counterpart of the uncertain SVM problem as a saddle point problem and employ a special gradient scheme which works directly on the convex-concave saddle function. The algorithm is a simplified version of a general scheme due to Juditski and Nemirovski (2011). It achieves an O(1/T 2 ) reduction of the initial error after T iterations. A comprehensive empirical study on both synthetic data and real-world protein structure data sets show that the proposed formulations achieve the desired robustness, and the saddle point based algorithm outperforms the IP method significantly. Keywords: robust optimization, uncertain classification, kernel functions
2 0.61595529 100 jmlr-2012-Robust Kernel Density Estimation
Author: JooSeuk Kim, Clayton D. Scott
Abstract: We propose a method for nonparametric density estimation that exhibits robustness to contamination of the training sample. This method achieves robustness by combining a traditional kernel density estimator (KDE) with ideas from classical M-estimation. We interpret the KDE based on a positive semi-definite kernel as a sample mean in the associated reproducing kernel Hilbert space. Since the sample mean is sensitive to outliers, we estimate it robustly via M-estimation, yielding a robust kernel density estimator (RKDE). An RKDE can be computed efficiently via a kernelized iteratively re-weighted least squares (IRWLS) algorithm. Necessary and sufficient conditions are given for kernelized IRWLS to converge to the global minimizer of the M-estimator objective function. The robustness of the RKDE is demonstrated with a representer theorem, the influence function, and experimental results for density estimation and anomaly detection. Keywords: outlier, reproducing kernel Hilbert space, kernel trick, influence function, M-estimation
3 0.40386465 87 jmlr-2012-PAC-Bayes Bounds with Data Dependent Priors
Author: Emilio Parrado-Hernández, Amiran Ambroladze, John Shawe-Taylor, Shiliang Sun
Abstract: This paper presents the prior PAC-Bayes bound and explores its capabilities as a tool to provide tight predictions of SVMs’ generalization. The computation of the bound involves estimating a prior of the distribution of classifiers from the available data, and then manipulating this prior in the usual PAC-Bayes generalization bound. We explore two alternatives: to learn the prior from a separate data set, or to consider an expectation prior that does not need this separate data set. The prior PAC-Bayes bound motivates two SVM-like classification algorithms, prior SVM and ηprior SVM, whose regularization term pushes towards the minimization of the prior PAC-Bayes bound. The experimental work illustrates that the new bounds can be significantly tighter than the original PAC-Bayes bound when applied to SVMs, and among them the combination of the prior PAC-Bayes bound and the prior SVM algorithm gives the tightest bound. Keywords: PAC-Bayes bound, support vector machine, generalization capability prediction, classification
4 0.39134002 16 jmlr-2012-Algorithms for Learning Kernels Based on Centered Alignment
Author: Corinna Cortes, Mehryar Mohri, Afshin Rostamizadeh
Abstract: This paper presents new and effective algorithms for learning kernels. In particular, as shown by our empirical results, these algorithms consistently outperform the so-called uniform combination solution that has proven to be difficult to improve upon in the past, as well as other algorithms for learning kernels based on convex combinations of base kernels in both classification and regression. Our algorithms are based on the notion of centered alignment which is used as a similarity measure between kernels or kernel matrices. We present a number of novel algorithmic, theoretical, and empirical results for learning kernels based on our notion of centered alignment. In particular, we describe efficient algorithms for learning a maximum alignment kernel by showing that the problem can be reduced to a simple QP and discuss a one-stage algorithm for learning both a kernel and a hypothesis based on that kernel using an alignment-based regularization. Our theoretical results include a novel concentration bound for centered alignment between kernel matrices, the proof of the existence of effective predictors for kernels with high alignment, both for classification and for regression, and the proof of stability-based generalization bounds for a broad family of algorithms for learning kernels based on centered alignment. We also report the results of experiments with our centered alignment-based algorithms in both classification and regression. Keywords: kernel methods, learning kernels, feature selection
5 0.300751 77 jmlr-2012-Non-Sparse Multiple Kernel Fisher Discriminant Analysis
Author: Fei Yan, Josef Kittler, Krystian Mikolajczyk, Atif Tahir
Abstract: Sparsity-inducing multiple kernel Fisher discriminant analysis (MK-FDA) has been studied in the literature. Building on recent advances in non-sparse multiple kernel learning (MKL), we propose a non-sparse version of MK-FDA, which imposes a general ℓ p norm regularisation on the kernel weights. We formulate the associated optimisation problem as a semi-infinite program (SIP), and adapt an iterative wrapper algorithm to solve it. We then discuss, in light of latest advances in MKL optimisation techniques, several reformulations and optimisation strategies that can potentially lead to significant improvements in the efficiency and scalability of MK-FDA. We carry out extensive experiments on six datasets from various application areas, and compare closely the performance of ℓ p MK-FDA, fixed norm MK-FDA, and several variants of SVM-based MKL (MK-SVM). Our results demonstrate that ℓ p MK-FDA improves upon sparse MK-FDA in many practical situations. The results also show that on image categorisation problems, ℓ p MK-FDA tends to outperform its SVM counterpart. Finally, we also discuss the connection between (MK-)FDA and (MK-)SVM, under the unified framework of regularised kernel machines. Keywords: multiple kernel learning, kernel fisher discriminant analysis, regularised least squares, support vector machines
6 0.27468863 66 jmlr-2012-Metric and Kernel Learning Using a Linear Transformation
7 0.26102594 89 jmlr-2012-Pairwise Support Vector Machines and their Application to Large Scale Problems
8 0.25770107 23 jmlr-2012-Breaking the Curse of Kernelization: Budgeted Stochastic Gradient Descent for Large-Scale SVM Training
9 0.24998334 112 jmlr-2012-Structured Sparsity via Alternating Direction Methods
10 0.23078842 86 jmlr-2012-Optimistic Bayesian Sampling in Contextual-Bandit Problems
11 0.22967386 28 jmlr-2012-Confidence-Weighted Linear Classification for Text Categorization
12 0.22225028 74 jmlr-2012-Multi Kernel Learning with Online-Batch Optimization
13 0.19728062 21 jmlr-2012-Bayesian Mixed-Effects Inference on Classification Performance in Hierarchical Data Sets
14 0.18834975 97 jmlr-2012-Regularization Techniques for Learning with Matrices
15 0.17842551 52 jmlr-2012-Iterative Reweighted Algorithms for Matrix Rank Minimization
16 0.1749139 91 jmlr-2012-Plug-in Approach to Active Learning
17 0.16844063 104 jmlr-2012-Security Analysis of Online Centroid Anomaly Detection
18 0.16783206 44 jmlr-2012-Feature Selection via Dependence Maximization
19 0.16552053 109 jmlr-2012-Stability of Density-Based Clustering
20 0.1568092 102 jmlr-2012-Sally: A Tool for Embedding Strings in Vector Spaces
topicId topicWeight
[(7, 0.026), (21, 0.034), (26, 0.04), (29, 0.03), (35, 0.04), (49, 0.016), (56, 0.018), (57, 0.014), (64, 0.02), (69, 0.021), (75, 0.074), (77, 0.012), (79, 0.016), (88, 0.385), (92, 0.067), (96, 0.106)]
simIndex simValue paperId paperTitle
same-paper 1 0.68981236 36 jmlr-2012-Efficient Methods for Robust Classification Under Uncertainty in Kernel Matrices
Author: Aharon Ben-Tal, Sahely Bhadra, Chiranjib Bhattacharyya, Arkadi Nemirovski
Abstract: In this paper we study the problem of designing SVM classifiers when the kernel matrix, K, is affected by uncertainty. Specifically K is modeled as a positive affine combination of given positive semi definite kernels, with the coefficients ranging in a norm-bounded uncertainty set. We treat the problem using the Robust Optimization methodology. This reduces the uncertain SVM problem into a deterministic conic quadratic problem which can be solved in principle by a polynomial time Interior Point (IP) algorithm. However, for large-scale classification problems, IP methods become intractable and one has to resort to first-order gradient type methods. The strategy we use here is to reformulate the robust counterpart of the uncertain SVM problem as a saddle point problem and employ a special gradient scheme which works directly on the convex-concave saddle function. The algorithm is a simplified version of a general scheme due to Juditski and Nemirovski (2011). It achieves an O(1/T 2 ) reduction of the initial error after T iterations. A comprehensive empirical study on both synthetic data and real-world protein structure data sets show that the proposed formulations achieve the desired robustness, and the saddle point based algorithm outperforms the IP method significantly. Keywords: robust optimization, uncertain classification, kernel functions
2 0.37580526 85 jmlr-2012-Optimal Distributed Online Prediction Using Mini-Batches
Author: Ofer Dekel, Ran Gilad-Bachrach, Ohad Shamir, Lin Xiao
Abstract: Online prediction methods are typically presented as serial algorithms running on a single processor. However, in the age of web-scale prediction problems, it is increasingly common to encounter situations where a single processor cannot keep up with the high rate at which inputs arrive. In this work, we present the distributed mini-batch algorithm, a method of converting many serial gradient-based online prediction algorithms into distributed algorithms. We prove a regret bound for this method that is asymptotically optimal for smooth convex loss functions and stochastic inputs. Moreover, our analysis explicitly takes into account communication latencies between nodes in the distributed environment. We show how our method can be used to solve the closely-related distributed stochastic optimization problem, achieving an asymptotically linear speed-up over multiple processors. Finally, we demonstrate the merits of our approach on a web-scale online prediction problem. Keywords: distributed computing, online learning, stochastic optimization, regret bounds, convex optimization
3 0.3750183 92 jmlr-2012-Positive Semidefinite Metric Learning Using Boosting-like Algorithms
Author: Chunhua Shen, Junae Kim, Lei Wang, Anton van den Hengel
Abstract: The success of many machine learning and pattern recognition methods relies heavily upon the identification of an appropriate distance metric on the input data. It is often beneficial to learn such a metric from the input training data, instead of using a default one such as the Euclidean distance. In this work, we propose a boosting-based technique, termed B OOST M ETRIC, for learning a quadratic Mahalanobis distance metric. Learning a valid Mahalanobis distance metric requires enforcing the constraint that the matrix parameter to the metric remains positive semidefinite. Semidefinite programming is often used to enforce this constraint, but does not scale well and is not easy to implement. B OOST M ETRIC is instead based on the observation that any positive semidefinite matrix can be decomposed into a linear combination of trace-one rank-one matrices. B OOST M ETRIC thus uses rank-one positive semidefinite matrices as weak learners within an efficient and scalable boosting-based learning process. The resulting methods are easy to implement, efficient, and can accommodate various types of constraints. We extend traditional boosting algorithms in that its weak learner is a positive semidefinite matrix with trace and rank being one rather than a classifier or regressor. Experiments on various data sets demonstrate that the proposed algorithms compare favorably to those state-of-the-art methods in terms of classification accuracy and running time. Keywords: Mahalanobis distance, semidefinite programming, column generation, boosting, Lagrange duality, large margin nearest neighbor
4 0.37383139 115 jmlr-2012-Trading Regret for Efficiency: Online Convex Optimization with Long Term Constraints
Author: Mehrdad Mahdavi, Rong Jin, Tianbao Yang
Abstract: In this paper we propose efficient algorithms for solving constrained online convex optimization problems. Our motivation stems from the observation that most algorithms proposed for online convex optimization require a projection onto the convex set K from which the decisions are made. While the projection is straightforward for simple shapes (e.g., Euclidean ball), for arbitrary complex sets it is the main computational challenge and may be inefficient in practice. In this paper, we consider an alternative online convex optimization problem. Instead of requiring that decisions belong to K for all rounds, we only require that the constraints, which define the set K , be satisfied in the long run. By turning the problem into an online convex-concave optimization problem, √ we propose an efficient algorithm which achieves O( T ) regret bound and O(T 3/4 ) bound on the violation of constraints. Then, we modify the algorithm in order to guarantee that the constraints are satisfied in the long run. This gain is achieved at the price of getting O(T 3/4 ) regret bound. Our second algorithm is based on the mirror prox method (Nemirovski, 2005) to solve variational inequalities which achieves O(T 2/3 ) bound for both regret and the violation of constraints when the domain K can be described by a finite number of linear constraints. Finally, we extend the results to the setting where we only have partial access to the convex set K and propose a multipoint bandit feedback algorithm with the same bounds in expectation as our first algorithm. Keywords: online convex optimization, convex-concave optimization, bandit feedback, variational inequality
5 0.37352058 105 jmlr-2012-Selective Sampling and Active Learning from Single and Multiple Teachers
Author: Ofer Dekel, Claudio Gentile, Karthik Sridharan
Abstract: We present a new online learning algorithm in the selective sampling framework, where labels must be actively queried before they are revealed. We prove bounds on the regret of our algorithm and on the number of labels it queries when faced with an adaptive adversarial strategy of generating the instances. Our bounds both generalize and strictly improve over previous bounds in similar settings. Additionally, our selective sampling algorithm can be converted into an efficient statistical active learning algorithm. We extend our algorithm and analysis to the multiple-teacher setting, where the algorithm can choose which subset of teachers to query for each label. Finally, we demonstrate the effectiveness of our techniques on a real-world Internet search problem. Keywords: online learning, regret, label-efficient, crowdsourcing
6 0.37302309 77 jmlr-2012-Non-Sparse Multiple Kernel Fisher Discriminant Analysis
7 0.37094712 64 jmlr-2012-Manifold Identification in Dual Averaging for Regularized Stochastic Online Learning
8 0.36823756 8 jmlr-2012-A Primal-Dual Convergence Analysis of Boosting
9 0.36789858 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models
10 0.36243361 1 jmlr-2012-A Case Study on Meta-Generalising: A Gaussian Processes Approach
11 0.36231229 27 jmlr-2012-Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection
12 0.36185807 26 jmlr-2012-Coherence Functions with Applications in Large-Margin Classification Methods
13 0.36116296 65 jmlr-2012-MedLDA: Maximum Margin Supervised Topic Models
14 0.35903245 103 jmlr-2012-Sampling Methods for the Nyström Method
15 0.35871336 18 jmlr-2012-An Improved GLMNET for L1-regularized Logistic Regression
16 0.35700262 86 jmlr-2012-Optimistic Bayesian Sampling in Contextual-Bandit Problems
17 0.35558823 13 jmlr-2012-Active Learning via Perfect Selective Classification
18 0.35542426 71 jmlr-2012-Multi-Instance Learning with Any Hypothesis Class
19 0.3547166 80 jmlr-2012-On Ranking and Generalization Bounds
20 0.35452318 16 jmlr-2012-Algorithms for Learning Kernels Based on Centered Alignment