nips nips2012 nips2012-15 knowledge-graph by maker-knowledge-mining

15 nips-2012-A Polylog Pivot Steps Simplex Algorithm for Classification


Source: pdf

Author: Elad Hazan, Zohar Karnin

Abstract: We present a simplex algorithm for linear programming in a linear classification formulation. The paramount complexity parameter in linear classification problems is called the margin. We prove that for margin values of practical interest our simplex variant performs a polylogarithmic number of pivot steps in the worst case, and its overall running time is near linear. This is in contrast to general linear programming, for which no sub-polynomial pivot rule is known. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 A provably efficient simplex algorithm for classification. Elad Hazan, Technion - Israel Inst. [sent-1, score-0.429]

2 Abstract: We present a simplex algorithm for linear programming in a linear classification formulation. [sent-8, score-0.551]

3 We prove that for margin values of practical interest our simplex variant performs a polylogarithmic number of pivot steps in the worst case, and its overall running time is near linear. [sent-10, score-1.27]

4 This is in contrast to general linear programming, for which no sub-polynomial pivot rule is known. [sent-11, score-0.455]

5 The simplex algorithm for linear programming is a cornerstone of operations research. [sent-13, score-0.547]

6 As of today, it is unknown whether a variant of the simplex algorithm (defined by a pivot rule) exists which makes it run in strongly polynomial time. [sent-15, score-0.917]

7 Further, the simplex algorithm, being a geometrical algorithm that is applied to polytopes defined by linear programs, relates to deep questions in geometry. [sent-16, score-0.51]

8 Perhaps the most famous of these is the “polynomial Hirsch conjecture”, which states that the diameter of a polytope is polynomial in its dimension and the number of its facets. [sent-17, score-0.408]

9 In this paper we analyze a simplex-based algorithm which is guaranteed to run in worst-case polynomial time for a large class of practically interesting linear programs that arise in machine learning, namely linear classification problems. [sent-18, score-0.25]

10 Further, our simplex algorithm performs only a polylogarithmic number of pivot steps and has an overall near-linear running time. [sent-19, score-0.99]

11 The only previously known poly-time simplex algorithm performs a polynomial number of pivot steps [KS06]. [sent-20, score-0.943]

12 1 Related work: The simplex algorithm for linear programming was invented by Dantzig [Dan51]. [sent-22, score-0.507]

13 In the sixty years that have passed, numerous attempts have been made to devise a polynomial time simplex algorithm. [sent-23, score-0.51]

14 Various authors have proved polynomial bounds on the number of pivot steps required by simplex variants for inputs that are generated by various distributions, see e.g. [sent-24, score-0.919]

15 A breakthrough in the theoretical analysis of the simplex algorithm was obtained by Spielman and Teng [ST04], who have shown that its smoothed complexity is polynomial, i.e. [sent-28, score-0.524]

16 Kelner and Spielman [KS06] have used similar techniques to provide a worst-case polynomial-time simplex algorithm. [sent-31, score-0.51]

17 The purpose of this paper is to expand our understanding of the simplex method, rather than obtain a more efficient algorithm for classification. [sent-36, score-0.429]

18 There exists a separating hyperplane if and only if the above linear program has a feasible solution. [sent-53, score-0.26]

19 The margin of a linear program in format (1), such that ∀i ‖Ai‖ ≤ 1, is defined as λ = λ(A) = max_{‖x‖≤1} min_{i∈[n]} ⟨Ai, x⟩. We say that the instance A is a λ-margin LP. [sent-58, score-0.451]
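To make the definition concrete, the following sketch (our own illustration, not code from the paper) evaluates the margin attained by a candidate direction on an instance with unit-norm rows, matching format (1).

```python
import numpy as np

def attained_margin(A, x):
    """Margin attained by direction x on instance A: min_i <A_i, x> with ||x|| <= 1.

    Assumes the rows of A are scaled to norm at most 1, as in format (1).
    """
    x = x / max(np.linalg.norm(x), 1e-12)   # enforce the ||x|| <= 1 restriction
    return float(np.min(A @ x))

# Toy check: an instance A is a lambda-margin LP iff some direction attains margin >= lambda.
rng = np.random.default_rng(0)
d, n = 20, 200
w = rng.standard_normal(d)
w /= np.linalg.norm(w)                       # hidden separating direction
A = rng.standard_normal((n, d))
A = A[A @ w >= 0.3]                          # keep points well on the positive side of w
A /= np.linalg.norm(A, axis=1, keepdims=True)
print("margin attained by w:", attained_margin(A, w))
```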

20 Notice that we have restricted x as well as the rows of A to have bounded norm, since otherwise the margin is ill-defined as it can change by scaling of x. [sent-59, score-0.328]

21 While any linear program can be converted to an equivalent one in form (1), the margin can be exponentially small in the representation. [sent-61, score-0.398]

22 However, in practical applications the margin is usually a constant independent of the problem dimensions; a justification is given next. [sent-62, score-0.294]

23 Therefore we henceforth treat the margin as a separate parameter of the linear program, and devise efficient algorithms for solving it when the margin is a constant independent of the problem dimensions. [sent-63, score-0.658]

24 Support vector machines - why is the margin large? [sent-64, score-0.294]

25 2 Linear Programming and Smoothed Analysis: Smoothed analysis was introduced in [ST04] to explain the excellent performance of the simplex algorithm in practice. [sent-77, score-0.429]

26 In their seminal paper, Spielman and Teng proved the existence of a simplex algorithm that solves σ-smooth LPs in polynomial time (polynomial also in σ^{-1}). [sent-79, score-0.534]
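For concreteness, a σ-smooth instance is obtained from a base instance by adding independent Gaussian noise of standard deviation σ to every entry; a minimal sketch of this perturbation model (our own illustration) is:

```python
import numpy as np

def smooth_instance(A, sigma, seed=0):
    """Return a sigma-smooth perturbation of the constraint matrix A.

    In the smoothed-analysis model, each entry of A receives independent
    Gaussian noise of standard deviation sigma.
    """
    rng = np.random.default_rng(seed)
    return A + sigma * rng.standard_normal(A.shape)
```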

27 3 Statement of our results: For a separable SVM instance of n examples in a space of d dimensions and margin λ, we provide a simplex algorithm with at most poly(log(n), λ^{-1}) pivot steps. [sent-83, score-1.208]

28 The algorithm achieves a solution with margin O(√(log(n)/d)) when viewed as a separator in the d-dimensional space. [sent-85, score-0.503]

29 However, in an alternative yet (practically) equivalent view, the margin of the solution is in fact arbitrarily close to λ. [sent-86, score-0.357]

30 Let L be a separable ℓ2-SVM instance of dimension d with n examples and margin λ. [sent-88, score-0.459]

31 The simplex algorithm presented in this paper requires O(nd) preprocessing time and poly(ε^{-1}, log(n)) pivot steps. [sent-91, score-0.812]

32 The margin of the solution when viewed as a hyperplane in R^d is O(√(log(n)/d)). [sent-93, score-0.469]

33 When projecting the data points onto V , the margin of the solution is λ−ε. [sent-94, score-0.447]

34 In words, the above theorem states that when viewed as a classification problem the obtained margin is almost optimal. [sent-95, score-0.365]

35 Tightness of the Generalization Bound: At first sight it seems that our result gives a weak generalization bound, since the margin obtained in the original dimension is low. [sent-97, score-0.382]

36 However, the margin of the found solution in the reduced dimension (i.e. [sent-98, score-0.455]

37 LP perspective and the smoothed analysis framework: As mentioned earlier, any linear program can be viewed as a classification LP by introducing a single new variable. [sent-104, score-0.232]

38 Furthermore, any solution with a positive margin translates into an optimal solution to the original LP. [sent-105, score-0.42]

39 However, from the perspective of a general LP solver, the solution is optimal, as any positive margin suffices. [sent-107, score-0.357]

40 It stands to reason that in many practical settings the margin of the solution is constant or polylogarithmically small at worst. [sent-108, score-0.357]

41 In such cases, our simplex algorithm solves the LP by using at most a polylogarithmic number of pivot steps. [sent-109, score-0.886]

42 We further mention that without the large margin assumption, in the smoothed analysis framework it is known ([BD02], Lemma 6. [sent-110, score-0.389]

43 Hence, our algorithm runs in polynomial time in the smoothed analysis framework as well. [sent-115, score-0.224]

44 Reducing the dimension, adding artificial constraints to bound the norm of the solution, perturbing the low dimensional LP, finding a feasible point and shifting the polytope. [sent-119, score-0.421]

45 This step reduces the time complexity by reducing both the number of pivot steps and the running time of each pivot step. [sent-122, score-0.445]

46 In order to bound the ℓ2 norm of the original vector, we bound the ℓ∞ norm of the low-dimensional vector. [sent-123, score-0.217]
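The step rests on an elementary fact: if every one of the k coordinates is at most c in absolute value, the ℓ2 norm is at most c·√k. A quick numerical check (our own illustration) is:

```python
import numpy as np

def l2_bound_from_linf(x, c):
    """If every coordinate of x in R^k is at most c in absolute value,
    then ||x||_2 <= c * sqrt(k)."""
    assert np.max(np.abs(x)) <= c + 1e-12
    return c * np.sqrt(x.size)

x = np.random.default_rng(1).uniform(-0.5, 0.5, size=50)
print(np.linalg.norm(x), "<=", l2_bound_from_linf(x, 0.5))
```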

47 This step ensures the bound on the number of pivot steps performed by the simplex algorithm. [sent-130, score-0.869]

48 In order to find a feasible point we exploit the fact that when the margin is allowed to be negative, there is always a feasible solution. [sent-131, score-0.583]

49 We prove for a fixed set of constraints, one of which is a negative lower bound on the margin, that the corresponding point v0 is not only feasible but is the unique optimal solution for a fixed direction. [sent-132, score-0.28]

50 The direction is independent of the added noise, which is a necessary property when bounding the number of pivot steps. [sent-133, score-0.465]

51 Since we use the shadow vertex pivot rule we must have an LP instance for which 0 is an interior point of the polytope. [sent-135, score-1.086]

52 This property does not hold for our polytope, as the LP contains inequalities of the form ⟨a, x⟩ ≥ 0. [sent-136, score-0.265]

53 However, we prove that both 0 and v0 are feasible solutions of the LP that do not share a common facet. [sent-137, score-0.222]

54 Hence, their average is an interior point of the polytope, and a shift by −v0/2 ensures that 0 is an interior point, as required. [sent-138, score-0.511]
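Concretely, for a polytope written as {x : Bx ≤ c}, shifting by −v0/2 amounts to substituting y = x − v0/2, which only changes the right-hand sides. A minimal sketch of this recentering (our own illustration; B, c, v0 are placeholder names) is:

```python
import numpy as np

def recenter_polytope(B, c, v0):
    """Shift the polytope {x : Bx <= c} so that the interior point v0/2 becomes the origin.

    Substituting y = x - v0/2 yields {y : By <= c - B @ (v0 / 2)}; if v0/2 satisfied
    all inequalities strictly, then 0 is an interior point of the shifted polytope.
    """
    c_shifted = c - B @ (v0 / 2.0)
    assert np.all(c_shifted > 0), "v0/2 must be strictly feasible"
    return B, c_shifted
```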

55 Once the preprocessing is done we solve the LP via the shadow vertex method which is guaranteed to finish after a polylogarithmic number of pivot steps. [sent-139, score-0.988]

56 Given sufficiently small additive noise and a sufficiently large target dimension, we are guaranteed that the obtained solution is an almost optimal solution to the unperturbed low-dimensional problem and an Õ(√(k/d)) approximation to the higher-dimensional problem. [sent-140, score-0.335]

57 2 The number of vertices in the shadow of a perturbed polytope: A key lemma in the papers of [ST04, Ver09] is a bound on the expected number of vertices in the projection of a perturbed polytope onto a plane. [sent-150, score-1.275]

58 3 The shadow vertex method: The shadow vertex method is a pivot rule used to solve LPs. [sent-161, score-1.473]

59 In order to apply it, the polytope of the LP must have 0 as an interior point. [sent-162, score-0.327]

60 The input consists of a feasible point v in the polytope and a direction u in which v is farthest among all feasible points. [sent-164, score-0.571]

61 For more on the shadow vertex method we refer the reader to [ST04], Section 3. [sent-166, score-0.531]

62 Consider an LP of the form max ⟨c, x⟩ s.t. ⟨Ai, x⟩ ≤ 1 for all i ∈ [n]. When solving the LP via the shadow vertex method, the number of pivot steps is upper bounded by the number of edges in P ∩ E, where P = conv(0, A1, . . . , An). [sent-168, score-0.974]
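Since P is given as a convex hull of points, its shadow on the plane E is just the 2D convex hull of the projected generators, so this pivot bound can be computed directly. A sketch using scipy (our own illustration, not the paper's code) is:

```python
import numpy as np
from scipy.spatial import ConvexHull

def shadow_size(points, u, v):
    """Number of vertices of the projection of conv(points) onto the plane span{u, v}.

    points: (m, d) array of generators, e.g. the origin together with A_1, ..., A_n;
    u, v:   two linearly independent directions spanning the shadow plane E.
    """
    q, _ = np.linalg.qr(np.stack([u, v], axis=1))  # d x 2 orthonormal basis of E
    shadow2d = points @ q                          # coordinates of the projected generators
    hull = ConvexHull(shadow2d)
    return len(hull.vertices)                      # vertex count = edge count for a 2D polygon

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 10))
generators = np.vstack([np.zeros(10), A])          # conv(0, A_1, ..., A_n)
print(shadow_size(generators, rng.standard_normal(10), rng.standard_normal(10)))
```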

63 5 Algorithm and Analysis: Our simplex variant is defined in Algorithm 1 below. [sent-172, score-0.405]

64 It is composed of projecting the polytope onto a lower dimension, adding noise, finding an initial vertex (Phase 1), shifting and applying the shadow vertex simplex algorithm [GS55]. [sent-173, score-1.514]
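To make the pipeline concrete, here is a schematic numpy/scipy sketch of these stages. It is our own illustration under stated assumptions, not the authors' implementation: the target dimension k, the noise level σ, and the box constant are placeholder choices, and the final shadow vertex phase is replaced by a generic LP solver for the margin-maximization step.

```python
import numpy as np
from scipy.optimize import linprog

def large_margin_simplex_sketch(A, k=32, sigma=1e-3, seed=0):
    """Schematic version of the pipeline: project, rescale, perturb, box-constrain, solve.

    A is an (n, d) instance with unit-norm rows; returns an approximate separator in R^d.
    The final stage uses a generic LP solver as a stand-in for the shadow vertex phase.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape

    # Dimension reduction: a random k x d matrix M with orthonormal rows.
    M = np.linalg.qr(rng.standard_normal((d, k)))[0].T
    A_low = (A @ M.T) * np.sqrt(d / k)             # rescaled low-dimensional instance

    # Additive Gaussian noise: the perturbation that keeps the shadow small.
    A_noise = A_low + sigma * rng.standard_normal(A_low.shape)

    # Maximize the margin tau subject to <A_i, x> >= tau and a box bound on x.
    # Variables are (x, tau); linprog minimizes, so the objective is -tau.
    box = 6.0 * np.sqrt(np.log(k)) / np.sqrt(k)    # placeholder l_inf bound on x
    c = np.zeros(k + 1)
    c[-1] = -1.0
    A_ub = np.hstack([-A_noise, np.ones((n, 1))])  # encodes tau - <A_i, x> <= 0
    b_ub = np.zeros(n)
    bounds = [(-box, box)] * k + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")

    return M.T @ res.x[:k]                         # lift the low-dimensional separator to R^d
```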

65 Algorithm 1 performs an expected number of O(poly(log n, λ^{-1})) pivot steps. [sent-175, score-0.383]

66 Over a λ-margin instance A it returns, with probability at least 1 − O(1/k + 1/n), a feasible solution x̄ with margin Ω(λ√k/√(log k)). [sent-176, score-0.564]

67 Since A has margin λ, there exists x∗ ∈ R^d such that ∀i ⟨Ai, x∗⟩ ≥ λ. [sent-184, score-0.294]

68 With probability at least 1 − 1/n, for all i, ‖erri‖2 ≤ O(σ√(5k log(n))) ≤ 1/(20√k). Algorithm 1 (large margin simplex), line 1: Input: a λ-margin LP instance A. [sent-206, score-0.769]

69 Let Ā ∈ R^{n×(k+1)} be given by Āi = (√(d/k) M Ai, −1). 4: (step 2: bounding ‖x‖) Add the k constraints ⟨ei, x⟩ ≥ −6√(log k)/√k, the k constraints ⟨−ei, x⟩ ≥ −6√(log k)/√k, and one additional constraint τ ≥ −8 log(k). [sent-209, score-0.228]

70 8: (step 6 - shadow vertex simplex): Let E = span(u0, e_{k+1}). [sent-233, score-0.531]

71 Apply the shadow vertex simplex algorithm on the polygon which is the projection of conv{V}, where V = {0, Ã0/b̂0, Ã1/b̂1, . . . }. [sent-234, score-1.068]

72 For τ = λ − 2ε, the point (x̂, τ) ∈ R^{k+1} is a feasible solution of LPnoise. [sent-250, score-0.224]

73 For every x ∈ R^k, τ ∈ R such that (x, τ) is a feasible solution of LPnoise, it holds that |⟨Ãi, (x, τ)⟩ − ⟨Âi, (x, τ)⟩| ≤ √(log(k))/√k for all i. [sent-252, score-0.246]

74 ... with respect to the objective max_{x∈P} ⟨u0, x⟩, where P is the polytope of LPnoise. [sent-260, score-0.24]

75 ... with respect to the objective max_{x∈P} ⟨u0, x⟩, where P is the polytope of LPnoise. [sent-270, score-0.24]

76 The set of points u for which v0 is the optimal solution is defined by a (blunt) cone {Σi αi ai | ∀i, αi > 0}, where ai = −Ãn+k+i for i ∈ [k] and a_{k+1} = −Â0. [sent-272, score-0.417]

77 The point v0 /2 is a feasible interior point of the polytope with probability at least 1 − O(1/n). [sent-283, score-0.521]

78 Since 0 is clearly a feasible point of the polytope, we get that v0 /2 is a feasible interior point as claimed. [sent-289, score-0.409]

79 We first note that in order to use the shadow vertex method, 0 must be an interior point of the polytope. [sent-291, score-0.651]

80 According to Lemma 10, v0/2 is an interior point of the polytope, and by shifting it to 0, the shadow vertex method can indeed be implemented. [sent-293, score-0.721]

81 Equation (4): √(d/k)·⟨Ãi, M x⟩ = ⟨f(Ai), x⟩ ≥ … To compute the margin of this solution, note that the rows of M form an orthonormal set. [sent-301, score-0.294]
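Because the rows of M are orthonormal, lifting by Mᵀ is an isometry; a quick numerical check of this fact (our own illustration) is:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 100, 10
M = np.linalg.qr(rng.standard_normal((d, k)))[0].T   # k x d with orthonormal rows
z = rng.standard_normal(k)
# Lifting by M^T preserves Euclidean norms, so a margin certified in R^k is not
# distorted by the lift itself, only by the subsequent normalization of x.
print(np.isclose(np.linalg.norm(M.T @ z), np.linalg.norm(z)))   # True
```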

82 It follows that the margin of the solution is at least (λ − 3ε) · √(k/(7 log(k) d)). Running time: The number of steps in this simplex phase is bounded by the number of vertices in the polygon which is the projection of the polytope of LPnoise onto the plane E = span{u0, vT}. [sent-303, score-1.357]

83 Each pivot step in the shadow vertex simplex method can be implemented to run in time Õ(nk) = Õ(n/λ^14) for n constraints in dimension k. [sent-310, score-1.449]

84 All other operations, including adding noise and shifting the polytope, are faster than the shadow vertex simplex procedure, leading to an overall running time of Õ(nd) (assuming λ is a constant or sub-polynomial in d). [sent-312, score-1.478]

85 The statement regarding the margin of the solution, viewed as a point in R^d, is immediate from Theorem 4. [sent-314, score-0.398]

86 The margin of the solution produced by the algorithm (i.e. [sent-317, score-0.381]

87 Hence, the margin of the normalized point x̃/‖x̃‖2 is Ω(λ/√(log(k))). [sent-321, score-0.327]

88 In order to achieve a margin of λ − O(ε), one should replace the ℓ∞ bound in the LP with an approximate ℓ2 bound. [sent-322, score-0.319]

89 Once the ℓ2-norm of x is bounded by 1 + O(ε), the margin of the normalized point is λ − O(ε). [sent-330, score-0.361]
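One standard way to impose an approximate ℓ2 bound inside an LP is to intersect many half-spaces ⟨u_j, x⟩ ≤ 1 for random unit directions u_j; a sketch of generating such constraints (our own illustration; the number of directions required is not taken from the paper) is:

```python
import numpy as np

def approx_l2_ball_constraints(k, m, seed=0):
    """Linear constraints {x : U x <= 1} that approximate the unit l2 ball in R^k.

    Each row of U is a random unit direction; <u_j, x> <= 1 is the supporting
    half-space of the ball in direction u_j, and with enough directions the
    intersection is sandwiched between the ball and a slightly inflated copy of it.
    """
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((m, k))
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    return U, np.ones(m)
```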

90 6 Discussion: The simplex algorithm for linear programming is a cornerstone of operations research whose computational complexity remains elusive despite decades of research. [sent-331, score-0.547]

91 In this paper we examine the simplex algorithm through the lens of machine learning, and in particular via linear classification, which is equivalent to linear programming. [sent-332, score-0.517]

92 We show that in the cases where the margin parameter is large, say a small constant, we can construct a simplex algorithm whose worst-case complexity is (quasi-)linear. [sent-333, score-0.746]

93 Indeed in many practical problems the margin parameter is a constant unrelated to the other parameters. [sent-334, score-0.294]

94 For example, in cases where a constant inherent noise exists, the margin must be large; otherwise the problem is simply unsolvable. [sent-335, score-0.339]

95 1 Soft margin SVM: In the setting of this paper, the case of soft margin SVM turns out to be algorithmically easier to solve than the separable case. [sent-337, score-0.763]

96 In fact, it suffices to solve the problem with the reduced number of points, up to an additive loss of ε in the margin, to obtain the same result. [sent-349, score-0.356]

97 To conclude, the soft margin SVM problem is much easier than the separable case; hence we do not analyze it in this paper. [sent-354, score-0.436]

98 Improved asymptotic analysis of the average number of steps performed by the self-dual simplex algorithm. [sent-413, score-0.431]

99 Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time. [sent-419, score-0.534]

100 Beyond Hirsch conjecture: Walks on random polytopes and smoothed complexity of the simplex method. [sent-423, score-0.537]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('simplex', 0.405), ('pivot', 0.383), ('shadow', 0.365), ('margin', 0.294), ('polytope', 0.24), ('lp', 0.235), ('lpnoise', 0.183), ('vertex', 0.166), ('ai', 0.143), ('feasible', 0.128), ('polynomial', 0.105), ('smoothed', 0.095), ('interior', 0.087), ('lemma', 0.084), ('conv', 0.084), ('hyperplane', 0.079), ('separable', 0.078), ('rk', 0.075), ('polylogarithmic', 0.074), ('spielman', 0.07), ('shifting', 0.07), ('polygon', 0.069), ('norm', 0.065), ('solution', 0.063), ('dimension', 0.063), ('program', 0.06), ('rd', 0.058), ('poly', 0.056), ('plane', 0.055), ('log', 0.055), ('onto', 0.054), ('separator', 0.052), ('vertices', 0.05), ('perturbed', 0.048), ('erri', 0.046), ('errn', 0.046), ('kelner', 0.046), ('lpbounded', 0.046), ('mai', 0.046), ('classi', 0.045), ('noise', 0.045), ('linear', 0.044), ('direction', 0.042), ('cornerstone', 0.04), ('bounding', 0.04), ('projection', 0.039), ('svm', 0.039), ('constraints', 0.039), ('statement', 0.038), ('theorem', 0.038), ('bi', 0.037), ('nutshell', 0.037), ('paramount', 0.037), ('polytopes', 0.037), ('dimensional', 0.037), ('elementary', 0.036), ('points', 0.036), ('teng', 0.035), ('resides', 0.035), ('reduced', 0.035), ('running', 0.034), ('programming', 0.034), ('johnson', 0.034), ('bounded', 0.034), ('soft', 0.033), ('point', 0.033), ('programs', 0.033), ('viewed', 0.033), ('papers', 0.032), ('cone', 0.032), ('shift', 0.031), ('easier', 0.031), ('prove', 0.031), ('format', 0.029), ('polynomially', 0.029), ('maxx', 0.028), ('separating', 0.028), ('step', 0.028), ('rule', 0.028), ('elad', 0.027), ('additive', 0.027), ('cation', 0.026), ('ek', 0.026), ('feasibility', 0.026), ('haifa', 0.026), ('henceforth', 0.026), ('steps', 0.026), ('norms', 0.026), ('multiplicative', 0.026), ('vectors', 0.026), ('focs', 0.025), ('alternatively', 0.025), ('inequalities', 0.025), ('bound', 0.025), ('adding', 0.024), ('instance', 0.024), ('proof', 0.024), ('algorithm', 0.024), ('sub', 0.024), ('worst', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 15 nips-2012-A Polylog Pivot Steps Simplex Algorithm for Classification

Author: Elad Hazan, Zohar Karnin

Abstract: We present a simplex algorithm for linear programming in a linear classification formulation. The paramount complexity parameter in linear classification problems is called the margin. We prove that for margin values of practical interest our simplex variant performs a polylogarithmic number of pivot steps in the worst case, and its overall running time is near linear. This is in contrast to general linear programming, for which no sub-polynomial pivot rule is known. 1

2 0.27137554 227 nips-2012-Multiclass Learning with Simplex Coding

Author: Youssef Mroueh, Tomaso Poggio, Lorenzo Rosasco, Jean-Jacques Slotine

Abstract: In this paper we discuss a novel framework for multiclass learning, defined by a suitable coding/decoding strategy, namely the simplex coding, that allows us to generalize to multiple classes a relaxation approach commonly used in binary classification. In this framework, we develop a relaxation error analysis that avoids constraints on the considered hypotheses class. Moreover, using this setting we derive the first provably consistent regularized method with training/tuning complexity that is independent of the number of classes. We introduce tools from convex analysis that can be used beyond the scope of this paper. 1

3 0.19865397 332 nips-2012-Symmetric Correspondence Topic Models for Multilingual Text Analysis

Author: Kosuke Fukumasu, Koji Eguchi, Eric P. Xing

Abstract: Topic modeling is a widely used approach to analyzing large text collections. A small number of multilingual topic models have recently been explored to discover latent topics among parallel or comparable documents, such as in Wikipedia. Other topic models that were originally proposed for structured data are also applicable to multilingual documents. Correspondence Latent Dirichlet Allocation (CorrLDA) is one such model; however, it requires a pivot language to be specified in advance. We propose a new topic model, Symmetric Correspondence LDA (SymCorrLDA), that incorporates a hidden variable to control a pivot language, in an extension of CorrLDA. We experimented with two multilingual comparable datasets extracted from Wikipedia and demonstrate that SymCorrLDA is more effective than some other existing multilingual topic models. 1

4 0.1608263 98 nips-2012-Dimensionality Dependent PAC-Bayes Margin Bound

Author: Chi Jin, Liwei Wang

Abstract: Margin is one of the most important concepts in machine learning. Previous margin bounds, both for SVM and for boosting, are dimensionality independent. A major advantage of this dimensionality independency is that it can explain the excellent performance of SVM whose feature spaces are often of high or infinite dimension. In this paper we address the problem whether such dimensionality independency is intrinsic for the margin bounds. We prove a dimensionality dependent PAC-Bayes margin bound. The bound is monotone increasing with respect to the dimension when keeping all other factors fixed. We show that our bound is strictly sharper than a previously well-known PAC-Bayes margin bound if the feature space is of finite dimension; and the two bounds tend to be equivalent as the dimension goes to infinity. In addition, we show that the VC bound for linear classifiers can be recovered from our bound under mild conditions. We conduct extensive experiments on benchmark datasets and find that the new bound is useful for model selection and is usually significantly sharper than the dimensionality independent PAC-Bayes margin bound as well as the VC bound for linear classifiers.

5 0.13886219 290 nips-2012-Recovery of Sparse Probability Measures via Convex Programming

Author: Mert Pilanci, Laurent E. Ghaoui, Venkat Chandrasekaran

Abstract: We consider the problem of cardinality penalized optimization of a convex function over the probability simplex with additional convex constraints. The classical ℓ1 regularizer fails to promote sparsity on the probability simplex since the ℓ1 norm on the probability simplex is trivially constant. We propose a direct relaxation of the minimum cardinality problem and show that it can be efficiently solved using convex programming. As a first application we consider recovering a sparse probability measure given moment constraints, in which our formulation becomes linear programming, hence can be solved very efficiently. A sufficient condition for exact recovery of the minimum cardinality solution is derived for arbitrary affine constraints. We then develop a penalized version for the noisy setting which can be solved using second order cone programs. The proposed method outperforms known rescaling heuristics based on the ℓ1 norm. As a second application we consider convex clustering using a sparse Gaussian mixture and compare our results with the well known soft k-means algorithm. 1

6 0.11076834 204 nips-2012-MAP Inference in Chains using Column Generation

7 0.1023732 174 nips-2012-Learning Halfspaces with the Zero-One Loss: Time-Accuracy Tradeoffs

8 0.098581158 208 nips-2012-Matrix reconstruction with the local max norm

9 0.096073933 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models

10 0.091804422 260 nips-2012-Online Sum-Product Computation Over Trees

11 0.087974586 236 nips-2012-Near-Optimal MAP Inference for Determinantal Point Processes

12 0.084591642 84 nips-2012-Convergence Rate Analysis of MAP Coordinate Minimization Algorithms

13 0.080836378 262 nips-2012-Optimal Neural Tuning Curves for Arbitrary Stimulus Distributions: Discrimax, Infomax and Minimum $L p$ Loss

14 0.078021981 67 nips-2012-Classification Calibration Dimension for General Multiclass Losses

15 0.077791519 242 nips-2012-Non-linear Metric Learning

16 0.072710901 143 nips-2012-Globally Convergent Dual MAP LP Relaxation Solvers using Fenchel-Young Margins

17 0.065002471 361 nips-2012-Volume Regularization for Binary Classification

18 0.063309647 197 nips-2012-Learning with Recursive Perceptual Representations

19 0.062584355 324 nips-2012-Stochastic Gradient Descent with Only One Projection

20 0.061824802 157 nips-2012-Identification of Recurrent Patterns in the Activation of Brain Networks


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.177), (1, 0.023), (2, 0.08), (3, -0.092), (4, 0.091), (5, 0.047), (6, -0.016), (7, 0.091), (8, -0.052), (9, -0.008), (10, 0.122), (11, 0.056), (12, 0.056), (13, 0.068), (14, 0.061), (15, -0.02), (16, -0.091), (17, -0.032), (18, 0.013), (19, 0.121), (20, -0.047), (21, 0.064), (22, -0.093), (23, 0.056), (24, -0.128), (25, 0.082), (26, -0.131), (27, 0.005), (28, 0.07), (29, 0.002), (30, 0.066), (31, 0.057), (32, 0.051), (33, 0.059), (34, 0.06), (35, 0.031), (36, 0.069), (37, 0.06), (38, -0.136), (39, -0.004), (40, -0.094), (41, 0.176), (42, 0.094), (43, -0.124), (44, -0.09), (45, -0.012), (46, 0.114), (47, -0.032), (48, 0.04), (49, -0.163)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.92849195 15 nips-2012-A Polylog Pivot Steps Simplex Algorithm for Classification

Author: Elad Hazan, Zohar Karnin

Abstract: We present a simplex algorithm for linear programming in a linear classification formulation. The paramount complexity parameter in linear classification problems is called the margin. We prove that for margin values of practical interest our simplex variant performs a polylogarithmic number of pivot steps in the worst case, and its overall running time is near linear. This is in contrast to general linear programming, for which no sub-polynomial pivot rule is known. 1

2 0.734788 227 nips-2012-Multiclass Learning with Simplex Coding

Author: Youssef Mroueh, Tomaso Poggio, Lorenzo Rosasco, Jean-Jacques Slotine

Abstract: In this paper we discuss a novel framework for multiclass learning, defined by a suitable coding/decoding strategy, namely the simplex coding, that allows us to generalize to multiple classes a relaxation approach commonly used in binary classification. In this framework, we develop a relaxation error analysis that avoids constraints on the considered hypotheses class. Moreover, using this setting we derive the first provably consistent regularized method with training/tuning complexity that is independent of the number of classes. We introduce tools from convex analysis that can be used beyond the scope of this paper. 1

3 0.57871604 290 nips-2012-Recovery of Sparse Probability Measures via Convex Programming

Author: Mert Pilanci, Laurent E. Ghaoui, Venkat Chandrasekaran

Abstract: We consider the problem of cardinality penalized optimization of a convex function over the probability simplex with additional convex constraints. The classical ℓ1 regularizer fails to promote sparsity on the probability simplex since the ℓ1 norm on the probability simplex is trivially constant. We propose a direct relaxation of the minimum cardinality problem and show that it can be efficiently solved using convex programming. As a first application we consider recovering a sparse probability measure given moment constraints, in which our formulation becomes linear programming, hence can be solved very efficiently. A sufficient condition for exact recovery of the minimum cardinality solution is derived for arbitrary affine constraints. We then develop a penalized version for the noisy setting which can be solved using second order cone programs. The proposed method outperforms known rescaling heuristics based on the ℓ1 norm. As a second application we consider convex clustering using a sparse Gaussian mixture and compare our results with the well known soft k-means algorithm. 1

4 0.55948859 67 nips-2012-Classification Calibration Dimension for General Multiclass Losses

Author: Harish G. Ramaswamy, Shivani Agarwal

Abstract: We study consistency properties of surrogate loss functions for general multiclass classification problems, defined by a general loss matrix. We extend the notion of classification calibration, which has been studied for binary and multiclass 0-1 classification problems (and for certain other specific learning problems), to the general multiclass setting, and derive necessary and sufficient conditions for a surrogate loss to be classification calibrated with respect to a loss matrix in this setting. We then introduce the notion of classification calibration dimension of a multiclass loss matrix, which measures the smallest ‘size’ of a prediction space for which it is possible to design a convex surrogate that is classification calibrated with respect to the loss matrix. We derive both upper and lower bounds on this quantity, and use these results to analyze various loss matrices. In particular, as one application, we provide a different route from the recent result of Duchi et al. (2010) for analyzing the difficulty of designing ‘low-dimensional’ convex surrogates that are consistent with respect to pairwise subset ranking losses. We anticipate the classification calibration dimension may prove to be a useful tool in the study and design of surrogate losses for general multiclass learning problems. 1

5 0.54274046 174 nips-2012-Learning Halfspaces with the Zero-One Loss: Time-Accuracy Tradeoffs

Author: Aharon Birnbaum, Shai Shalev-Shwartz

Abstract: Given α, ϵ, we study the time complexity required to improperly learn a halfspace with misclassification error rate of at most (1 + α)L*_γ + ϵ, where L*_γ is the optimal γ-margin error rate. For α = 1/γ, polynomial time and sample complexity is achievable using the hinge-loss. For α = 0, Shalev-Shwartz et al. [2011] showed that poly(1/γ) time is impossible, while learning is possible in time exp(Õ(1/γ)). An immediate question, which this paper tackles, is what is achievable if α ∈ (0, 1/γ). We derive positive results interpolating between the polynomial time for α = 1/γ and the exponential time for α = 0. In particular, we show that there are cases in which α = o(1/γ) but the problem is still solvable in polynomial time. Our results naturally extend to the adversarial online learning model and to the PAC learning with malicious noise model. 1

6 0.52231503 98 nips-2012-Dimensionality Dependent PAC-Bayes Margin Bound

7 0.50756955 361 nips-2012-Volume Regularization for Binary Classification

8 0.50312388 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models

9 0.48857045 280 nips-2012-Proper losses for learning from partial labels

10 0.4745023 217 nips-2012-Mixability in Statistical Learning

11 0.4346801 44 nips-2012-Approximating Concavely Parameterized Optimization Problems

12 0.43008137 16 nips-2012-A Polynomial-time Form of Robust Regression

13 0.42370623 204 nips-2012-MAP Inference in Chains using Column Generation

14 0.41814628 200 nips-2012-Local Supervised Learning through Space Partitioning

15 0.41443181 29 nips-2012-Accelerated Training for Matrix-norm Regularization: A Boosting Approach

16 0.41373873 161 nips-2012-Interpreting prediction markets: a stochastic approach

17 0.40540174 267 nips-2012-Perceptron Learning of SAT

18 0.40023834 302 nips-2012-Scaling MPE Inference for Constrained Continuous Markov Random Fields with Consensus Optimization

19 0.38559806 262 nips-2012-Optimal Neural Tuning Curves for Arbitrary Stimulus Distributions: Discrimax, Infomax and Minimum $L_p$ Loss

20 0.38455921 216 nips-2012-Mirror Descent Meets Fixed Share (and feels no regret)


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.029), (21, 0.018), (29, 0.193), (38, 0.186), (39, 0.018), (42, 0.08), (53, 0.01), (54, 0.024), (55, 0.019), (68, 0.012), (74, 0.037), (76, 0.12), (80, 0.099), (92, 0.063)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.88728547 93 nips-2012-Deep Spatio-Temporal Architectures and Learning for Protein Structure Prediction

Author: Pietro D. Lena, Ken Nagata, Pierre F. Baldi

Abstract: Residue-residue contact prediction is a fundamental problem in protein structure prediction. However, despite considerable research efforts, contact prediction methods are still largely unreliable. Here we introduce a novel deep machine-learning architecture which consists of a multidimensional stack of learning modules. For contact prediction, the idea is implemented as a three-dimensional stack of Neural Networks NN^k_{ij}, where i and j index the spatial coordinates of the contact map and k indexes “time”. The temporal dimension is introduced to capture the fact that protein folding is not an instantaneous process, but rather a progressive refinement. Networks at level k in the stack can be trained in supervised fashion to refine the predictions produced by the previous level, hence addressing the problem of vanishing gradients, typical of deep architectures. Increased accuracy and generalization capabilities of this approach are established by rigorous comparison with other classical machine learning approaches for contact prediction. The deep approach leads to an accuracy for difficult long-range contacts of about 30%, roughly 10% above the state-of-the-art. Many variations in the architectures and the training algorithms are possible, leaving room for further improvements. Furthermore, the approach is applicable to other problems with strong underlying spatial and temporal components. 1

2 0.86078542 153 nips-2012-How Prior Probability Influences Decision Making: A Unifying Probabilistic Model

Author: Yanping Huang, Timothy Hanks, Mike Shadlen, Abram L. Friesen, Rajesh P. Rao

Abstract: How does the brain combine prior knowledge with sensory evidence when making decisions under uncertainty? Two competing descriptive models have been proposed based on experimental data. The first posits an additive offset to a decision variable, implying a static effect of the prior. However, this model is inconsistent with recent data from a motion discrimination task involving temporal integration of uncertain sensory evidence. To explain this data, a second model has been proposed which assumes a time-varying influence of the prior. Here we present a normative model of decision making that incorporates prior knowledge in a principled way. We show that the additive offset model and the time-varying prior model emerge naturally when decision making is viewed within the framework of partially observable Markov decision processes (POMDPs). Decision making in the model reduces to (1) computing beliefs given observations and prior information in a Bayesian manner, and (2) selecting actions based on these beliefs to maximize the expected sum of future rewards. We show that the model can explain both data previously explained using the additive offset model as well as more recent data on the time-varying influence of prior knowledge on decision making. 1

same-paper 3 0.85147083 15 nips-2012-A Polylog Pivot Steps Simplex Algorithm for Classification

Author: Elad Hazan, Zohar Karnin

Abstract: We present a simplex algorithm for linear programming in a linear classification formulation. The paramount complexity parameter in linear classification problems is called the margin. We prove that for margin values of practical interest our simplex variant performs a polylogarithmic number of pivot steps in the worst case, and its overall running time is near linear. This is in contrast to general linear programming, for which no sub-polynomial pivot rule is known. 1

4 0.81708735 252 nips-2012-On Multilabel Classification and Ranking with Partial Feedback

Author: Claudio Gentile, Francesco Orabona

Abstract: We present a novel multilabel/ranking algorithm working in partial information settings. The algorithm is based on 2nd-order descent methods, and relies on upper-confidence bounds to trade-off exploration and exploitation. We analyze this algorithm in a partial adversarial setting, where covariates can be adversarial, but multilabel probabilities are ruled by (generalized) linear models. We show O(T^{1/2} log T) regret bounds, which improve in several ways on the existing results. We test the effectiveness of our upper-confidence scheme by contrasting against full-information baselines on real-world multilabel datasets, often obtaining comparable performance. 1

5 0.79270059 98 nips-2012-Dimensionality Dependent PAC-Bayes Margin Bound

Author: Chi Jin, Liwei Wang

Abstract: Margin is one of the most important concepts in machine learning. Previous margin bounds, both for SVM and for boosting, are dimensionality independent. A major advantage of this dimensionality independency is that it can explain the excellent performance of SVM whose feature spaces are often of high or infinite dimension. In this paper we address the problem whether such dimensionality independency is intrinsic for the margin bounds. We prove a dimensionality dependent PAC-Bayes margin bound. The bound is monotone increasing with respect to the dimension when keeping all other factors fixed. We show that our bound is strictly sharper than a previously well-known PAC-Bayes margin bound if the feature space is of finite dimension; and the two bounds tend to be equivalent as the dimension goes to infinity. In addition, we show that the VC bound for linear classifiers can be recovered from our bound under mild conditions. We conduct extensive experiments on benchmark datasets and find that the new bound is useful for model selection and is usually significantly sharper than the dimensionality independent PAC-Bayes margin bound as well as the VC bound for linear classifiers.

6 0.78853315 227 nips-2012-Multiclass Learning with Simplex Coding

7 0.78832573 216 nips-2012-Mirror Descent Meets Fixed Share (and feels no regret)

8 0.78408062 243 nips-2012-Non-parametric Approximate Dynamic Programming via the Kernel Method

9 0.78379816 241 nips-2012-No-Regret Algorithms for Unconstrained Online Convex Optimization

10 0.78235519 358 nips-2012-Value Pursuit Iteration

11 0.78164333 292 nips-2012-Regularized Off-Policy TD-Learning

12 0.78103983 162 nips-2012-Inverse Reinforcement Learning through Structured Classification

13 0.7802456 324 nips-2012-Stochastic Gradient Descent with Only One Projection

14 0.77889067 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

15 0.77872884 226 nips-2012-Multiclass Learning Approaches: A Theoretical Comparison with Implications

16 0.77649355 263 nips-2012-Optimal Regularized Dual Averaging Methods for Stochastic Optimization

17 0.7764414 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models

18 0.77604926 236 nips-2012-Near-Optimal MAP Inference for Determinantal Point Processes

19 0.77583587 255 nips-2012-On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes

20 0.77563763 275 nips-2012-Privacy Aware Learning