jmlr jmlr2008 jmlr2008-91 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Chih-Jen Lin, Ruby C. Weng, S. Sathiya Keerthi
Abstract: Large-scale logistic regression arises in many applications such as document classification and natural language processing. In this paper, we apply a trust region Newton method to maximize the log-likelihood of the logistic regression model. The proposed method uses only approximate Newton steps in the beginning, but achieves fast convergence in the end. Experiments show that it is faster than the commonly used quasi Newton approach for logistic regression. We also extend the proposed method to large-scale L2-loss linear support vector machines (SVM). Keywords: logistic regression, newton method, trust region, conjugate gradient, support vector machines
Reference: text
sentIndex sentText sentNum sentScore
1 Research, Santa Clara, CA 95054, USA. Editor: Alexander Smola. Abstract: Large-scale logistic regression arises in many applications such as document classification and natural language processing. [sent-10, score-0.179]
2 In this paper, we apply a trust region Newton method to maximize the log-likelihood of the logistic regression model. [sent-11, score-0.721]
3 Experiments show that it is faster than the commonly used quasi Newton approach for logistic regression. [sent-13, score-0.218]
4 Keywords: logistic regression, newton method, trust region, conjugate gradient, support vector machines 1. [sent-15, score-1.059]
5 Introduction The logistic regression model is useful for two-class classification. [sent-16, score-0.153]
6 There are many methods for training logistic regression models. [sent-30, score-0.196]
7 , 2003), nonlinear conjugate gradient, quasi Newton (in particular, limited memory BFGS) (Liu and Nocedal, 1989; Benson and Moré, 2001), and truncated Newton (Komarek and Moore, 2005). [sent-34, score-0.432]
8 Truncated Newton methods have been an effective approach for large-scale unconstrained optimization, but their use for logistic regression has not been fully exploited. [sent-53, score-0.182]
9 In Section 2, we discuss an efficient and robust truncated Newton method for logistic regression. [sent-55, score-0.247]
10 This approach, called the trust region Newton method, uses only approximate Newton steps in the beginning, but takes full Newton directions in the end for fast convergence. [sent-56, score-0.547]
11 In Sections 3 and 4, we discuss some existing optimization methods for logistic regression and conduct comparisons. [sent-57, score-0.181]
12 Section 5 investigates a variant of our proposed method by using preconditioned conjugate gradients in the trust region framework. [sent-60, score-0.778]
13 In Section 6, we extend the proposed trust region method to solve L2-loss support vector machines. [sent-61, score-0.568]
14 For large-scale logistic regression, we then propose a trust region Newton method, which is a type of truncated Newton approach. [sent-72, score-0.773]
15 2.1 Newton and Truncated Newton Methods. To discuss Newton methods, we need the gradient and Hessian of f(w): ∇f(w) = w + C ∑_{i=1}^{l} (σ(y_i w^T x_i) − 1) y_i x_i (3) and ∇²f(w) = I + C X^T D X (4), where I is the identity matrix, σ(y_i w^T x_i) = (1 + e^{−y_i w^T x_i})^{−1}, D is the diagonal matrix with D_ii = σ(y_i w^T x_i)(1 − σ(y_i w^T x_i)), and X is the l×n matrix whose rows are the instances x_i^T. [sent-74, score-0.179]
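To make these quantities concrete, here is a minimal NumPy sketch (not from the paper) of the function value, the gradient (3), and a closure for the Hessian-vector product used later in (7); it assumes X is an l×n sparse matrix whose rows are the instances x_i^T, y is a vector of ±1 labels, and all names are illustrative. The Hessian (4) is never formed explicitly.

import numpy as np

def fun_grad_hessvec(w, X, y, C):
    # f(w) = 0.5 w^T w + C * sum_i log(1 + exp(-y_i w^T x_i)),
    # its gradient (3), and a closure for the Hessian-vector product (7).
    z = y * (X @ w)                               # y_i w^T x_i
    sigma = 1.0 / (1.0 + np.exp(-z))              # sigma(y_i w^T x_i)
    f = 0.5 * w.dot(w) + C * np.sum(np.logaddexp(0.0, -z))
    grad = w + C * (X.T @ ((sigma - 1.0) * y))    # eq. (3)
    D = sigma * (1.0 - sigma)                     # diagonal entries of D in (4)
    def hess_vec(s):                              # (I + C X^T D X) s, as in (7)
        return s + C * (X.T @ (D * (X @ s)))
    return f, grad, hess_vec

Each hess_vec call costs two matrix-vector products, with X and with X^T, which is the O(nnz) per conjugate gradient iteration counted later in (15).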
16 Since ∇²f(w^k) is invertible, the simplest Newton method updates w by w^{k+1} = w^k + s^k, (5) where k is the iteration index and s^k, the Newton direction, is the solution of the linear system ∇²f(w^k) s^k = −∇f(w^k) (6). [sent-83, score-0.649]
17 Two techniques are often used: line search and trust region. [sent-92, score-0.396]
18 Therefore, for large logistic regression, iterative methods are more suitable than direct methods, which require the whole Hessian matrix. [sent-100, score-0.155]
19 Among iterative methods, conjugate gradient methods are currently the most commonly used within Newton methods. [sent-101, score-0.199]
20 The optimization procedure then has two layers of iterations: at each outer iteration an inner conjugate gradient procedure finds the Newton direction. [sent-102, score-0.352]
21 Unfortunately, conjugate gradient methods may suffer from lengthy iterations in certain situations. [sent-103, score-0.352]
22 Komarek and Moore (2005) are among the first to apply truncated Newton methods for logistic regression. [sent-106, score-0.226]
23 They approximately solve (6) by conjugate gradient procedures and use (5) to update w^k. [sent-107, score-0.751]
24 They terminate the conjugate gradient procedure if the relative difference of log likelihoods between two consecutive conjugate gradient iterations is smaller than a threshold. [sent-108, score-0.603]
25 Some comparisons between limited memory quasi Newton and truncated Newton are by Nocedal and Nash (1991) and Zou et al. [sent-116, score-0.263]
26 2.2 A Trust Region Newton Method. We consider the trust region method (Lin and Moré, 1999), which is a truncated Newton method to deal with general bound-constrained optimization problems (i.e., the variables are subject to bound constraints). [sent-119, score-0.718]
27 At each iteration of a trust region Newton method for minimizing f(w), we have an iterate w^k, a size ∆_k of the trust region, and a quadratic model q_k(s) = ∇f(w^k)^T s + (1/2) s^T ∇²f(w^k) s. [sent-124, score-1.517]
28 Algorithm 1: A trust region algorithm for logistic regression. [sent-126, score-0.7]
29 • Find an approximate solution s^k of the trust region sub-problem min_s q_k(s) subject to ‖s‖ ≤ ∆_k. (11) [sent-133, score-0.709]
30 Next, we find a step s^k to approximately minimize q_k(s) subject to the constraint ‖s‖ ≤ ∆_k. [sent-138, score-0.162]
31 We then update w^k and ∆_k by checking the ratio ρ_k = (f(w^k + s^k) − f(w^k)) / q_k(s^k) (8) of the actual reduction in the function to the predicted reduction in the quadratic model. [sent-139, score-0.634]
32 The direction is accepted if ρ_k is large enough: w^{k+1} = w^k + s^k if ρ_k > η_0, and w^{k+1} = w^k if ρ_k ≤ η_0, (9) where η_0 > 0 is a pre-specified value. [sent-140, score-0.975]
33 The trust region bound ∆_k is updated by the rules ∆_{k+1} ∈ [σ_1 min{‖s^k‖, ∆_k}, σ_2 ∆_k] if ρ_k ≤ η_1; ∆_{k+1} ∈ [σ_1 ∆_k, σ_3 ∆_k] if ρ_k ∈ (η_1, η_2); ∆_{k+1} ∈ [∆_k, σ_3 ∆_k] if ρ_k ≥ η_2. (10) [sent-142, score-0.626]
34 Similar rules are used in most modern trust region methods. [sent-143, score-0.547]
35 A description of our trust region algorithm is given in Algorithm 1. [sent-144, score-0.547]
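As an illustration of Algorithm 1 and of the rules (8)-(10), the following sketch combines the fun_grad_hessvec routine above with a cg_steihaug routine for the sub-problem (11) (a version is sketched after Algorithm 2 below). The initial ∆_0 = ‖∇f(w^0)‖, η_0 = 10^{-4}, and the gradient-based stopping rule follow the text; the remaining constants and the particular point chosen inside each interval of (10) are illustrative assumptions, not necessarily the paper's choices.

import numpy as np

def tron_logreg(X, y, C, max_iter=100, eps=1e-3):
    w = np.zeros(X.shape[1])
    f, g, hess_vec = fun_grad_hessvec(w, X, y, C)
    delta = np.linalg.norm(g)                      # Delta_0 = ||grad f(w^0)||
    eta0, eta1, eta2 = 1e-4, 0.25, 0.75            # eta0 from the text; eta1, eta2 assumed
    sigma1, sigma3 = 0.25, 4.0                     # assumed, with sigma1 <= 1 <= sigma3
    for _ in range(max_iter):
        if np.linalg.norm(g, np.inf) <= eps:       # stopping condition (18)
            break
        s = cg_steihaug(g, hess_vec, delta)        # approximate solution of (11)
        pred = g.dot(s) + 0.5 * s.dot(hess_vec(s)) # q_k(s^k)
        f_new, g_new, hv_new = fun_grad_hessvec(w + s, X, y, C)
        rho = (f_new - f) / pred                   # ratio (8)
        if rho > eta0:                             # acceptance test (9)
            w, f, g, hess_vec = w + s, f_new, g_new, hv_new
        # radius update: one admissible point in each interval of (10)
        if rho <= eta1:
            delta = sigma1 * min(np.linalg.norm(s), delta)
        elif rho >= eta2:
            delta = sigma3 * delta
        # for rho in (eta1, eta2), keeping delta lies inside [sigma1*delta, sigma3*delta]
    return w

Note that when a step is rejected only the trust region radius shrinks before the next inner solve; the gradient stays the same, as remarked later in the text.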
36 The conjugate gradient method to approximately solve the trust region sub-problem (11) is given in Algorithm 2. [sent-147, score-0.847]
37 Note that only one Hessian-vector product is needed at each conjugate gradient iteration. [sent-150, score-0.279]
38 Since r_i = −∇f(w^k) − ∇²f(w^k) s̄_i, the stopping condition (12) is the same as ‖−∇f(w^k) − ∇²f(w^k) s̄_i‖ ≤ ξ_k ‖∇f(w^k)‖. Algorithm 2: Conjugate gradient procedure for approximately solving the trust region sub-problem (11). [sent-151, score-0.77]
39 (inner iterations) • If ‖r_i‖ ≤ ξ_k ‖∇f(w^k)‖, (12) then output s^k = s̄_i and stop. [sent-158, score-0.188]
40 • If ‖s̄_{i+1}‖ ≥ ∆_k, compute τ such that ‖s̄_i + τ d_i‖ = ∆_k, (13) then output s^k = s̄_i + τ d_i and stop. [sent-161, score-0.169]
41 However, Algorithm 2 is different from standard conjugate gradient methods for linear systems, as the constraint ‖s‖ ≤ ∆_k must be taken care of. [sent-166, score-0.279]
42 With s̄_0 = 0, we have ‖s̄_i‖ < ‖s̄_{i+1}‖, ∀i, so in a finite number of conjugate gradient iterations, either (12) is satisfied or s̄_{i+1} violates the trust region constraint. [sent-168, score-0.871]
43 In the latter situation, (13) finds a point on the trust region boundary, as q_k(s̄_i + τ d_i) < q_k(s̄_i). [sent-169, score-0.713]
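A minimal sketch of Algorithm 2 in the same illustrative style (assumed names, not the paper's code): since ∇²f(w^k) is positive definite here, no negative-curvature handling is needed; the inner stopping rule is (12) and the boundary step solves ‖s̄_i + τ d_i‖ = ∆_k as in (13).

import numpy as np

def cg_steihaug(grad, hess_vec, delta, xi=0.1, max_cg=250):
    # Approximately minimize q(s) = grad^T s + 0.5 s^T H s subject to ||s|| <= delta.
    s = np.zeros_like(grad)
    r = -grad                                      # r_i = -grad - H s_i, with s_0 = 0
    d = r.copy()
    gnorm = np.linalg.norm(grad)
    if np.linalg.norm(r) <= xi * gnorm:            # stopping rule (12)
        return s
    for _ in range(max_cg):
        Hd = hess_vec(d)
        alpha = r.dot(r) / d.dot(Hd)               # d^T H d > 0 because H is positive definite
        s_next = s + alpha * d
        if np.linalg.norm(s_next) >= delta:
            # boundary step (13): the root tau >= 0 of ||s + tau*d|| = delta
            a, b, c = d.dot(d), 2.0 * s.dot(d), s.dot(s) - delta ** 2
            tau = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
            return s + tau * d
        r_next = r - alpha * Hd
        if np.linalg.norm(r_next) <= xi * gnorm:   # stopping rule (12)
            return s_next
        beta = r_next.dot(r_next) / r.dot(r)
        d = r_next + beta * d
        s, r = s_next, r_next
    return s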
44 The whole procedure is a careful design to make sure that the approximate Newton direction is good enough and the trust region method converges. [sent-170, score-0.568]
45 Next, we discuss convergence properties of the trust region Newton method. [sent-171, score-0.574]
46 If ξ_k < 1, then the trust region method Q-linearly converges: lim_{k→∞} ‖w^{k+1} − w*‖ / ‖w^k − w*‖ < 1, (14) where w* is the unique optimal solution of (2). [sent-186, score-1.016]
47 Regarding the computational complexity, the cost per iteration is (15): O(nnz) for one function and zero or one gradient evaluations, plus O(nnz) × (number of conjugate gradient iterations), where nnz is the number of nonzero elements in the sparse matrix X. [sent-192, score-1.01]
48 Note that if wk is not updated in (9), then the gradient is the same for the next iteration. [sent-193, score-0.558]
49 Related Methods and Implementation Issues In this section, we discuss a general limited memory quasi Newton implementation (Liu and Nocedal, 1989). [sent-195, score-0.162]
50 Many consider it to be very efficient for training logistic regression. [sent-196, score-0.168]
51 We also discuss implementation issues of the proposed trust region Newton method. [sent-197, score-0.547]
52 This property is useful for large logistic regression as we cannot afford to store Hk . [sent-204, score-0.153]
53 In contrast, limited memory quasi Newton methods consider only approximate Hessian matrices, so we can expect them to have slower convergence. [sent-234, score-0.162]
54 Moreover, compared to (15), the cost per iteration is less than that for our trust region method. [sent-237, score-0.592]
55 We set the initial ∆_0 = ‖∇f(w^0)‖ and take η_0 = 10^{−4} in (9) to update w^k. [sent-245, score-0.472]
56 In the conjugate gradient procedure, we use ξ_k = 0.1. [sent-254, score-0.279]
57 These choices are considered appropriate following the research on trust region methods in the past several decades. [sent-257, score-0.547]
58 It is unclear yet if they are the best for logistic regression problems, but certainly we would like to try custom settings first. [sent-258, score-0.153]
59 Experiments In this section, we compare our approach with a quasi Newton implementation for logistic regression. [sent-260, score-0.218]
60 4.2 Comparisons. We compare two logistic regression implementations: • TRON: the trust region Newton method discussed in Section 2. [sent-370, score-0.721]
61 • LBFGS: the limited memory quasi Newton implementation (Liu and Nocedal, 1989). [sent-372, score-0.162]
62 For the first one, we simulate the practical use of logistic regression by setting a stopping condition and checking the prediction ability. [sent-395, score-0.202]
63 Most unconstrained optimization software uses gradient information to terminate the iterative procedure, so we use ‖∇f(w^k)‖_∞ ≤ 10^{−3} (18) as the stopping condition. [sent-396, score-0.267]
64 On training time, TRON is better than LBFGS, so truncated Newton methods are effective for training logistic regression. [sent-406, score-0.312]
65 For example, if X = [−10 0 −20 0; 30 0 0 10] (a 2×4 matrix), then its compressed column format is given by three arrays: X_val = [−10, 30, −20, 10], X_rowind = [1, 2, 1, 2], X_colptr = [1, 3, 3, 4, 5], where rowind means row indices and colptr means column pointers. [sent-449, score-0.275]
66 The main conjugate gradient operation (7) involves two matrix-vector products: one is with X^T, and the other is with X. [sent-454, score-0.279]
67 If the number of nonzeros in a column is significantly larger than that in a row, very likely a column cannot fit into the same memory level as a row. [sent-462, score-0.152]
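As a small sketch, the two products in (7) can be run directly on the three compressed column arrays above (0-based indices here, whereas the arrays in the text are 1-based; names are illustrative).

import numpy as np

X_val    = np.array([-10.0, 30.0, -20.0, 10.0])
X_rowind = np.array([0, 1, 0, 1])        # text's [1, 2, 1, 2], shifted to 0-based
X_colptr = np.array([0, 2, 2, 3, 4])     # text's [1, 3, 3, 4, 5], shifted to 0-based
l, n = 2, 4                              # numbers of rows and columns of X

def csc_X_times(v):                      # u = X v, one column at a time
    u = np.zeros(l)
    for j in range(n):
        for p in range(X_colptr[j], X_colptr[j + 1]):
            u[X_rowind[p]] += X_val[p] * v[j]
    return u

def csc_XT_times(u):                     # v = X^T u, same column traversal
    v = np.zeros(n)
    for j in range(n):
        for p in range(X_colptr[j], X_colptr[j + 1]):
            v[j] += X_val[p] * u[X_rowind[p]]
    return v

With the column format both products traverse columns, but X^T u reads (and X v writes) the length-l vector at scattered row positions, which is one way to see the memory-level consideration just discussed.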
68 Preconditioned Conjugate Gradient Methods To reduce the number of conjugate gradient iterations, in the truncated Newton method one often uses preconditioned conjugate gradient procedures. [sent-475, score-0.721]
69 If the approximate factorization (19) is good enough, P^{−1} ∇²f(w^k) P^{−T} is close to the identity and fewer conjugate gradient iterations are needed. [sent-477, score-0.324]
70 However, as we need extra effort to find P and the cost per conjugate gradient iteration is higher, a smaller number of conjugate gradient iterations may not lead to shorter training time. [sent-478, score-0.581]
71 First, lengthy conjugate gradient iterations often occur at final outer steps, but for machine learning applications the algorithm may stop before reaching such a stage. [sent-482, score-0.375]
72 The cost of obtaining the preconditioner is thus no more than that of one conjugate gradient iteration. [sent-488, score-0.31]
73 Algorithm 4: Preconditioned conjugate gradient procedure for approximately solving the trust region sub-problem (21). [sent-491, score-0.826]
74 (inner iterations) • If ‖r̂_i‖ ≤ ξ_k ‖ĝ‖, then output s^k = P^{−T} ŝ_i and stop. [sent-498, score-0.188]
75 With ŝ = P^T s, we transform (20) to min_ŝ q̂_k(ŝ) subject to ‖ŝ‖ ≤ ∆_k, (21) where q̂_k(ŝ) = ĝ^T ŝ + (1/2) ŝ^T Ĥ ŝ, ĝ = P^{−1} ∇f(w^k), and Ĥ = P^{−1} ∇²f(w^k) P^{−T}. [sent-505, score-0.166]
76 Note that in practical implementations we calculate Ĥ d_i in a way similar to (7): P^{−1}(P^{−T} d_i + C(X^T(D(X(P^{−T} d_i))))). [sent-509, score-0.16]
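A sketch of this product, assuming a diagonal preconditioner whose entries are the square roots of the diagonal of ∇²f(w^k); that specific choice of P (which is cheap to form, consistent with the cost remark above) and all names are assumptions for illustration, not a restatement of the paper's exact preconditioner. D denotes the vector of D_ii values from (4).

import numpy as np
import scipy.sparse as sp

def make_precond_hessvec(X, D, C):
    # P = diag(p) with p_j = sqrt(1 + C * sum_i D_ii X_ij^2), the Hessian diagonal.
    Xsq = X.multiply(X) if sp.issparse(X) else X * X
    p = np.sqrt(1.0 + C * np.asarray(Xsq.T @ D).ravel())
    def hess_vec_hat(d):
        # P^{-1} (P^{-T} d + C (X^T (D (X (P^{-T} d)))))
        u = d / p
        return (u + C * (X.T @ (D * (X @ u)))) / p
    return hess_vec_hat, p

Algorithm 4 then runs conjugate gradients with hess_vec_hat and ĝ = ∇f(w^k)/p, and maps the inner solution back through s^k = P^{−T} ŝ, that is, one more division by p.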
77 In Table 4, we present the average number of conjugate gradient iterations per fold in the CV procedure. [sent-510, score-0.347]
78 Table 4: Average number of conjugate gradient iterations per fold in the CV procedure.
Problem        CG    PCG
a9a            567   263
real-sim       104   160
news20          71   155
citeseer       113   115
yahoo-japan    278   326
rcv1           225   280
yahoo-korea    779   736
[sent-515, score-0.347]
79 Trust Region Method for L2-SVM The second term in (2) can be considered as a loss function, so regularized logistic regression is related to other learning approaches such as Support Vector Machines (SVM) (Boser et al. [sent-521, score-0.153]
80 In this section, we extend our trust region method for L2-SVM. [sent-535, score-0.568]
81 The gradient ∇f_2(w) is Lipschitz continuous, so one can define the generalized Hessian matrix B(w) = I + 2C X^T D X, where D_ii = 1 if 1 − y_i w^T x_i > 0; D_ii = any element in [0, 1] if 1 − y_i w^T x_i = 0; and D_ii = 0 if 1 − y_i w^T x_i < 0. [sent-543, score-0.303]
82 Then the trust region method (Lin and Moré, 1999) can be applied by replacing ∇²f(w) in Section 2. [sent-544, score-0.568]
83 The Hessian-vector product in the conjugate gradient procedure is then B(w)s = s + 2C · X_{I,:}^T (D_{I,I} (X_{I,:} s)). [sent-553, score-0.279]
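A sketch of the corresponding L2-SVM pieces (function value, gradient, and the generalized Hessian-vector product above, restricted to the index set I = {i : 1 − y_i w^T x_i > 0}); this is an illustrative NumPy version with assumed names, taking X to support row selection (e.g., a CSR matrix or a dense array), not the paper's implementation.

import numpy as np

def l2svm_fun_grad_hessvec(w, X, y, C):
    # f_2(w) = 0.5 w^T w + C * sum_i max(0, 1 - y_i w^T x_i)^2
    margin = 1.0 - y * (X @ w)
    active = margin > 0                            # the index set I
    XI, yI = X[active], y[active]
    f = 0.5 * w.dot(w) + C * np.sum(margin[active] ** 2)
    grad = w + 2.0 * C * (XI.T @ (XI @ w - yI))    # uses y_i^2 = 1
    def hess_vec(s):                               # B(w) s = s + 2C X_I^T (X_I s), since D_ii = 1 on I
        return s + 2.0 * C * (XI.T @ (XI @ s))
    return f, grad, hess_vec

Plugging this hess_vec into the trust region loop sketched earlier is, in spirit, how the text extends the method from logistic regression to L2-SVM.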
84 Keerthi and DeCoste (2005) use conjugate gradient methods to solve (26), and the procedure is described in Algorithm 5. [sent-563, score-0.279]
85 In contrast, the convergence of our trust region method holds when the conjugate gradient procedure only approximately minimizes the trust-region sub-problem. [sent-567, score-0.874]
86 Solve (26) by the conjugate gradient procedure and obtain w̄^k. [sent-576, score-0.727]
87 Find α_k = arg min_{α≥0} f(w^k + α s^k), and set w^{k+1} = w^k + α_k s^k. [sent-578, score-0.527]
88 6.3 Comparisons. We compare our proposed trust region implementation (TRON) in Section 6. [sent-580, score-0.547]
89 To solve (26), SVMlin considers a relative stopping condition for the conjugate gradient procedure. [sent-586, score-0.328]
90 Following their convergence result, we modify SVMlin to quite accurately solve the linear system (26): recall in Algorithm 5 that we sequentially obtain the following items: w^k → I_k → w̄^k. [sent-587, score-0.923]
91 We then use ‖(I + 2C X_{I_k,:}^T X_{I_k,:}) w̄^k − 2C X_{I_k,:}^T y_{I_k}‖_∞ ≤ 10^{−3} as the stopping condition of the conjugate gradient procedure in SVMlin. [sent-588, score-0.328]
92 Both approaches spend most of their time on the operation (24) in the conjugate gradient procedure. [sent-590, score-0.305]
93 In contrast, trust region methods are effective on using only approximate directions in the early stage of the procedure. [sent-594, score-0.547]
94 Discussion and Conclusions. As logistic regression is a special case of maximum entropy models and conditional random fields, it is possible to extend the proposed approach to them. [sent-596, score-0.153]
95 In summary, we have shown that a trust region Newton method is effective for training large-scale logistic regression problems as well as L2-SVM. [sent-612, score-0.764]
96 It is interesting that we do not need many special settings for logistic regression; a rather direct use of modern trust region techniques already yields excellent performance. [sent-614, score-0.672]
97 If this property is wrong, there is a sequence {w^k} in the set (29) satisfying ‖w^k‖ → ∞. [sent-625, score-0.448]
98 However, f(w^k) ≥ (1/2)‖w^k‖² → ∞ contradicts the fact that f(w^k) ≤ f(w^0), ∀k. [sent-626, score-0.448]
99 Making logistic regression a core data mining tool: A practical investigation of accuracy, speed, and simplicity. [sent-699, score-0.153]
100 The conjugate gradient method and trust regions in large scale optimization. [sent-771, score-0.696]
wordName wordTfidf (topN-words)
[('wk', 0.448), ('trust', 0.396), ('newton', 0.369), ('mor', 0.206), ('tron', 0.185), ('conjugate', 0.169), ('lbfgs', 0.154), ('region', 0.151), ('egion', 0.134), ('ogistic', 0.134), ('cv', 0.132), ('logistic', 0.125), ('eerthi', 0.124), ('ewton', 0.113), ('rust', 0.113), ('gradient', 0.11), ('truncated', 0.101), ('eng', 0.094), ('wt', 0.094), ('ethod', 0.093), ('quasi', 0.093), ('egression', 0.086), ('hessian', 0.085), ('qk', 0.083), ('nocedal', 0.083), ('di', 0.08), ('sk', 0.079), ('preconditioners', 0.073), ('hk', 0.072), ('ri', 0.064), ('lin', 0.063), ('nnz', 0.061), ('svmlin', 0.061), ('komarek', 0.051), ('yahoo', 0.051), ('stopping', 0.049), ('argonne', 0.049), ('steihaug', 0.049), ('bfgs', 0.047), ('decoste', 0.047), ('keerthi', 0.046), ('iterations', 0.045), ('si', 0.045), ('liu', 0.044), ('training', 0.043), ('memory', 0.043), ('format', 0.043), ('jorge', 0.042), ('formats', 0.041), ('preconditioned', 0.041), ('nonzeros', 0.041), ('row', 0.04), ('bouaricha', 0.036), ('hdi', 0.036), ('column', 0.034), ('stephen', 0.034), ('yi', 0.034), ('sathiya', 0.031), ('differentiable', 0.031), ('preconditioner', 0.031), ('iterative', 0.03), ('unconstrained', 0.029), ('regression', 0.028), ('optimization', 0.028), ('compressed', 0.028), ('lengthy', 0.028), ('mangasarian', 0.028), ('ruby', 0.028), ('pietra', 0.028), ('dii', 0.028), ('convergence', 0.027), ('time', 0.026), ('limited', 0.026), ('document', 0.026), ('weng', 0.025), ('pt', 0.025), ('continuously', 0.025), ('benson', 0.024), ('colptr', 0.024), ('malouf', 0.024), ('normalizations', 0.024), ('pcg', 0.024), ('rong', 0.024), ('rowind', 0.024), ('update', 0.024), ('nonzero', 0.023), ('outer', 0.023), ('xi', 0.023), ('per', 0.023), ('dx', 0.022), ('iteration', 0.022), ('taiwan', 0.022), ('matrix', 0.022), ('sparse', 0.022), ('software', 0.021), ('instances', 0.021), ('moore', 0.021), ('method', 0.021), ('twice', 0.021), ('asuncion', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 91 jmlr-2008-Trust Region Newton Method for Logistic Regression
Author: Chih-Jen Lin, Ruby C. Weng, S. Sathiya Keerthi
Abstract: Large-scale logistic regression arises in many applications such as document classification and natural language processing. In this paper, we apply a trust region Newton method to maximize the log-likelihood of the logistic regression model. The proposed method uses only approximate Newton steps in the beginning, but achieves fast convergence in the end. Experiments show that it is faster than the commonly used quasi Newton approach for logistic regression. We also extend the proposed method to large-scale L2-loss linear support vector machines (SVM). Keywords: logistic regression, newton method, trust region, conjugate gradient, support vector machines
2 0.44274959 28 jmlr-2008-Coordinate Descent Method for Large-scale L2-loss Linear Support Vector Machines
Author: Kai-Wei Chang, Cho-Jui Hsieh, Chih-Jen Lin
Abstract: Linear support vector machines (SVM) are useful for classifying large-scale sparse data. Problems with sparse features are common in applications such as document classification and natural language processing. In this paper, we propose a novel coordinate descent algorithm for training linear SVM with the L2-loss function. At each step, the proposed method minimizes a one-variable sub-problem while fixing other variables. The sub-problem is solved by Newton steps with the line search technique. The procedure globally converges at the linear rate. As each sub-problem involves only values of a corresponding feature, the proposed approach is suitable when accessing a feature is more convenient than accessing an instance. Experiments show that our method is more efficient and stable than state of the art methods such as Pegasos and TRON. Keywords: linear support vector machines, document classification, coordinate descent
3 0.15626225 29 jmlr-2008-Cross-Validation Optimization for Large Scale Structured Classification Kernel Methods
Author: Matthias W. Seeger
Abstract: We propose a highly efficient framework for penalized likelihood kernel methods applied to multiclass models with a large, structured set of classes. As opposed to many previous approaches which try to decompose the fitting problem into many smaller ones, we focus on a Newton optimization of the complete model, making use of model structure and linear conjugate gradients in order to approximate Newton search directions. Crucially, our learning method is based entirely on matrix-vector multiplication primitives with the kernel matrices and their derivatives, allowing straightforward specialization to new kernels, and focusing code optimization efforts to these primitives only. Kernel parameters are learned automatically, by maximizing the cross-validation log likelihood in a gradient-based way, and predictive probabilities are estimated. We demonstrate our approach on large scale text classification tasks with hierarchical structure on thousands of classes, achieving state-of-the-art results in an order of magnitude less time than previous work. Parts of this work appeared in the conference paper Seeger (2007). Keywords: multi-way classification, kernel logistic regression, hierarchical classification, cross validation optimization, Newton-Raphson optimization
4 0.078567103 76 jmlr-2008-Optimization Techniques for Semi-Supervised Support Vector Machines
Author: Olivier Chapelle, Vikas Sindhwani, Sathiya S. Keerthi
Abstract: Due to its wide applicability, the problem of semi-supervised classification is attracting increasing attention in machine learning. Semi-Supervised Support Vector Machines (S3VMs) are based on applying the margin maximization principle to both labeled and unlabeled examples. Unlike SVMs, their formulation leads to a non-convex optimization problem. A suite of algorithms have recently been proposed for solving S3VMs. This paper reviews key ideas in this literature. The performance and behavior of various S3VM algorithms is studied together, under a common experimental setting. Keywords: semi-supervised learning, support vector machines, non-convex optimization, transductive learning
5 0.076479286 2 jmlr-2008-A Library for Locally Weighted Projection Regression (Machine Learning Open Source Software Paper)
Author: Stefan Klanke, Sethu Vijayakumar, Stefan Schaal
Abstract: In this paper we introduce an improved implementation of locally weighted projection regression (LWPR), a supervised learning algorithm that is capable of handling high-dimensional input data. As the key features, our code supports multi-threading, is available for multiple platforms, and provides wrappers for several programming languages. Keywords: regression, local learning, online learning, C, C++, Matlab, Octave, Python
6 0.071262084 46 jmlr-2008-LIBLINEAR: A Library for Large Linear Classification (Machine Learning Open Source Software Paper)
7 0.063177243 12 jmlr-2008-Algorithms for Sparse Linear Classifiers in the Massive Data Setting
8 0.055624332 79 jmlr-2008-Ranking Categorical Features Using Generalization Properties
9 0.049620785 62 jmlr-2008-Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data
10 0.042886458 58 jmlr-2008-Max-margin Classification of Data with Absent Features
11 0.041004416 52 jmlr-2008-Learning from Multiple Sources
12 0.037598375 5 jmlr-2008-A New Algorithm for Estimating the Effective Dimension-Reduction Subspace
13 0.035268597 51 jmlr-2008-Learning Similarity with Operator-valued Large-margin Classifiers
14 0.034373932 34 jmlr-2008-Exponentiated Gradient Algorithms for Conditional Random Fields and Max-Margin Markov Networks
15 0.034198642 86 jmlr-2008-SimpleMKL
16 0.034170199 39 jmlr-2008-Gradient Tree Boosting for Training Conditional Random Fields
17 0.033850849 32 jmlr-2008-Estimating the Confidence Interval for Prediction Errors of Support Vector Machine Classifiers
18 0.03021778 96 jmlr-2008-Visualizing Data using t-SNE
19 0.029578194 80 jmlr-2008-Ranking Individuals by Group Comparisons
20 0.029110109 89 jmlr-2008-Support Vector Machinery for Infinite Ensemble Learning
topicId topicWeight
[(0, 0.202), (1, -0.108), (2, 0.09), (3, 0.719), (4, 0.002), (5, -0.057), (6, -0.1), (7, -0.174), (8, 0.136), (9, -0.008), (10, 0.096), (11, 0.035), (12, -0.009), (13, -0.002), (14, -0.006), (15, 0.053), (16, 0.004), (17, -0.028), (18, -0.055), (19, -0.04), (20, 0.05), (21, -0.027), (22, 0.037), (23, -0.013), (24, 0.034), (25, -0.015), (26, 0.037), (27, -0.041), (28, 0.007), (29, -0.024), (30, -0.065), (31, -0.039), (32, 0.03), (33, -0.015), (34, 0.003), (35, -0.024), (36, 0.034), (37, 0.007), (38, 0.057), (39, 0.053), (40, -0.004), (41, 0.026), (42, -0.027), (43, 0.052), (44, -0.036), (45, -0.021), (46, 0.098), (47, -0.031), (48, -0.084), (49, 0.041)]
simIndex simValue paperId paperTitle
same-paper 1 0.96918297 91 jmlr-2008-Trust Region Newton Method for Logistic Regression
Author: Chih-Jen Lin, Ruby C. Weng, S. Sathiya Keerthi
Abstract: Large-scale logistic regression arises in many applications such as document classification and natural language processing. In this paper, we apply a trust region Newton method to maximize the log-likelihood of the logistic regression model. The proposed method uses only approximate Newton steps in the beginning, but achieves fast convergence in the end. Experiments show that it is faster than the commonly used quasi Newton approach for logistic regression. We also extend the proposed method to large-scale L2-loss linear support vector machines (SVM). Keywords: logistic regression, newton method, trust region, conjugate gradient, support vector machines
2 0.94962102 28 jmlr-2008-Coordinate Descent Method for Large-scale L2-loss Linear Support Vector Machines
Author: Kai-Wei Chang, Cho-Jui Hsieh, Chih-Jen Lin
Abstract: Linear support vector machines (SVM) are useful for classifying large-scale sparse data. Problems with sparse features are common in applications such as document classification and natural language processing. In this paper, we propose a novel coordinate descent algorithm for training linear SVM with the L2-loss function. At each step, the proposed method minimizes a one-variable sub-problem while fixing other variables. The sub-problem is solved by Newton steps with the line search technique. The procedure globally converges at the linear rate. As each sub-problem involves only values of a corresponding feature, the proposed approach is suitable when accessing a feature is more convenient than accessing an instance. Experiments show that our method is more efficient and stable than state of the art methods such as Pegasos and TRON. Keywords: linear support vector machines, document classification, coordinate descent
3 0.38716203 29 jmlr-2008-Cross-Validation Optimization for Large Scale Structured Classification Kernel Methods
Author: Matthias W. Seeger
Abstract: We propose a highly efficient framework for penalized likelihood kernel methods applied to multiclass models with a large, structured set of classes. As opposed to many previous approaches which try to decompose the fitting problem into many smaller ones, we focus on a Newton optimization of the complete model, making use of model structure and linear conjugate gradients in order to approximate Newton search directions. Crucially, our learning method is based entirely on matrix-vector multiplication primitives with the kernel matrices and their derivatives, allowing straightforward specialization to new kernels, and focusing code optimization efforts to these primitives only. Kernel parameters are learned automatically, by maximizing the cross-validation log likelihood in a gradient-based way, and predictive probabilities are estimated. We demonstrate our approach on large scale text classification tasks with hierarchical structure on thousands of classes, achieving state-of-the-art results in an order of magnitude less time than previous work. Parts of this work appeared in the conference paper Seeger (2007). Keywords: multi-way classification, kernel logistic regression, hierarchical classification, cross validation optimization, Newton-Raphson optimization
4 0.2429688 76 jmlr-2008-Optimization Techniques for Semi-Supervised Support Vector Machines
Author: Olivier Chapelle, Vikas Sindhwani, Sathiya S. Keerthi
Abstract: Due to its wide applicability, the problem of semi-supervised classification is attracting increasing attention in machine learning. Semi-Supervised Support Vector Machines (S3VMs) are based on applying the margin maximization principle to both labeled and unlabeled examples. Unlike SVMs, their formulation leads to a non-convex optimization problem. A suite of algorithms have recently been proposed for solving S3VMs. This paper reviews key ideas in this literature. The performance and behavior of various S3VM algorithms is studied together, under a common experimental setting. Keywords: semi-supervised learning, support vector machines, non-convex optimization, transductive learning
5 0.22063243 2 jmlr-2008-A Library for Locally Weighted Projection Regression (Machine Learning Open Source Software Paper)
Author: Stefan Klanke, Sethu Vijayakumar, Stefan Schaal
Abstract: In this paper we introduce an improved implementation of locally weighted projection regression (LWPR), a supervised learning algorithm that is capable of handling high-dimensional input data. As the key features, our code supports multi-threading, is available for multiple platforms, and provides wrappers for several programming languages. Keywords: regression, local learning, online learning, C, C++, Matlab, Octave, Python
6 0.21768695 12 jmlr-2008-Algorithms for Sparse Linear Classifiers in the Massive Data Setting
7 0.19988556 46 jmlr-2008-LIBLINEAR: A Library for Large Linear Classification (Machine Learning Open Source Software Paper)
8 0.18907945 62 jmlr-2008-Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data
9 0.17786573 79 jmlr-2008-Ranking Categorical Features Using Generalization Properties
10 0.15376261 86 jmlr-2008-SimpleMKL
11 0.15337054 51 jmlr-2008-Learning Similarity with Operator-valued Large-margin Classifiers
12 0.14320731 39 jmlr-2008-Gradient Tree Boosting for Training Conditional Random Fields
13 0.14263691 96 jmlr-2008-Visualizing Data using t-SNE
14 0.13139725 58 jmlr-2008-Max-margin Classification of Data with Absent Features
15 0.13060397 34 jmlr-2008-Exponentiated Gradient Algorithms for Conditional Random Fields and Max-Margin Markov Networks
16 0.12878579 5 jmlr-2008-A New Algorithm for Estimating the Effective Dimension-Reduction Subspace
17 0.12062351 32 jmlr-2008-Estimating the Confidence Interval for Prediction Errors of Support Vector Machine Classifiers
18 0.1185094 8 jmlr-2008-Accelerated Neural Evolution through Cooperatively Coevolved Synapses
19 0.11286525 95 jmlr-2008-Value Function Based Reinforcement Learning in Changing Markovian Environments
20 0.11090921 89 jmlr-2008-Support Vector Machinery for Infinite Ensemble Learning
topicId topicWeight
[(0, 0.015), (5, 0.015), (31, 0.018), (40, 0.023), (54, 0.036), (58, 0.038), (66, 0.037), (76, 0.016), (88, 0.039), (92, 0.021), (94, 0.625), (99, 0.03)]
simIndex simValue paperId paperTitle
1 0.99126619 2 jmlr-2008-A Library for Locally Weighted Projection Regression (Machine Learning Open Source Software Paper)
Author: Stefan Klanke, Sethu Vijayakumar, Stefan Schaal
Abstract: In this paper we introduce an improved implementation of locally weighted projection regression (LWPR), a supervised learning algorithm that is capable of handling high-dimensional input data. As the key features, our code supports multi-threading, is available for multiple platforms, and provides wrappers for several programming languages. Keywords: regression, local learning, online learning, C, C++, Matlab, Octave, Python
same-paper 2 0.97212887 91 jmlr-2008-Trust Region Newton Method for Logistic Regression
Author: Chih-Jen Lin, Ruby C. Weng, S. Sathiya Keerthi
Abstract: Large-scale logistic regression arises in many applications such as document classification and natural language processing. In this paper, we apply a trust region Newton method to maximize the log-likelihood of the logistic regression model. The proposed method uses only approximate Newton steps in the beginning, but achieves fast convergence in the end. Experiments show that it is faster than the commonly used quasi Newton approach for logistic regression. We also extend the proposed method to large-scale L2-loss linear support vector machines (SVM). Keywords: logistic regression, newton method, trust region, conjugate gradient, support vector machines
3 0.95116192 12 jmlr-2008-Algorithms for Sparse Linear Classifiers in the Massive Data Setting
Author: Suhrid Balakrishnan, David Madigan
Abstract: Classifiers favoring sparse solutions, such as support vector machines, relevance vector machines, LASSO-regression based classifiers, etc., provide competitive methods for classification problems in high dimensions. However, current algorithms for training sparse classifiers typically scale quite unfavorably with respect to the number of training examples. This paper proposes online and multipass algorithms for training sparse linear classifiers for high dimensional data. These algorithms have computational complexity and memory requirements that make learning on massive data sets feasible. The central idea that makes this possible is a straightforward quadratic approximation to the likelihood function. Keywords: Laplace approximation, expectation propagation, LASSO
4 0.83593619 28 jmlr-2008-Coordinate Descent Method for Large-scale L2-loss Linear Support Vector Machines
Author: Kai-Wei Chang, Cho-Jui Hsieh, Chih-Jen Lin
Abstract: Linear support vector machines (SVM) are useful for classifying large-scale sparse data. Problems with sparse features are common in applications such as document classification and natural language processing. In this paper, we propose a novel coordinate descent algorithm for training linear SVM with the L2-loss function. At each step, the proposed method minimizes a one-variable sub-problem while fixing other variables. The sub-problem is solved by Newton steps with the line search technique. The procedure globally converges at the linear rate. As each sub-problem involves only values of a corresponding feature, the proposed approach is suitable when accessing a feature is more convenient than accessing an instance. Experiments show that our method is more efficient and stable than state of the art methods such as Pegasos and TRON. Keywords: linear support vector machines, document classification, coordinate descent
Author: Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, Chih-Jen Lin
Abstract: LIBLINEAR is an open source library for large-scale linear classification. It supports logistic regression and linear support vector machines. We provide easy-to-use command-line tools and library calls for users and developers. Comprehensive documents are available for both beginners and advanced users. Experiments demonstrate that LIBLINEAR is very efficient on large sparse data sets. Keywords: large-scale linear classification, logistic regression, support vector machines, open source, machine learning
6 0.55793184 34 jmlr-2008-Exponentiated Gradient Algorithms for Conditional Random Fields and Max-Margin Markov Networks
7 0.55520445 39 jmlr-2008-Gradient Tree Boosting for Training Conditional Random Fields
8 0.55426919 16 jmlr-2008-Approximations for Binary Gaussian Process Classification
9 0.54699439 78 jmlr-2008-Randomized Online PCA Algorithms with Regret Bounds that are Logarithmic in the Dimension
10 0.53770059 86 jmlr-2008-SimpleMKL
11 0.49934024 18 jmlr-2008-Bayesian Inference and Optimal Design for the Sparse Linear Model
12 0.48989332 76 jmlr-2008-Optimization Techniques for Semi-Supervised Support Vector Machines
13 0.48966777 59 jmlr-2008-Maximal Causes for Non-linear Component Extraction
14 0.48015493 43 jmlr-2008-Hit Miss Networks with Applications to Instance Selection
15 0.47804642 74 jmlr-2008-Online Learning of Complex Prediction Problems Using Simultaneous Projections
16 0.46110252 66 jmlr-2008-Multi-class Discriminant Kernel Learning via Convex Programming (Special Topic on Model Selection)
17 0.45586991 83 jmlr-2008-Robust Submodular Observation Selection
18 0.44632116 85 jmlr-2008-Shark (Machine Learning Open Source Software Paper)
19 0.44569519 9 jmlr-2008-Active Learning by Spherical Subdivision
20 0.43675679 56 jmlr-2008-Magic Moments for Structured Output Prediction