jmlr jmlr2009 jmlr2009-39 knowledge-graph by maker-knowledge-mining

39 jmlr-2009-Hybrid MPI OpenMP Parallel Linear Support Vector Machine Training


Source: pdf

Author: Kristian Woodsend, Jacek Gondzio

Abstract: Support vector machines are a powerful machine learning technology, but the training process involves a dense quadratic optimization problem and is computationally challenging. A parallel implementation of linear Support Vector Machine training has been developed, using a combination of MPI and OpenMP. Using an interior point method for the optimization and a reformulation that avoids the dense Hessian matrix, the structure of the augmented system matrix is exploited to partition data and computations amongst parallel processors efficiently. The new implementation has been applied to solve problems from the PASCAL Challenge on Large-scale Learning. We show that our approach is competitive, and is able to solve problems in the Challenge many times faster than other parallel approaches. We also demonstrate that the hybrid version performs more efficiently than the version using pure MPI. Keywords: linear SVM training, hybrid parallelism, largescale learning, interior point method

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 A parallel implementation of linear Support Vector Machine training has been developed, using a combination of MPI and OpenMP. [sent-8, score-0.394]

2 Using an interior point method for the optimization and a reformulation that avoids the dense Hessian matrix, the structure of the augmented system matrix is exploited to partition data and computations amongst parallel processors efficiently. [sent-9, score-0.763]

3 We show that our approach is competitive, and is able to solve problems in the Challenge many times faster than other parallel approaches. [sent-11, score-0.288]

4 We also demonstrate that the hybrid version performs more efficiently than the version using pure MPI. [sent-12, score-0.1]

5 Keywords: linear SVM training, hybrid parallelism, large-scale learning, interior point method [sent-13, score-0.215]

6 Like many machine learning techniques, SVMs involve a training stage, where the machine learns a pattern in the data from a training data set, and a separate test or validation stage where the ability of the machine to correctly predict labels is evaluated using a previously unseen test data set. [sent-17, score-0.136]

7 The training stage for Support Vector Machines involves at its core a dense convex quadratic optimization problem (QP). [sent-19, score-0.14]

8 Parallelization schemes so far proposed have involved splitting the training data to give smaller, separable optimization sub-problems which can be distributed amongst the processors. [sent-34, score-0.164]

9 Another approach is to use a variation of the standard SVM algorithm that is better suited to a parallel architecture. [sent-41, score-0.288]

10 Tveit and Engum (2003) developed an exact parallel implementation of the Proximal SVM (Fung and Mangasarian, 2001), which classifies points by assigning them to the closest of two parallel planes. [sent-42, score-0.614]

11 There have only been a few parallel methods in the literature which train a standard SVM on the whole of the data set. [sent-44, score-0.288]

12 Zanghirati and Zanni implement the inner solver using a technique called the variable projection method, which is able to work efficiently on relatively large dense inner problems, and is suitable for implementing in parallel. [sent-50, score-0.117]

13 The support vectors given by the SVMs of one layer are combined to form the training sets of the next layer. [sent-55, score-0.149]

14 The support vectors of the final layer are re-inserted into the training sets of the first layer at the next iteration, until the global KKT conditions are met. [sent-56, score-0.23]

15 The approach of Durdanovic et al. (2007), implemented in the Milde software, is a parallel implementation of the sequential minimal optimization algorithm. [sent-59, score-0.326]

16 When a variable z_i enters the working set, the owner processor broadcasts the corresponding data vector x_i. [sent-64, score-0.199]

17 The authors use a hybrid approach to parallelization similar to ours described below, involving a multi-core BLAS library, but its use is limited to Layer 1 and 2 operations. [sent-67, score-0.233]

18 Another family of approaches to QP optimization are based on interior point method (IPM) technology, which works by delaying the split between active and inactive variables for as long as possible. [sent-68, score-0.115]

19 A straightforward implementation of the standard SVM dual formulation has a per-iteration complexity of O(n^3), and would be unusable for anything but the smallest problems. [sent-71, score-0.121]

20 Chang et al. (2008) use parallel IPM technology for the optimizer, and avoid the problem of inverting the dense Hessian matrix by generating a low-rank approximation of the kernel matrix using partial Cholesky decomposition with pivoting. [sent-74, score-0.452]

21 The dense Hessian matrix can then be efficiently inverted implicitly using the low-rank approximation and the Sherman-Morrison-Woodbury (SMW) formula. [sent-75, score-0.118]
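
For reference, the Sherman-Morrison-Woodbury identity invoked here is the standard one; the notation below is generic rather than the paper's:

```latex
% Sherman-Morrison-Woodbury identity for inverting a low-rank correction of A:
(A + U C V)^{-1} \;=\; A^{-1} - A^{-1} U \,\bigl(C^{-1} + V A^{-1} U\bigr)^{-1} V A^{-1}
```

When A is diagonal and U, V have rank r, applying the identity costs O(nr^2) rather than the O(n^3) of a direct factorization, which is what makes the low-rank kernel approximation attractive.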

22 Moreover, a large part of the calculations at each iteration can be distributed amongst the processors effectively. [sent-76, score-0.247]

23 The SMW formula has been widely used in interior point methods; however, sometimes it runs into numerical difficulties. [sent-77, score-0.115]

24 2007 is the exception) have considered the parallel computer system as a cluster of independent processors, communicating through a message passing scheme such as MPI (MPI-Forum, 1995). [sent-81, score-0.288]

25 Advances in technology have resulted in systems where several processing cores have access to a single memory space, and such symmetric multiprocessing (SMP) architectures are becoming prevalent. [sent-82, score-0.218]

26 It can also be used to communicate between processors within an SMP node, but it is not immediately clear that this is the most efficient technique. [sent-84, score-0.146]

27 A standard approach to combining the two schemes involves OpenMP parallelization inside each MPI process, while communication between the MPI processes is made only outside of the OpenMP regions. [sent-87, score-0.133]
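
As an illustration of that standard pattern, the minimal C sketch below (not the paper's code) has each MPI process run an OpenMP-parallel loop over its local data, with MPI communication placed strictly outside the OpenMP region:

```c
/* Minimal hybrid MPI+OpenMP pattern: OpenMP threads share memory inside each
 * MPI process; MPI communication happens only outside the OpenMP region.
 * Compile, for example, with: mpicc -fopenmp hybrid.c -o hybrid
 */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n_local = 1000000;          /* chunk of data owned by this process */
    double local_sum = 0.0;

    /* OpenMP region: per-process threads, no MPI calls in here */
    #pragma omp parallel for reduction(+:local_sum)
    for (int i = 0; i < n_local; i++) {
        double x = (double)(rank * n_local + i);
        local_sum += x * x;               /* stand-in for per-element work */
    }

    /* MPI communication outside the OpenMP region */
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum over %d processes = %e\n", size, global_sum);

    MPI_Finalize();
    return 0;
}
```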

28 In this paper, we propose a parallel linear SVM algorithm that adopts this hybrid approach to parallelization. [sent-89, score-0.388]

29 It trains the SVM using the full data set, using an interior point method to give efficient optimization, and Cholesky decomposition to give good numerical stability. [sent-90, score-0.115]

30 By exploiting the structure of the problem, we show how this can be parallelized with excellent parallel efficiency. [sent-94, score-0.288]

31 The resulting implementation is significantly faster at SVM training than active set methods, and it allows SVMs to be trained on data sets that would be impossible to fit within the memory of a single processor. [sent-95, score-0.153]

32 Section 2 gives an outline of interior point method for optimizing quadratic programs. [sent-97, score-0.115]

33 Then in Section 4 we describe our approach to training linear SVMs, exploiting the structure of the QP and accessing memory efficiently. [sent-99, score-0.154]

34 At each iteration, an interior point method makes a damped Newton step towards satisfying the primal feasibility, dual feasibility and complementarity product conditions ZSe = µe and (U − Z)Ve = µe, for a given µ > 0. [sent-117, score-0.157]
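
Written as a display, the perturbed complementarity conditions targeted by the damped Newton step are (with Z, S, U, V the diagonal matrices defined in the next sentence):

```latex
Z S e = \mu e, \qquad (U - Z)\, V e = \mu e, \qquad \mu > 0 .
```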

35 We follow a common practice in interior point literature and denote with a capital letter (Z, S, U, V) a diagonal matrix with elements of the corresponding vector (z, s, u, v) on the diagonal. [sent-119, score-0.205]

36 If the block (Q + Θ^{-1}) is diagonal, an efficient method to solve such a system is to form the Schur complement C = A (Q + Θ^{-1})^{-1} A^T, solve the smaller system C ∆λ = r_b + A (Q + Θ^{-1})^{-1} r_c for ∆λ, and back-substitute into (1) to calculate ∆z. [sent-122, score-0.16]
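
In standard IPM notation this corresponds to the following reduction; the sign conventions here are the usual ones and may differ slightly from the paper's equation (1):

```latex
\begin{pmatrix} -(Q + \Theta^{-1}) & A^T \\ A & 0 \end{pmatrix}
\begin{pmatrix} \Delta z \\ \Delta \lambda \end{pmatrix}
=
\begin{pmatrix} r_c \\ r_b \end{pmatrix},
\qquad
C = A\,(Q + \Theta^{-1})^{-1} A^T,
\qquad
C\,\Delta\lambda = r_b + A\,(Q + \Theta^{-1})^{-1} r_c .
```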

37 Unfortunately, as we shall see in the next section, for the case of SVM training the Hessian matrix Q is a completely dense matrix. [sent-123, score-0.186]

38 A support vector machine (SVM) is a classification learning machine that learns a mapping between the features and the target label of a set of data points known as the training set, and then uses a hyperplane w^T x + w_0 = 0 to separate the data set and predict the class of further data points. [sent-126, score-0.106]

39 Concentrating on the linear SVM classifier, and using a 2-norm for the hyperplane weights w and a 1-norm for the misclassification errors ξ ∈ R^n, the QP that forms the core of training the SVM takes the form min_{w,w_0,ξ} (1/2) w^T w + τ e^T ξ (2), s.t. [sent-130, score-0.106]
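
The extracted sentence cuts off after "s.t."; the constraints completing formulation (2) are the standard soft-margin ones:

```latex
\min_{w,\,w_0,\,\xi}\; \tfrac{1}{2} w^T w + \tau e^T \xi
\quad \text{s.t.} \quad
y_i\,(w^T x_i + w_0) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\qquad i = 1,\dots,n .
```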

40 The operation of building the matrix C is of order O(n(m+1)^2), while inverting the resulting matrix is an operation of order O((m+1)^3). [sent-146, score-0.092]

41 The formulation (4) is the basis of our parallel algorithm, where building matrix C is split between the processors. [sent-147, score-0.334]

42 Implementing the QP for Parallel Computation To apply (4) to truly large-scale data sets, it is necessary to employ linear algebra operations that exploit the block structure of the formulation (Gondzio and Sarkissian, 2003; Gondzio and Grothey, 2007). [sent-150, score-0.138]

43 The approach described below was implemented using the OOPS interior point solver (Gondzio and Grothey, 2007). [sent-153, score-0.16]

44 We should note here that, as the parallel track of the Challenge was focused on shared memory algorithms, our submission to the Challenge used only the techniques described in Section 4. [sent-154, score-0.335]

45 This results in H′ having a symmetric bordered block-diagonal structure. [sent-165, score-0.098]
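
A generic symmetric bordered block-diagonal layout with p blocks looks as follows; this is only an illustration of the structure being described, not the paper's exact matrix:

```latex
\begin{pmatrix}
H_1 &        &     & A_1^T  \\
    & \ddots &     & \vdots \\
    &        & H_p & A_p^T  \\
A_1 & \cdots & A_p & 0
\end{pmatrix}
```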

46 Terms that form the Schur complement can be calculated in parallel but must then be gathered, and the corresponding blocks L_C and D_C computed serially. [sent-193, score-0.336]

47 Matrix C is a dense matrix of relatively small size (m + 1) × (m + 1), and the Cholesky decomposition C = L_C D_C L_C^T is performed in the normal way on a single processor. [sent-196, score-0.118]

48 It is possible that a coarse-grained parallel implementation of Cholesky decomposition could give better performance (Luecke et al. [sent-197, score-0.326]

49 For the formation of LDL^T, Equations (6) and (7) can be calculated on each processor individually. [sent-202, score-0.199]

50 Outer products (8) are then calculated, and the results gathered onto a single master processor to form C; this requires each processor to transfer approximately (1/2)(m + 1)^2 elements. [sent-203, score-0.433]

51 The master processor performs the Cholesky decomposition of C (9). [sent-204, score-0.234]

52 Each processor needs to calculate L_{A_i} r_{c_i}, which again can be performed without any inter-processor communication, and the results are gathered onto the master processor. [sent-205, score-0.278]

53 The master processor then performs the calculations in Equations (10), (11) and (12) of the back-substitution. [sent-206, score-0.234]

54 Vector ∆λ is broadcast to all processors for them to calculate Equations (13) and (14) locally. [sent-207, score-0.146]
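
The preceding entries describe a gather-factorize-broadcast pattern. The C/MPI sketch below is illustrative only: the function schur_solve and its arguments are hypothetical, the LAPACKE calls assume a standard LAPACKE installation, and a plain LL^T Cholesky (dpotrf/dpotrs) stands in for the paper's L_C D_C L_C^T factorization.

```c
#include <mpi.h>
#include <stdlib.h>
#include <lapacke.h>

void schur_solve(const double *C_local,   /* local term A_i H_i^{-1} A_i^T, size m1 x m1       */
                 const double *rhs_local, /* local contribution to the reduced right-hand side */
                 double *dlambda,         /* output on every process: Delta lambda, length m1  */
                 int m1, MPI_Comm comm)
{
    int rank;
    MPI_Comm_rank(comm, &rank);

    /* Sum the local Schur complement terms and right-hand sides onto the master. */
    double *C   = (rank == 0) ? malloc((size_t)m1 * m1 * sizeof *C) : NULL;
    double *rhs = (rank == 0) ? malloc((size_t)m1 * sizeof *rhs)    : NULL;
    MPI_Reduce(C_local,   C,   m1 * m1, MPI_DOUBLE, MPI_SUM, 0, comm);
    MPI_Reduce(rhs_local, rhs, m1,      MPI_DOUBLE, MPI_SUM, 0, comm);

    if (rank == 0) {
        /* Factorize the small dense C and solve C * dlambda = rhs serially. */
        LAPACKE_dpotrf(LAPACK_COL_MAJOR, 'L', m1, C, m1);
        LAPACKE_dpotrs(LAPACK_COL_MAJOR, 'L', m1, 1, C, m1, rhs, m1);
        for (int i = 0; i < m1; i++) dlambda[i] = rhs[i];
        free(C);
        free(rhs);
    }

    /* Broadcast Delta lambda so each process can recover its local Delta z. */
    MPI_Bcast(dlambda, m1, MPI_DOUBLE, 0, comm);
}
```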

55 Linear Algebra Operations Within Nodes: Within each node, the bulk of the operations is due to the contribution of each processor A_i H_i^{-1} A_i^T to the calculation of the Schur complement in (8), and to a lesser extent the calculation of L_{A_i} in (7). [sent-209, score-0.239]

56 The standard technology for dense linear algebra operations is the BLAS library. [sent-210, score-0.156]

57 Matrices are partitioned into panels (block rows or block columns) and further partitioned into blocks of a size that fits in the processor’s cache, where access times to the data are much shorter. [sent-214, score-0.155]

58 Divide matrix A into block column panels A_p, so that the inner dimensions of A_p and B_p match. [sent-226, score-0.153]

59 As required, pack block A_ip into a contiguous buffer, so that by the end it is transposed and in the L2 cache. [sent-228, score-0.098]

60 Considering each block A_ip in turn, perform the multiplication C_i := A_ip B_p + C_i, with B_p brought into the cache in column strips. [sent-230, score-0.096]
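
A much-simplified C sketch of this panel-and-block scheme is given below; the block sizes MC/KC/NC are illustrative rather than tuned, the packed block is not transposed here, and production kernels such as GotoBLAS add register blocking and vectorization on top of this:

```c
#include <string.h>

#define MC 256   /* rows of the packed A block    */
#define KC 256   /* inner (shared) panel dimension */
#define NC 1024  /* columns of the B panel         */

/* C (m x n) += A (m x k) * B (k x n); all matrices row-major. */
static void blocked_gemm(int m, int n, int k,
                         const double *A, const double *B, double *C)
{
    /* Packed copy of one A block; static for simplicity (not thread-safe). */
    static double Apack[MC * KC];

    for (int pc = 0; pc < k; pc += KC) {          /* panel of the inner dimension */
        int kb = (k - pc < KC) ? k - pc : KC;
        for (int ic = 0; ic < m; ic += MC) {      /* block of rows of A */
            int mb = (m - ic < MC) ? m - ic : MC;

            /* Pack the A block into a contiguous, cache-resident buffer. */
            for (int i = 0; i < mb; i++)
                memcpy(&Apack[i * kb], &A[(size_t)(ic + i) * k + pc], kb * sizeof(double));

            /* Multiply the packed block against B, bringing B in by column strips. */
            for (int jc = 0; jc < n; jc += NC) {
                int nb = (n - jc < NC) ? n - jc : NC;
                for (int i = 0; i < mb; i++)
                    for (int p = 0; p < kb; p++) {
                        double a = Apack[i * kb + p];
                        const double *brow = &B[(size_t)(pc + p) * n + jc];
                        double       *crow = &C[(size_t)(ic + i) * n + jc];
                        for (int j = 0; j < nb; j++)
                            crow[j] += a * brow[j];
                    }
            }
        }
    }
}
```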

61 Similar techniques using panels and blocks can be applied to Cholesky factorization (Buttari et al. [sent-232, score-0.155]

62 Performance In this section we compare the hybrid OpenMP/MPI version of our software with one using only MPI, and also our implementation against three other parallel SVM solvers. [sent-244, score-0.489]

63 To compare the hybrid approach (using the techniques described in Sections 4. [sent-252, score-0.1]

64 They consistently show that, although the pure MPI approach has better properties in terms of parallel efficiency, the hybrid approach is always computationally more efficient. [sent-257, score-0.388]

65 We believe this is a result of the multi-core processor architecture. [sent-258, score-0.199]

66 The cores are associated with relatively small local cache memories, and such an architecture demands a fine-grained parallelism where, to reduce bus traffic, an operation is split into tasks that operate on small portions of data (Buttari et al. [sent-259, score-0.253]

67 We made a comparison with other parallel software PGPDT (Zanni et al. [sent-262, score-0.351]

68 Using a linear kernel in each case, the results are shown in Figure panels (a)-(f), which plot training time (s) against the number of processor cores (4, 8, 16, 32) for the MPI-only and hybrid MPI/OpenMP versions at τ=0.01 and τ=1, with perfect parallelization shown for reference.

74 The results show that, although the pure MPI approach shows better parallel efficiency properties, the hybrid approach is always computationally more efficient. [sent-281, score-0.388]

75 Data Set # cores Alpha 16 Beta 16 Gamma 16 Delta 16 Epsilon 32 Zeta 32 FD 32 OCR 32 DNA 48 τ 1 0. [sent-283, score-0.171]

76 In all cases except the DNA data set, the Milde software ran but did not terminate within 24 hours of runtime, so the numbers in brackets show when it was within 1% of its final objective value; — indicates that the software failed to load the problem. [sent-294, score-0.126]

77 For the parallel software OOPS, each core had access to 2GB memory. [sent-312, score-0.351]

78 The single processor codes had access to 4 cores and 8GB memory, and the larger data sets were reduced in size to fit. [sent-313, score-0.37]

79 To make the training equivalent, the rank of the factorization was set to be the number of features m. [sent-343, score-0.122]

80 The results show that our approach described in this paper and implemented in OOPS is typically one to two orders of magnitude faster than the other parallel SVM solvers, terminates reliably, and training times are reasonably consistent for different values of τ. [sent-347, score-0.356]

81 Both codes ran serially, using the memory of 4 processor cores (8GB RAM in total), although it should be noted that as we used the GotoBLAS library when building LibLinear, it was able to use all four processor cores during BLAS operations. [sent-352, score-0.828]

82 Conclusions In this paper, we have shown how to develop a hybrid parallel implementation of linear Support Vector Machine training. [sent-363, score-0.426]

83 Using an interior point method to solve the optimization problem in a predictable time, and Cholesky decomposition to give good numerical stability of implicit inverses. [sent-367, score-0.115]

84 Exploiting the block structure of the augmented system matrix to partition the data and linear algebra computations amongst parallel processing nodes efficiently. [sent-369, score-0.482]

85 Our results show that, for all cases, the hybrid implementation was faster than one using purely MPI, even though the MPI version had better parallel efficiency. [sent-373, score-0.426]

86 We used the hybrid implementation to solve very large problems from the PASCAL Challenge on Large-scale Learning, of up to a few million data samples. [sent-374, score-0.138]

87 It is possible to extend the techniques described above to handle non-linear kernels, by approximating the positive semidefinite kernel matrix K with a low-rank outer product representation such as partial Cholesky factorization LL^T ≈ K (Fine and Scheinberg, 2002). [sent-377, score-0.1]

88 This approach produces the first r columns of the matrix L (corresponding to the r largest pivots) and leaves the other columns as zero, giving an approximation of the matrix K of rank r. [sent-378, score-0.164]
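
A serial C sketch of pivoted partial Cholesky in this spirit is shown below (illustrative only; the function name partial_cholesky is hypothetical and the stopping tolerance is arbitrary). After the loop, the array d holds exactly the diagonal of the residual K − LL^T that the next sentence refers to.

```c
#include <math.h>
#include <stdlib.h>
#include <string.h>

/* Build the first r columns of L (n x r, row-major) so that L L^T ~ K,
 * choosing the largest remaining diagonal element as the pivot each step.
 * d (length n) receives the residual diagonal of K - L L^T.
 * Returns the number of columns actually built. */
int partial_cholesky(int n, int r, const double *K, double *L, double *d)
{
    int *used = calloc(n, sizeof(int));            /* 1 if row already a pivot */
    memset(L, 0, (size_t)n * r * sizeof(double));
    for (int i = 0; i < n; i++) d[i] = K[(size_t)i * n + i];

    int j;
    for (j = 0; j < r; j++) {
        int p = -1;                                /* pick the largest residual diagonal */
        for (int i = 0; i < n; i++)
            if (!used[i] && (p < 0 || d[i] > d[p])) p = i;
        if (p < 0 || d[p] <= 1e-12) break;         /* residual is numerically zero */
        used[p] = 1;

        L[(size_t)p * r + j] = sqrt(d[p]);
        d[p] = 0.0;                                /* a pivot row's residual diagonal is zero */

        for (int i = 0; i < n; i++) {
            if (used[i]) continue;
            double s = K[(size_t)i * n + p];
            for (int t = 0; t < j; t++) s -= L[(size_t)i * r + t] * L[(size_t)p * r + t];
            L[(size_t)i * r + j] = s / L[(size_t)p * r + j];
            d[i] -= L[(size_t)i * r + j] * L[(size_t)i * r + j];
        }
    }
    free(used);
    return j;
}
```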

89 Extending the work of Fine and Scheinberg, the diagonal D ∈ R^{n×n} of the residual matrix (K − LL^T) can be determined at no extra expense and included in a separable formulation for non-linear kernels: min_{w,z} (1/2)(w^T w + z^T D z) − e^T z, s.t. [sent-379, score-0.126]

90 Chang et al. (2008) describe how to perform partial Cholesky decomposition in a parallel environment. [sent-383, score-0.288]

91 With this information, all processors can update the section of the new column of L for which they are responsible, and also update corresponding diagonal elements. [sent-388, score-0.19]

92 Although the algorithm requires the processors to be synchronised at each iteration, little of the data needs to be shared amongst the processors: the bulk of the communication between processors is limited to a vector of length m and a vector of at most length r. [sent-389, score-0.352]

93 A class of parallel tiled linear algebra algorithms for multicore architectures. [sent-414, score-0.332]

94 A parallel mixture of SVMs for very large scale problems. [sent-426, score-0.288]

95 A fast parallel optimization for training support vector machine. [sent-429, score-0.356]

96 Parallel interior point solver for structured quadratic programs: Application to financial planning problems. [sent-455, score-0.16]

97 Fast training of support vector machines using sequential minimal optimization. [sent-510, score-0.105]

98 Comparison of parallel programming models on clusters of SMP nodes. [sent-519, score-0.323]

99 A parallel solver for large quadratic programs in training support vector machines. [sent-556, score-0.401]

100 Parallel software for training large scale support vector machines on multiprocessor systems. [sent-559, score-0.168]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('mpi', 0.369), ('parallel', 0.288), ('processor', 0.199), ('gondzio', 0.173), ('cores', 0.171), ('oops', 0.156), ('openmp', 0.156), ('processors', 0.146), ('blas', 0.145), ('lai', 0.138), ('ondzio', 0.138), ('oodsend', 0.138), ('ybrid', 0.138), ('svm', 0.137), ('parallelization', 0.133), ('cholesky', 0.123), ('durdanovic', 0.121), ('milde', 0.121), ('pascal', 0.116), ('interior', 0.115), ('qp', 0.108), ('smp', 0.104), ('woodsend', 0.103), ('zanni', 0.103), ('liblinear', 0.101), ('hybrid', 0.1), ('lc', 0.095), ('jacek', 0.086), ('raining', 0.084), ('layer', 0.081), ('challenge', 0.079), ('dense', 0.072), ('svms', 0.071), ('hessian', 0.069), ('aip', 0.069), ('gemm', 0.069), ('zanghirati', 0.069), ('training', 0.068), ('alpha', 0.066), ('rb', 0.066), ('epsilon', 0.066), ('fd', 0.066), ('larank', 0.066), ('software', 0.063), ('zeta', 0.06), ('amongst', 0.06), ('factorization', 0.054), ('block', 0.054), ('fine', 0.053), ('edinburgh', 0.053), ('schur', 0.053), ('panels', 0.053), ('chang', 0.052), ('delta', 0.052), ('buttari', 0.052), ('gotoblas', 0.052), ('graf', 0.052), ('ldlt', 0.052), ('pgpdt', 0.052), ('rabenseifner', 0.052), ('dna', 0.05), ('ocr', 0.048), ('xy', 0.048), ('blocks', 0.048), ('memory', 0.047), ('matrix', 0.046), ('solver', 0.045), ('ipm', 0.044), ('contiguous', 0.044), ('goto', 0.044), ('rci', 0.044), ('diagonal', 0.044), ('algebra', 0.044), ('scheinberg', 0.042), ('cache', 0.042), ('dual', 0.042), ('hi', 0.041), ('library', 0.041), ('iteration', 0.041), ('operations', 0.04), ('rc', 0.04), ('beta', 0.04), ('dc', 0.04), ('architecture', 0.04), ('accessing', 0.039), ('psvm', 0.039), ('hyperplane', 0.038), ('gamma', 0.038), ('implementation', 0.038), ('perfect', 0.037), ('machines', 0.037), ('columns', 0.036), ('proximal', 0.036), ('augmented', 0.036), ('separable', 0.036), ('clusters', 0.035), ('yt', 0.035), ('master', 0.035), ('buffer', 0.035), ('cosatto', 0.035)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000007 39 jmlr-2009-Hybrid MPI OpenMP Parallel Linear Support Vector Machine Training

Author: Kristian Woodsend, Jacek Gondzio

Abstract: Support vector machines are a powerful machine learning technology, but the training process involves a dense quadratic optimization problem and is computationally challenging. A parallel implementation of linear Support Vector Machine training has been developed, using a combination of MPI and OpenMP. Using an interior point method for the optimization and a reformulation that avoids the dense Hessian matrix, the structure of the augmented system matrix is exploited to partition data and computations amongst parallel processors efficiently. The new implementation has been applied to solve problems from the PASCAL Challenge on Large-scale Learning. We show that our approach is competitive, and is able to solve problems in the Challenge many times faster than other parallel approaches. We also demonstrate that the hybrid version performs more efficiently than the version using pure MPI. Keywords: linear SVM training, hybrid parallelism, largescale learning, interior point method

2 0.11238275 25 jmlr-2009-Distributed Algorithms for Topic Models

Author: David Newman, Arthur Asuncion, Padhraic Smyth, Max Welling

Abstract: We describe distributed algorithms for two widely-used topic models, namely the Latent Dirichlet Allocation (LDA) model, and the Hierarchical Dirichet Process (HDP) model. In our distributed algorithms the data is partitioned across separate processors and inference is done in a parallel, distributed fashion. We propose two distributed algorithms for LDA. The first algorithm is a straightforward mapping of LDA to a distributed processor setting. In this algorithm processors concurrently perform Gibbs sampling over local data followed by a global update of topic counts. The algorithm is simple to implement and can be viewed as an approximation to Gibbs-sampled LDA. The second version is a model that uses a hierarchical Bayesian extension of LDA to directly account for distributed data. This model has a theoretical guarantee of convergence but is more complex to implement than the first algorithm. Our distributed algorithm for HDP takes the straightforward mapping approach, and merges newly-created topics either by matching or by topic-id. Using five real-world text corpora we show that distributed learning works well in practice. For both LDA and HDP, we show that the converged test-data log probability for distributed learning is indistinguishable from that obtained with single-processor learning. Our extensive experimental results include learning topic models for two multi-million document collections using a 1024-processor parallel computer. Keywords: topic models, latent Dirichlet allocation, hierarchical Dirichlet processes, distributed parallel computation

3 0.1062993 26 jmlr-2009-Dlib-ml: A Machine Learning Toolkit    (Machine Learning Open Source Software Paper)

Author: Davis E. King

Abstract: There are many excellent toolkits which provide support for developing machine learning software in Python, R, Matlab, and similar environments. Dlib-ml is an open source library, targeted at both engineers and research scientists, which aims to provide a similarly rich environment for developing machine learning software in the C++ language. Towards this end, dlib-ml contains an extensible linear algebra toolkit with built in BLAS support. It also houses implementations of algorithms for performing inference in Bayesian networks and kernel-based methods for classification, regression, clustering, anomaly detection, and feature ranking. To enable easy use of these tools, the entire library has been developed with contract programming, which provides complete and precise documentation as well as powerful debugging tools. Keywords: kernel-methods, svm, rvm, kernel clustering, C++, Bayesian networks

4 0.080183536 88 jmlr-2009-Stable and Efficient Gaussian Process Calculations

Author: Leslie Foster, Alex Waagen, Nabeela Aijaz, Michael Hurley, Apolonio Luis, Joel Rinsky, Chandrika Satyavolu, Michael J. Way, Paul Gazis, Ashok Srivastava

Abstract: The use of Gaussian processes can be an effective approach to prediction in a supervised learning environment. For large data sets, the standard Gaussian process approach requires solving very large systems of linear equations and approximations are required for the calculations to be practical. We will focus on the subset of regressors approximation technique. We will demonstrate that there can be numerical instabilities in a well known implementation of the technique. We discuss alternate implementations that have better numerical stability properties and can lead to better predictions. Our results will be illustrated by looking at an application involving prediction of galaxy redshift from broadband spectrum data. Keywords: Gaussian processes, low rank approximations, numerical stability, photometric redshift, subset of regressors method

5 0.077168152 83 jmlr-2009-SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent

Author: Antoine Bordes, Léon Bottou, Patrick Gallinari

Abstract: The SGD-QN algorithm is a stochastic gradient descent algorithm that makes careful use of secondorder information and splits the parameter update into independently scheduled components. Thanks to this design, SGD-QN iterates nearly as fast as a first-order stochastic gradient descent but requires less iterations to achieve the same accuracy. This algorithm won the “Wild Track” of the first PASCAL Large Scale Learning Challenge (Sonnenburg et al., 2008). Keywords: support vector machine, stochastic gradient descent

6 0.073552854 69 jmlr-2009-Optimized Cutting Plane Algorithm for Large-Scale Risk Minimization

7 0.062154662 33 jmlr-2009-Exploring Strategies for Training Deep Neural Networks

8 0.060042642 3 jmlr-2009-A Parameter-Free Classification Method for Large Scale Learning

9 0.054585651 29 jmlr-2009-Estimating Labels from Label Proportions

10 0.048994105 58 jmlr-2009-NEUROSVM: An Architecture to Reduce the Effect of the Choice of Kernel on the Performance of SVM

11 0.047309894 86 jmlr-2009-Similarity-based Classification: Concepts and Algorithms

12 0.044163805 99 jmlr-2009-Using Local Dependencies within Batches to Improve Large Margin Classifiers

13 0.043204296 23 jmlr-2009-Discriminative Learning Under Covariate Shift

14 0.041884754 8 jmlr-2009-An Anticorrelation Kernel for Subsystem Training in Multiple Classifier Systems

15 0.041522857 14 jmlr-2009-CarpeDiem: Optimizing the Viterbi Algorithm and Applications to Supervised Sequential Learning

16 0.040745273 2 jmlr-2009-A New Approach to Collaborative Filtering: Operator Estimation with Spectral Regularization

17 0.038547948 76 jmlr-2009-Python Environment for Bayesian Learning: Inferring the Structure of Bayesian Networks from Knowledge and Data    (Machine Learning Open Source Software Paper)

18 0.034314111 51 jmlr-2009-Low-Rank Kernel Learning with Bregman Matrix Divergences

19 0.033895116 43 jmlr-2009-Java-ML: A Machine Learning Library    (Machine Learning Open Source Software Paper)

20 0.033646904 82 jmlr-2009-Robustness and Regularization of Support Vector Machines


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.197), (1, -0.059), (2, 0.09), (3, -0.099), (4, 0.117), (5, -0.071), (6, 0.084), (7, 0.005), (8, 0.108), (9, 0.115), (10, 0.002), (11, 0.189), (12, 0.048), (13, -0.159), (14, -0.228), (15, 0.21), (16, -0.098), (17, 0.091), (18, 0.298), (19, 0.11), (20, -0.094), (21, 0.233), (22, 0.119), (23, -0.047), (24, 0.122), (25, 0.133), (26, -0.139), (27, 0.044), (28, -0.081), (29, 0.013), (30, 0.118), (31, -0.004), (32, 0.006), (33, -0.025), (34, -0.044), (35, 0.014), (36, -0.05), (37, -0.01), (38, -0.026), (39, -0.018), (40, -0.013), (41, -0.002), (42, -0.014), (43, 0.006), (44, -0.04), (45, 0.039), (46, 0.051), (47, 0.044), (48, 0.027), (49, -0.057)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94916499 39 jmlr-2009-Hybrid MPI OpenMP Parallel Linear Support Vector Machine Training

Author: Kristian Woodsend, Jacek Gondzio

Abstract: Support vector machines are a powerful machine learning technology, but the training process involves a dense quadratic optimization problem and is computationally challenging. A parallel implementation of linear Support Vector Machine training has been developed, using a combination of MPI and OpenMP. Using an interior point method for the optimization and a reformulation that avoids the dense Hessian matrix, the structure of the augmented system matrix is exploited to partition data and computations amongst parallel processors efficiently. The new implementation has been applied to solve problems from the PASCAL Challenge on Large-scale Learning. We show that our approach is competitive, and is able to solve problems in the Challenge many times faster than other parallel approaches. We also demonstrate that the hybrid version performs more efficiently than the version using pure MPI. Keywords: linear SVM training, hybrid parallelism, largescale learning, interior point method

2 0.63581824 25 jmlr-2009-Distributed Algorithms for Topic Models

Author: David Newman, Arthur Asuncion, Padhraic Smyth, Max Welling

Abstract: We describe distributed algorithms for two widely-used topic models, namely the Latent Dirichlet Allocation (LDA) model, and the Hierarchical Dirichet Process (HDP) model. In our distributed algorithms the data is partitioned across separate processors and inference is done in a parallel, distributed fashion. We propose two distributed algorithms for LDA. The first algorithm is a straightforward mapping of LDA to a distributed processor setting. In this algorithm processors concurrently perform Gibbs sampling over local data followed by a global update of topic counts. The algorithm is simple to implement and can be viewed as an approximation to Gibbs-sampled LDA. The second version is a model that uses a hierarchical Bayesian extension of LDA to directly account for distributed data. This model has a theoretical guarantee of convergence but is more complex to implement than the first algorithm. Our distributed algorithm for HDP takes the straightforward mapping approach, and merges newly-created topics either by matching or by topic-id. Using five real-world text corpora we show that distributed learning works well in practice. For both LDA and HDP, we show that the converged test-data log probability for distributed learning is indistinguishable from that obtained with single-processor learning. Our extensive experimental results include learning topic models for two multi-million document collections using a 1024-processor parallel computer. Keywords: topic models, latent Dirichlet allocation, hierarchical Dirichlet processes, distributed parallel computation

3 0.45396721 69 jmlr-2009-Optimized Cutting Plane Algorithm for Large-Scale Risk Minimization

Author: Vojtěch Franc, Sören Sonnenburg

Abstract: We have developed an optimized cutting plane algorithm (OCA) for solving large-scale risk minimization problems. We prove that the number of iterations OCA requires to converge to a ε precise solution is approximately linear in the sample size. We also derive OCAS, an OCA-based linear binary Support Vector Machine (SVM) solver, and OCAM, a linear multi-class SVM solver. In an extensive empirical evaluation we show that OCAS outperforms current state-of-the-art SVM solvers like SVMlight , SVMperf and BMRM, achieving speedup factor more than 1,200 over SVMlight on some data sets and speedup factor of 29 over SVMperf , while obtaining the same precise support vector solution. OCAS, even in the early optimization steps, often shows faster convergence than the currently prevailing approximative methods in this domain, SGD and Pegasos. In addition, our proposed linear multi-class SVM solver, OCAM, achieves speedups of factor of up to 10 compared to SVMmulti−class . Finally, we use OCAS and OCAM in two real-world applications, the problem of human acceptor splice site detection and malware detection. Effectively parallelizing OCAS, we achieve state-of-the-art results on an acceptor splice site recognition problem only by being able to learn from all the available 50 million examples in a 12-million-dimensional feature space. Source code, data sets and scripts to reproduce the experiments are available at http://cmp.felk.cvut.cz/˜xfrancv/ocas/html/. Keywords: risk minimization, linear support vector machine, multi-class classification, binary classification, large-scale learning, parallelization

4 0.42241508 26 jmlr-2009-Dlib-ml: A Machine Learning Toolkit    (Machine Learning Open Source Software Paper)

Author: Davis E. King

Abstract: There are many excellent toolkits which provide support for developing machine learning software in Python, R, Matlab, and similar environments. Dlib-ml is an open source library, targeted at both engineers and research scientists, which aims to provide a similarly rich environment for developing machine learning software in the C++ language. Towards this end, dlib-ml contains an extensible linear algebra toolkit with built in BLAS support. It also houses implementations of algorithms for performing inference in Bayesian networks and kernel-based methods for classification, regression, clustering, anomaly detection, and feature ranking. To enable easy use of these tools, the entire library has been developed with contract programming, which provides complete and precise documentation as well as powerful debugging tools. Keywords: kernel-methods, svm, rvm, kernel clustering, C++, Bayesian networks

5 0.41736969 88 jmlr-2009-Stable and Efficient Gaussian Process Calculations

Author: Leslie Foster, Alex Waagen, Nabeela Aijaz, Michael Hurley, Apolonio Luis, Joel Rinsky, Chandrika Satyavolu, Michael J. Way, Paul Gazis, Ashok Srivastava

Abstract: The use of Gaussian processes can be an effective approach to prediction in a supervised learning environment. For large data sets, the standard Gaussian process approach requires solving very large systems of linear equations and approximations are required for the calculations to be practical. We will focus on the subset of regressors approximation technique. We will demonstrate that there can be numerical instabilities in a well known implementation of the technique. We discuss alternate implementations that have better numerical stability properties and can lead to better predictions. Our results will be illustrated by looking at an application involving prediction of galaxy redshift from broadband spectrum data. Keywords: Gaussian processes, low rank approximations, numerical stability, photometric redshift, subset of regressors method

6 0.31814992 3 jmlr-2009-A Parameter-Free Classification Method for Large Scale Learning

7 0.29852712 58 jmlr-2009-NEUROSVM: An Architecture to Reduce the Effect of the Choice of Kernel on the Performance of SVM

8 0.29679 33 jmlr-2009-Exploring Strategies for Training Deep Neural Networks

9 0.26118344 76 jmlr-2009-Python Environment for Bayesian Learning: Inferring the Structure of Bayesian Networks from Knowledge and Data    (Machine Learning Open Source Software Paper)

10 0.25649717 99 jmlr-2009-Using Local Dependencies within Batches to Improve Large Margin Classifiers

11 0.24323468 29 jmlr-2009-Estimating Labels from Label Proportions

12 0.21200091 86 jmlr-2009-Similarity-based Classification: Concepts and Algorithms

13 0.21171555 83 jmlr-2009-SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent

14 0.20884083 8 jmlr-2009-An Anticorrelation Kernel for Subsystem Training in Multiple Classifier Systems

15 0.20393305 82 jmlr-2009-Robustness and Regularization of Support Vector Machines

16 0.19680093 22 jmlr-2009-Deterministic Error Analysis of Support Vector Regression and Related Regularized Kernel Methods

17 0.19349912 51 jmlr-2009-Low-Rank Kernel Learning with Bregman Matrix Divergences

18 0.18689878 14 jmlr-2009-CarpeDiem: Optimizing the Viterbi Algorithm and Applications to Supervised Sequential Learning

19 0.18423972 38 jmlr-2009-Hash Kernels for Structured Data

20 0.18303533 2 jmlr-2009-A New Approach to Collaborative Filtering: Operator Estimation with Spectral Regularization


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(8, 0.018), (11, 0.026), (19, 0.028), (21, 0.01), (24, 0.448), (26, 0.015), (38, 0.053), (47, 0.01), (52, 0.036), (55, 0.023), (58, 0.054), (66, 0.092), (68, 0.022), (90, 0.053), (96, 0.043)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.75378752 39 jmlr-2009-Hybrid MPI OpenMP Parallel Linear Support Vector Machine Training

Author: Kristian Woodsend, Jacek Gondzio

Abstract: Support vector machines are a powerful machine learning technology, but the training process involves a dense quadratic optimization problem and is computationally challenging. A parallel implementation of linear Support Vector Machine training has been developed, using a combination of MPI and OpenMP. Using an interior point method for the optimization and a reformulation that avoids the dense Hessian matrix, the structure of the augmented system matrix is exploited to partition data and computations amongst parallel processors efficiently. The new implementation has been applied to solve problems from the PASCAL Challenge on Large-scale Learning. We show that our approach is competitive, and is able to solve problems in the Challenge many times faster than other parallel approaches. We also demonstrate that the hybrid version performs more efficiently than the version using pure MPI. Keywords: linear SVM training, hybrid parallelism, largescale learning, interior point method

2 0.31367859 83 jmlr-2009-SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent

Author: Antoine Bordes, Léon Bottou, Patrick Gallinari

Abstract: The SGD-QN algorithm is a stochastic gradient descent algorithm that makes careful use of secondorder information and splits the parameter update into independently scheduled components. Thanks to this design, SGD-QN iterates nearly as fast as a first-order stochastic gradient descent but requires less iterations to achieve the same accuracy. This algorithm won the “Wild Track” of the first PASCAL Large Scale Learning Challenge (Sonnenburg et al., 2008). Keywords: support vector machine, stochastic gradient descent

3 0.28912765 69 jmlr-2009-Optimized Cutting Plane Algorithm for Large-Scale Risk Minimization

Author: Vojtěch Franc, Sören Sonnenburg

Abstract: We have developed an optimized cutting plane algorithm (OCA) for solving large-scale risk minimization problems. We prove that the number of iterations OCA requires to converge to a ε precise solution is approximately linear in the sample size. We also derive OCAS, an OCA-based linear binary Support Vector Machine (SVM) solver, and OCAM, a linear multi-class SVM solver. In an extensive empirical evaluation we show that OCAS outperforms current state-of-the-art SVM solvers like SVMlight , SVMperf and BMRM, achieving speedup factor more than 1,200 over SVMlight on some data sets and speedup factor of 29 over SVMperf , while obtaining the same precise support vector solution. OCAS, even in the early optimization steps, often shows faster convergence than the currently prevailing approximative methods in this domain, SGD and Pegasos. In addition, our proposed linear multi-class SVM solver, OCAM, achieves speedups of factor of up to 10 compared to SVMmulti−class . Finally, we use OCAS and OCAM in two real-world applications, the problem of human acceptor splice site detection and malware detection. Effectively parallelizing OCAS, we achieve state-of-the-art results on an acceptor splice site recognition problem only by being able to learn from all the available 50 million examples in a 12-million-dimensional feature space. Source code, data sets and scripts to reproduce the experiments are available at http://cmp.felk.cvut.cz/˜xfrancv/ocas/html/. Keywords: risk minimization, linear support vector machine, multi-class classification, binary classification, large-scale learning, parallelization

4 0.28451627 76 jmlr-2009-Python Environment for Bayesian Learning: Inferring the Structure of Bayesian Networks from Knowledge and Data    (Machine Learning Open Source Software Paper)

Author: Abhik Shah, Peter Woolf

Abstract: In this paper, we introduce PEBL, a Python library and application for learning Bayesian network structure from data and prior knowledge that provides features unmatched by alternative software packages: the ability to use interventional data, flexible specification of structural priors, modeling with hidden variables and exploitation of parallel processing. PEBL is released under the MIT open-source license, can be installed from the Python Package Index and is available at http://pebl-project.googlecode.com. Keywords: Bayesian networks, python, open source software

5 0.28179431 75 jmlr-2009-Provably Efficient Learning with Typed Parametric Models

Author: Emma Brunskill, Bethany R. Leffler, Lihong Li, Michael L. Littman, Nicholas Roy

Abstract: To quickly achieve good performance, reinforcement-learning algorithms for acting in large continuous-valued domains must use a representation that is both sufficiently powerful to capture important domain characteristics, and yet simultaneously allows generalization, or sharing, among experiences. Our algorithm balances this tradeoff by using a stochastic, switching, parametric dynamics representation. We argue that this model characterizes a number of significant, real-world domains, such as robot navigation across varying terrain. We prove that this representational assumption allows our algorithm to be probably approximately correct with a sample complexity that scales polynomially with all problem-specific quantities including the state-space dimension. We also explicitly incorporate the error introduced by approximate planning in our sample complexity bounds, in contrast to prior Probably Approximately Correct (PAC) Markov Decision Processes (MDP) approaches, which typically assume the estimated MDP can be solved exactly. Our experimental results on constructing plans for driving to work using real car trajectory data, as well as a small robot experiment on navigating varying terrain, demonstrate that our dynamics representation enables us to capture real-world dynamics in a sufficient manner to produce good performance. Keywords: reinforcement learning, provably efficient learning

6 0.27867851 27 jmlr-2009-Efficient Online and Batch Learning Using Forward Backward Splitting

7 0.27590558 82 jmlr-2009-Robustness and Regularization of Support Vector Machines

8 0.27571735 19 jmlr-2009-Controlling the False Discovery Rate of the Association Causality Structure Learned with the PC Algorithm    (Special Topic on Mining and Learning with Graphs and Relations)

9 0.27447796 87 jmlr-2009-Sparse Online Learning via Truncated Gradient

10 0.27428368 86 jmlr-2009-Similarity-based Classification: Concepts and Algorithms

11 0.27405199 29 jmlr-2009-Estimating Labels from Label Proportions

12 0.27362609 71 jmlr-2009-Perturbation Corrections in Approximate Inference: Mixture Modelling Applications

13 0.27347329 32 jmlr-2009-Exploiting Product Distributions to Identify Relevant Variables of Correlation Immune Functions

14 0.27328995 18 jmlr-2009-Consistency and Localizability

15 0.27313808 38 jmlr-2009-Hash Kernels for Structured Data

16 0.27176088 9 jmlr-2009-Analysis of Perceptron-Based Active Learning

17 0.27127638 85 jmlr-2009-Settable Systems: An Extension of Pearl's Causal Model with Optimization, Equilibrium, and Learning

18 0.27103499 93 jmlr-2009-The Hidden Life of Latent Variables: Bayesian Learning with Mixed Graph Models

19 0.27097189 79 jmlr-2009-Reinforcement Learning in Finite MDPs: PAC Analysis

20 0.2706196 47 jmlr-2009-Learning Linear Ranking Functions for Beam Search with Application to Planning