jmlr jmlr2012 jmlr2012-62 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Djalel Benbouzid, Róbert Busa-Fekete, Norman Casagrande, François-David Collin, Balázs Kégl
Abstract: The MultiBoost package provides a fast C++ implementation of multi-class/multi-label/multi-task boosting algorithms. It is based on AdaBoost.MH but it also implements popular cascade classifiers and FilterBoost. The package contains common multi-class base learners (stumps, trees, products, Haar filters). Further base learners and strong learners following the boosting paradigm can be easily implemented in a flexible framework. Keywords: boosting, AdaBoost.MH, FilterBoost, cascade classifier
Reference: text
sentIndex sentText sentNum sentScore
1 Journal of Machine Learning Research 13 (2012) 549-553. Submitted 8/11; Published 3/12. MultiBoost: A Multi-purpose Boosting Package. Djalel Benbouzid, Róbert Busa-Fekete. DJALEL . [sent-1, score-0.051]
2 13-19 Bevenden Street, London, N1 6AA, United Kingdom. François-David Collin, Balázs Kégl. FRADAV@GMAIL.COM [sent-6, score-0.04]
3 Linear Accelerator Laboratory, University of Paris-Sud, CNRS, Orsay 91898, France. Editor: Sören Sonnenburg. Abstract: The MultiBoost package provides a fast C++ implementation of multi-class/multi-label/multi-task boosting algorithms. [sent-9, score-0.217]
4 It is based on AdaBoost.MH, but it also implements popular cascade classifiers and FilterBoost. [sent-11, score-0.186]
5 The package contains common multi-class base learners (stumps, trees, products, Haar filters). [sent-12, score-0.377]
6 Further base learners and strong learners following the boosting paradigm can be easily implemented in a flexible framework. [sent-13, score-0.661]
7 AdaBoost.MH has become the gold standard of multi-class boosting due to its simplicity and versatility. [sent-20, score-0.153]
8 Whereas binary AdaBoost with decision stumps is easy to code, multi-class AdaBoost.MH and complex base learners are not straightforward to implement efficiently. [sent-22, score-0.09] [sent-23, score-0.344]
10 The MultiBoost software package is intended to fill this gap. [sent-24, score-0.064]
11 Its main boosting engine is based on the AdaBoost.MH algorithm. [sent-25, score-0.153]
12 © 2012 Djalel Benbouzid, Róbert Busa-Fekete, Norman Casagrande, François-David Collin and Balázs Kégl. [sent-31, score-0.091]
13 The package includes common multi-class base learners (real- and nominal-valued decision stumps, trees, products (Kégl and Busa-Fekete, 2009), and Haar filters), but the flexible architecture makes it simple to add new base learners without interfering with the main boosting engine. [sent-37, score-1.143]
14 MultiBoost was designed in the object-oriented paradigm and coded in C++, so it is fast and it provides a flexible base for implementing further modules. [sent-38, score-0.198]
15 Section 2 describes the general architecture and the modules of MultiBoost. [sent-40, score-0.084]
16 The Architecture. MultiBoost was implemented within the object-oriented paradigm using some design patterns. [sent-43, score-0.04]
17 It consists of several modules which can be changed or extended more or less independently (Figure 1). [sent-44, score-0.03]
18 For instance, an advanced user can implement a data-type/base-learner pair without any need to modify the other modules. [sent-45, score-0.031]
19 1 Strong Learners. The strong learner calls the base learners iteratively, stores the learned base classifiers and their coefficients, and manages the weights of the training instances. [sent-47, score-0.527]
20 The name originally comes from the boosting (PAC learning) literature. [sent-49, score-0.153]
21 Here, we use it in a broader sense to mean the “outer” loop of the boosting iteration. [sent-50, score-0.153]
22 The strong learner stores its output in a human-readable XML format that allows one to resume a run after it was stopped or crashed. [sent-51, score-0.056]
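To make the role of the strong learner concrete, the following is a minimal sketch of an AdaBoost.MH-style outer loop: it repeatedly calls a base learner, stores the returned classifier and its coefficient, and re-weights the (instance, class) pairs. This is an illustrative sketch under stated assumptions, not MultiBoost's actual C++ interface; all type and function names (Example, BaseLearner, boost, ...) are hypothetical.

```cpp
// Illustrative AdaBoost.MH-style "strong learner" loop: call the base learner,
// store the returned classifier and coefficient, then re-weight the (instance,
// class) pairs. Names are hypothetical, not MultiBoost's API.
#include <cmath>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

struct Example {
    std::vector<double> x;  // feature vector
    std::vector<int>    y;  // per-class label indicators in {-1, +1}
};

// A trained base classifier returns one vote per class.
using BaseClassifier = std::function<std::vector<double>(const std::vector<double>&)>;

// The base learner receives the data and the current (instance, class) weights
// and returns a classifier together with its coefficient alpha.
using BaseLearner = std::function<std::pair<BaseClassifier, double>(
    const std::vector<Example>&, const std::vector<std::vector<double>>&)>;

struct StrongClassifier {
    std::vector<BaseClassifier> classifiers;
    std::vector<double>         alphas;

    // Weighted vote vector f(x); the predicted class is its argmax (single-label case).
    std::vector<double> score(const std::vector<double>& x) const {
        std::vector<double> f;
        for (std::size_t t = 0; t < classifiers.size(); ++t) {
            const std::vector<double> v = classifiers[t](x);
            if (f.empty()) f.assign(v.size(), 0.0);
            for (std::size_t l = 0; l < v.size(); ++l) f[l] += alphas[t] * v[l];
        }
        return f;
    }
};

StrongClassifier boost(const std::vector<Example>& data, BaseLearner learn, int T) {
    const std::size_t n = data.size(), K = data.front().y.size();
    // Uniform initial weights over all (instance, class) pairs.
    std::vector<std::vector<double>> w(n, std::vector<double>(K, 1.0 / (n * K)));
    StrongClassifier strong;
    for (int t = 0; t < T; ++t) {
        auto [h, alpha] = learn(data, w);   // one boosting iteration
        strong.classifiers.push_back(h);
        strong.alphas.push_back(alpha);
        double z = 0.0;                      // exponential re-weighting + normalization
        for (std::size_t i = 0; i < n; ++i) {
            const std::vector<double> v = h(data[i].x);
            for (std::size_t l = 0; l < K; ++l) {
                w[i][l] *= std::exp(-alpha * data[i].y[l] * v[l]);
                z += w[i][l];
            }
        }
        for (auto& row : w)
            for (auto& wi : row) wi /= z;
    }
    return strong;
}
```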
23 MultiBoost implements the following strong learners: [sent-52, score-0.054]
24 • AdaBoost.MH (Schapire and Singer, 1999): a multi-class/multi-label/multi-task version of AdaBoost that learns a "flat" linear combination of vector-valued base classifiers. [sent-53, score-0.196]
25 • VJCascade (Viola and Jones, 2004): an algorithm that learns a cascade classifier tree by running AdaBoost at each node. [sent-55, score-0.216]
26 • SoftCascade (Bourdev and Brandt, 2005): another cascade learner that starts with a set of base classifiers, reorders them, and augments them with rejection thresholds (a minimal evaluation sketch follows below). [sent-56, score-0.451]
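The rejection thresholds are what make a cascade fast at test time: evaluation stops as soon as the running score falls below the current stage's threshold. Below is a minimal sketch of that evaluation logic with hypothetical names (CascadeStage, evaluate); it is not the package's VJCascade/SoftCascade implementation.

```cpp
// Illustrative SoftCascade-style evaluation: base classifiers are applied in order,
// and an instance is rejected as soon as the running score drops below the current
// stage's rejection threshold. Names are hypothetical, not MultiBoost's API.
#include <functional>
#include <vector>

struct CascadeStage {
    std::function<double(const std::vector<double>&)> h;  // binary base classifier
    double alpha;      // coefficient of this base classifier
    double rejection;  // threshold on the partial score after this stage
};

// Returns true (accepted as positive) only if the instance survives every stage.
bool evaluate(const std::vector<CascadeStage>& cascade, const std::vector<double>& x) {
    double score = 0.0;
    for (const CascadeStage& stage : cascade) {
        score += stage.alpha * stage.h(x);
        if (score < stage.rejection)
            return false;  // early exit: most negatives are rejected cheaply here
    }
    return true;
}
```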
27 2 Base Learners. MultiBoost implements the following base learners. [sent-58, score-0.212]
28 • The Stump learner is a one-decision two-leaf tree learned on real-valued features (a search sketch appears after this list). [sent-59, score-0.211]
29 It is indexed by the feature it cuts and the threshold where it cuts the feature. [sent-60, score-0.102]
30 • Selector is a one-decision two-leaf tree learned on nominal features. [sent-61, score-0.11]
31 It is indexed by the feature it cuts and the value of the feature it selects. [sent-62, score-0.051]
32 • HaarStump is a Stump learned over a feature space generated using rectangular filters on images. [sent-64, score-0.03]
33 • Tree is a meta base learner that takes any base learner as input and learns a vector-valued multi-class decision tree using the input base learner as the basic cut. [sent-65, score-1.033]
34 • Product is another meta base learner that takes any base learner as input and learns a vector-valued multi-class decision product (Kégl and Busa-Fekete, 2009) using the input base learner as terms of the product. [sent-66, score-1.198]
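As a rough illustration of the stump search referred to above, the sketch below scans (feature, threshold) pairs for the one with the smallest weighted error in a plain binary setting. The names and the brute-force search are hypothetical; MultiBoost's stumps additionally learn a per-class vote vector for the multi-class case.

```cpp
// Illustrative brute-force search for a one-decision stump in a binary setting:
// every observed feature value is tried as a threshold and the candidate with the
// smallest weighted error is kept. Hypothetical names; O(n^2 d) for clarity only.
#include <cstddef>
#include <limits>
#include <vector>

struct Stump {
    std::size_t feature = 0;
    double threshold = 0.0;
    int polarity = 1;  // +1: predict +1 when x[feature] >= threshold

    int predict(const std::vector<double>& x) const {
        return (x[feature] >= threshold) ? polarity : -polarity;
    }
};

Stump fit_stump(const std::vector<std::vector<double>>& X,  // n examples, d features
                const std::vector<int>& y,                  // labels in {-1, +1}
                const std::vector<double>& w) {             // boosting weights
    Stump best;
    double best_err = std::numeric_limits<double>::infinity();
    const std::size_t n = X.size(), d = X.front().size();
    for (std::size_t j = 0; j < d; ++j) {
        for (std::size_t i = 0; i < n; ++i) {
            const double theta = X[i][j];       // candidate threshold
            for (int pol : {1, -1}) {
                double err = 0.0;
                for (std::size_t k = 0; k < n; ++k) {
                    const int pred = (X[k][j] >= theta) ? pol : -pol;
                    if (pred != y[k]) err += w[k];
                }
                if (err < best_err) {
                    best_err = err;
                    best = Stump{j, theta, pol};
                }
            }
        }
    }
    return best;  // real implementations sort each feature once instead
}
```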
35 3 The Data Representation. The multi-class data structure is a set of observation-label pairs, where each observation is a vector of feature values, and each label is a vector of binary class indicators. [sent-68, score-0.028]
36 In binary classification, we also allow a single label that indicates the class dichotomy. [sent-69, score-0.028]
37 In addition, multi-task classification can be encoded by letting each label column represent a different task. [sent-71, score-0.028]
38 We implement a sparse data representation for both the observation matrix and the label matrix. [sent-72, score-0.059]
39 In general, base learners are implemented to work with their own data representations. [sent-73, score-0.313]
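One possible way to hold such a sparse observation/label matrix, with multi-task problems encoded as extra label columns, is sketched below; the container names are hypothetical and do not correspond to MultiBoost's internal data structures.

```cpp
// One possible sparse container for the observation/label matrices described above.
// Each label column is a {-1, +1} class indicator; multi-task problems simply use one
// column per (task, class). Hypothetical names, not MultiBoost's data structures.
#include <cstddef>
#include <utility>
#include <vector>

struct SparseVector {
    std::vector<std::pair<std::size_t, double>> entries;  // (index, value); zeros omitted
};

struct MultiLabelDataset {
    std::size_t num_features = 0;
    std::size_t num_labels   = 0;               // classes, or tasks x classes per task
    std::vector<SparseVector> observations;     // one sparse feature row per example
    std::vector<SparseVector> labels;           // only +1 indicators stored explicitly
    std::vector<double>       initial_weights;  // optional initial label weighting

    // An unstored label entry is interpreted as -1.
    bool positive(std::size_t example, std::size_t label) const {
        for (const auto& entry : labels[example].entries)
            if (entry.first == label) return entry.second > 0.0;
        return false;
    }
};
```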
40 4 The Data Parser and the Output Information. The training and test sets can be input in the attribute-relation file format (ARFF), in the SVMLight format, or using a comma-separated text file. [sent-76, score-0.03]
41 We augmented the first two formats with initial label weighting, which is an important feature in the boosting framework (especially in the multi-class/multi-label setup). [sent-77, score-0.181]
42 In each iteration, MultiBoost can output several metrics (specified by the user), such as the 0-1 error, the Hamming loss, or the area under the ROC curve. [sent-78, score-0.025]
43 New metrics can also be implemented without modifying other parts of the code. [sent-79, score-0.025]
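For concreteness, two of these metrics can be computed from a {-1, +1} label matrix and a score matrix as in the sketch below; this is a generic illustration, not the package's output-information module.

```cpp
// Illustrative computation of two of the metrics mentioned above: the Hamming loss
// (fraction of wrong label bits) and the multi-class 0-1 error (argmax of the score
// vector compared to the true class). Generic sketch, not MultiBoost's code.
#include <algorithm>
#include <cstddef>
#include <vector>

double hamming_loss(const std::vector<std::vector<int>>& truth,
                    const std::vector<std::vector<int>>& pred) {
    std::size_t wrong = 0, total = 0;
    for (std::size_t i = 0; i < truth.size(); ++i)
        for (std::size_t l = 0; l < truth[i].size(); ++l, ++total)
            if (truth[i][l] != pred[i][l]) ++wrong;
    return total ? static_cast<double>(wrong) / total : 0.0;
}

double zero_one_error(const std::vector<std::size_t>& true_class,
                      const std::vector<std::vector<double>>& scores) {
    if (scores.empty()) return 0.0;
    std::size_t wrong = 0;
    for (std::size_t i = 0; i < scores.size(); ++i) {
        const auto it = std::max_element(scores[i].begin(), scores[i].end());
        if (static_cast<std::size_t>(it - scores[i].begin()) != true_class[i]) ++wrong;
    }
    return static_cast<double>(wrong) / scores.size();
}
```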
44 5 LazyBoost and BanditBoost. When the number of features is large, featurewise learners (Stump, Selector, and Indicator) can be accelerated by searching only a subset of the features in each iteration. [sent-81, score-0.155]
45 MultiBoost implements two options, namely, LazyBoost (Escudero et al., 2000), where features are sampled randomly, and BanditBoost (Busa-Fekete and Kégl, 2010), where the sampling is biased towards "good" features learned using a multi-armed bandit algorithm. [sent-82, score-0.054] [sent-83, score-0.086]
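The two options can be sketched as follows: draw a uniform random subset of feature indices per iteration (LazyBoost-style), or bias the draw towards features whose earlier base classifiers had large edges. The biased variant below is only a crude stand-in for the multi-armed bandit scheme of BanditBoost, and the function names are hypothetical.

```cpp
// Illustrative featurewise acceleration: instead of scanning all d features per
// boosting iteration, search only a subset. Uniform sampling mimics LazyBoost;
// the reward-biased variant is a crude stand-in for BanditBoost's bandit scheme.
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// LazyBoost-style: a uniform random subset of k feature indices out of d.
std::vector<std::size_t> lazy_sample(std::size_t d, std::size_t k, std::mt19937& rng) {
    std::vector<std::size_t> all(d);
    std::iota(all.begin(), all.end(), 0);
    std::shuffle(all.begin(), all.end(), rng);
    all.resize(std::min(k, d));
    return all;
}

// Bandit-flavoured: sample k features (with replacement) proportionally to a positive
// per-feature reward, e.g. the edges achieved by past base classifiers on that feature.
std::vector<std::size_t> biased_sample(const std::vector<double>& reward,
                                       std::size_t k, std::mt19937& rng) {
    std::discrete_distribution<std::size_t> pick(reward.begin(), reward.end());
    std::vector<std::size_t> chosen(k);
    for (std::size_t i = 0; i < k; ++i) chosen[i] = pick(rng);
    return chosen;
}
```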
47 It is available under the GPL licence at multiboost. [sent-86, score-0.051]
48 The website also provides documentation that contains detailed instructions and examples for using the package along with tutorials explaining how to implement new features. [sent-88, score-0.227]
49 The documentation also contains the pseudo-code of the multi-class base learners implemented in MultiBoost. [sent-89, score-0.369]
50 Challenges and Benchmarks. We make available reproducible test results (validated test errors, learning curves) of MultiBoost on the website as we produce them. [sent-91, score-0.026]
, 2006) was the best genre classifier out of 15 submissions and the second-best out of 10 submissions at recognizing artists in the MIREX 2005 international contest on music information retrieval. [sent-94, score-0.226]
wordName wordTfidf (topN-words)
[('oost', 0.649), ('ulti', 0.289), ('da', 0.226), ('gl', 0.185), ('tump', 0.181), ('base', 0.158), ('learners', 0.155), ('boosting', 0.153), ('learner', 0.135), ('cascade', 0.132), ('ilter', 0.12), ('schapire', 0.094), ('ascade', 0.09), ('benbouzid', 0.09), ('casagrande', 0.09), ('djalel', 0.09), ('stumps', 0.09), ('elector', 0.077), ('norman', 0.077), ('bradley', 0.077), ('meta', 0.07), ('bourdev', 0.07), ('viola', 0.07), ('package', 0.064), ('gmail', 0.062), ('aar', 0.06), ('accelerator', 0.06), ('asagrande', 0.06), ('azy', 0.06), ('collin', 0.06), ('ekete', 0.06), ('enbouzid', 0.06), ('escudero', 0.06), ('filterboost', 0.06), ('ndicator', 0.06), ('oft', 0.06), ('ollin', 0.06), ('submissions', 0.06), ('vjc', 0.06), ('wavii', 0.06), ('bandit', 0.056), ('documentation', 0.056), ('architecture', 0.054), ('implements', 0.054), ('licence', 0.051), ('brandt', 0.051), ('bal', 0.051), ('bert', 0.051), ('cnrs', 0.051), ('orsay', 0.051), ('cuts', 0.051), ('haar', 0.046), ('arff', 0.046), ('parser', 0.046), ('tree', 0.046), ('classi', 0.045), ('lters', 0.043), ('bergstra', 0.043), ('paradigm', 0.04), ('zs', 0.04), ('com', 0.039), ('learns', 0.038), ('exible', 0.038), ('jones', 0.038), ('music', 0.034), ('nominal', 0.034), ('france', 0.033), ('singer', 0.031), ('implement', 0.031), ('hamming', 0.03), ('website', 0.03), ('format', 0.03), ('modules', 0.03), ('learned', 0.03), ('label', 0.028), ('products', 0.027), ('ers', 0.026), ('manages', 0.026), ('roduct', 0.026), ('augments', 0.026), ('artists', 0.026), ('mirex', 0.026), ('reproducible', 0.026), ('ight', 0.026), ('resume', 0.026), ('vectorvalued', 0.026), ('metrics', 0.025), ('freund', 0.025), ('track', 0.024), ('soft', 0.024), ('fteen', 0.023), ('participated', 0.023), ('instructions', 0.023), ('xml', 0.023), ('nished', 0.023), ('contest', 0.023), ('eck', 0.023), ('ree', 0.023), ('genre', 0.023), ('tutorials', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999923 62 jmlr-2012-MULTIBOOST: A Multi-purpose Boosting Package
Author: Djalel Benbouzid, Róbert Busa-Fekete, Norman Casagrande, François-David Collin, Balázs Kégl
Abstract: The MultiBoost package provides a fast C++ implementation of multi-class/multi-label/multi-task boosting algorithms. It is based on AdaBoost.MH but it also implements popular cascade classifiers and FilterBoost. The package contains common multi-class base learners (stumps, trees, products, Haar filters). Further base learners and strong learners following the boosting paradigm can be easily implemented in a flexible framework. Keywords: boosting, AdaBoost.MH, FilterBoost, cascade classifier
2 0.38996172 92 jmlr-2012-Positive Semidefinite Metric Learning Using Boosting-like Algorithms
Author: Chunhua Shen, Junae Kim, Lei Wang, Anton van den Hengel
Abstract: The success of many machine learning and pattern recognition methods relies heavily upon the identification of an appropriate distance metric on the input data. It is often beneficial to learn such a metric from the input training data, instead of using a default one such as the Euclidean distance. In this work, we propose a boosting-based technique, termed BoostMetric, for learning a quadratic Mahalanobis distance metric. Learning a valid Mahalanobis distance metric requires enforcing the constraint that the matrix parameter to the metric remains positive semidefinite. Semidefinite programming is often used to enforce this constraint, but does not scale well and is not easy to implement. BoostMetric is instead based on the observation that any positive semidefinite matrix can be decomposed into a linear combination of trace-one rank-one matrices. BoostMetric thus uses rank-one positive semidefinite matrices as weak learners within an efficient and scalable boosting-based learning process. The resulting methods are easy to implement, efficient, and can accommodate various types of constraints. We extend traditional boosting algorithms in that its weak learner is a positive semidefinite matrix with trace and rank being one rather than a classifier or regressor. Experiments on various data sets demonstrate that the proposed algorithms compare favorably to those state-of-the-art methods in terms of classification accuracy and running time. Keywords: Mahalanobis distance, semidefinite programming, column generation, boosting, Lagrange duality, large margin nearest neighbor
3 0.066304088 8 jmlr-2012-A Primal-Dual Convergence Analysis of Boosting
Author: Matus Telgarsky
Abstract: Boosting combines weak learners into a predictor with low empirical risk. Its dual constructs a high entropy distribution upon which weak learners and training labels are uncorrelated. This manuscript studies this primal-dual relationship under a broad family of losses, including the exponential loss of AdaBoost and the logistic loss, revealing: • Weak learnability aids the whole loss family: for any ε > 0, O (ln(1/ε)) iterations suffice to produce a predictor with empirical risk ε-close to the infimum; • The circumstances granting the existence of an empirical risk minimizer may be characterized in terms of the primal and dual problems, yielding a new proof of the known rate O (ln(1/ε)); • Arbitrary instances may be decomposed into the above two, granting rate O (1/ε), with a matching lower bound provided for the logistic loss. Keywords: boosting, convex analysis, weak learnability, coordinate descent, maximum entropy
4 0.052035004 94 jmlr-2012-Query Strategies for Evading Convex-Inducing Classifiers
Author: Blaine Nelson, Benjamin I. P. Rubinstein, Ling Huang, Anthony D. Joseph, Steven J. Lee, Satish Rao, J. D. Tygar
Abstract: Classifiers are often used to detect miscreant activities. We study how an adversary can systematically query a classifier to elicit information that allows the attacker to evade detection while incurring a near-minimal cost of modifying their intended malfeasance. We generalize the theory of Lowd and Meek (2005) to the family of convex-inducing classifiers that partition their feature space into two sets, one of which is convex. We present query algorithms for this family that construct undetected instances of approximately minimal cost using only polynomially-many queries in the dimension of the space and in the level of approximation. Our results demonstrate that nearoptimal evasion can be accomplished for this family without reverse engineering the classifier’s decision boundary. We also consider general ℓ p costs and show that near-optimal evasion on the family of convex-inducing classifiers is generally efficient for both positive and negative convexity for all levels of approximation if p = 1. Keywords: query algorithms, evasion, reverse engineering, adversarial learning
5 0.040599354 7 jmlr-2012-A Multi-Stage Framework for Dantzig Selector and LASSO
Author: Ji Liu, Peter Wonka, Jieping Ye
Abstract: We consider the following sparse signal recovery (or feature selection) problem: given a design matrix X ∈ R^{n×m} (m ≫ n) and a noisy observation vector y ∈ R^n satisfying y = Xβ∗ + ε where ε is the noise vector following a Gaussian distribution N(0, σ²I), how to recover the signal (or parameter vector) β∗ when the signal is sparse? The Dantzig selector has been proposed for sparse signal recovery with strong theoretical guarantees. In this paper, we propose a multi-stage Dantzig selector method, which iteratively refines the target signal β∗. We show that if X obeys a certain condition, then with a large probability the difference between the solution β̂ estimated by the proposed method and the true solution β∗ measured in terms of the ℓ_p norm (p ≥ 1) is bounded as ‖β̂ − β∗‖_p ≤ (C(s − N)^{1/p} √(log m) + ∆) σ, where C is a constant, s is the number of nonzero entries in β∗, the risk of the oracle estimator ∆ is independent of m and is much smaller than the first term, and N is the number of entries of β∗ larger than a certain value in the order of O(σ √(log m)). The proposed method improves the estimation bound of the standard Dantzig selector approximately from C s^{1/p} √(log m) σ to C(s − N)^{1/p} √(log m) σ, where the value N depends on the number of large entries in β∗. When N = s, the proposed algorithm achieves the oracle solution with a high probability, where the oracle solution is the projection of the observation vector y onto true features. In addition, with a large probability, the proposed method can select the same number of correct features under a milder condition than the Dantzig selector. Finally, we extend this multi-stage procedure to the LASSO case. Keywords: multi-stage, Dantzig selector, LASSO, sparse signal recovery
6 0.038982838 61 jmlr-2012-ML-Flex: A Flexible Toolbox for Performing Classification Analyses In Parallel
7 0.03807598 71 jmlr-2012-Multi-Instance Learning with Any Hypothesis Class
8 0.036429387 74 jmlr-2012-Multi Kernel Learning with Online-Batch Optimization
9 0.035083525 73 jmlr-2012-Multi-task Regression using Minimal Penalties
10 0.03484026 90 jmlr-2012-Pattern for Python
11 0.032806024 103 jmlr-2012-Sampling Methods for the Nyström Method
12 0.030565564 106 jmlr-2012-Sign Language Recognition using Sub-Units
13 0.030164892 72 jmlr-2012-Multi-Target Regression with Rule Ensembles
14 0.025872242 80 jmlr-2012-On Ranking and Generalization Bounds
15 0.023305729 20 jmlr-2012-Analysis of a Random Forests Model
16 0.022696711 82 jmlr-2012-On the Necessity of Irrelevant Variables
17 0.022669708 70 jmlr-2012-Multi-Assignment Clustering for Boolean Data
18 0.022402955 113 jmlr-2012-The huge Package for High-dimensional Undirected Graph Estimation in R
19 0.021533869 30 jmlr-2012-DARWIN: A Framework for Machine Learning and Computer Vision Research and Development
20 0.02121095 88 jmlr-2012-PREA: Personalized Recommendation Algorithms Toolkit
topicId topicWeight
[(0, -0.12), (1, -0.001), (2, 0.203), (3, 0.14), (4, -0.507), (5, 0.064), (6, -0.05), (7, 0.107), (8, -0.034), (9, 0.14), (10, 0.002), (11, 0.091), (12, 0.121), (13, 0.033), (14, -0.13), (15, -0.016), (16, -0.018), (17, -0.072), (18, -0.22), (19, 0.064), (20, 0.027), (21, -0.137), (22, 0.022), (23, 0.222), (24, -0.133), (25, 0.079), (26, 0.059), (27, 0.209), (28, 0.109), (29, 0.125), (30, -0.032), (31, -0.078), (32, 0.06), (33, -0.018), (34, 0.058), (35, -0.038), (36, 0.07), (37, 0.098), (38, 0.046), (39, 0.007), (40, -0.092), (41, 0.049), (42, 0.034), (43, 0.115), (44, -0.033), (45, 0.025), (46, -0.038), (47, 0.006), (48, -0.102), (49, -0.067)]
simIndex simValue paperId paperTitle
same-paper 1 0.9795562 62 jmlr-2012-MULTIBOOST: A Multi-purpose Boosting Package
2 0.68728727 92 jmlr-2012-Positive Semidefinite Metric Learning Using Boosting-like Algorithms
3 0.22214361 8 jmlr-2012-A Primal-Dual Convergence Analysis of Boosting
4 0.17592633 103 jmlr-2012-Sampling Methods for the Nyström Method
Author: Sanjiv Kumar, Mehryar Mohri, Ameet Talwalkar
Abstract: The Nyström method is an efficient technique to generate low-rank matrix approximations and is used in several large-scale learning applications. A key aspect of this method is the procedure according to which columns are sampled from the original matrix. In this work, we explore the efficacy of a variety of fixed and adaptive sampling schemes. We also propose a family of ensemble-based sampling algorithms for the Nyström method. We report results of extensive experiments that provide a detailed comparison of various fixed and adaptive sampling techniques, and demonstrate the performance improvement associated with the ensemble Nyström method when used in conjunction with either fixed or adaptive sampling schemes. Corroborating these empirical findings, we present a theoretical analysis of the Nyström method, providing novel error bounds guaranteeing a better convergence rate of the ensemble Nyström method in comparison to the standard Nyström method. Keywords: low-rank approximation, Nyström method, ensemble methods, large-scale learning
5 0.16959696 61 jmlr-2012-ML-Flex: A Flexible Toolbox for Performing Classification Analyses In Parallel
Author: Stephen R. Piccolo, Lewis J. Frey
Abstract: Motivated by a need to classify high-dimensional, heterogeneous data from the bioinformatics domain, we developed ML-Flex, a machine-learning toolbox that enables users to perform two-class and multi-class classification analyses in a systematic yet flexible manner. ML-Flex was written in Java but is capable of interfacing with third-party packages written in other programming languages. It can handle multiple input-data formats and supports a variety of customizations. ML-Flex provides implementations of various validation strategies, which can be executed in parallel across multiple computing cores, processors, and nodes. Additionally, ML-Flex supports aggregating evidence across multiple algorithms and data sets via ensemble learning. This open-source software package is freely available from http://mlflex.sourceforge.net. Keywords: toolbox, classification, parallel, ensemble, reproducible research
6 0.1515646 90 jmlr-2012-Pattern for Python
7 0.1402861 71 jmlr-2012-Multi-Instance Learning with Any Hypothesis Class
8 0.13146672 94 jmlr-2012-Query Strategies for Evading Convex-Inducing Classifiers
9 0.11987242 70 jmlr-2012-Multi-Assignment Clustering for Boolean Data
10 0.11670054 106 jmlr-2012-Sign Language Recognition using Sub-Units
11 0.097713314 30 jmlr-2012-DARWIN: A Framework for Machine Learning and Computer Vision Research and Development
12 0.096975103 105 jmlr-2012-Selective Sampling and Active Learning from Single and Multiple Teachers
13 0.095522128 13 jmlr-2012-Active Learning via Perfect Selective Classification
14 0.09266942 72 jmlr-2012-Multi-Target Regression with Rule Ensembles
15 0.091077983 113 jmlr-2012-The huge Package for High-dimensional Undirected Graph Estimation in R
16 0.091077819 86 jmlr-2012-Optimistic Bayesian Sampling in Contextual-Bandit Problems
17 0.087613061 36 jmlr-2012-Efficient Methods for Robust Classification Under Uncertainty in Kernel Matrices
18 0.085023887 26 jmlr-2012-Coherence Functions with Applications in Large-Margin Classification Methods
19 0.078159459 63 jmlr-2012-Mal-ID: Automatic Malware Detection Using Common Segment Analysis and Meta-Features
20 0.077287585 69 jmlr-2012-Mixability is Bayes Risk Curvature Relative to Log Loss
topicId topicWeight
[(7, 0.025), (21, 0.031), (26, 0.018), (27, 0.025), (29, 0.023), (49, 0.014), (56, 0.04), (57, 0.017), (59, 0.537), (69, 0.035), (75, 0.047), (92, 0.023), (96, 0.051)]
simIndex simValue paperId paperTitle
same-paper 1 0.82957006 62 jmlr-2012-MULTIBOOST: A Multi-purpose Boosting Package
2 0.549573 89 jmlr-2012-Pairwise Support Vector Machines and their Application to Large Scale Problems
Author: Carl Brunner, Andreas Fischer, Klaus Luig, Thorsten Thies
Abstract: Pairwise classification is the task to predict whether the examples a, b of a pair (a, b) belong to the same class or to different classes. In particular, interclass generalization problems can be treated in this way. In pairwise classification, the order of the two input examples should not affect the classification result. To achieve this, particular kernels as well as the use of symmetric training sets in the framework of support vector machines were suggested. The paper discusses both approaches in a general way and establishes a strong connection between them. In addition, an efficient implementation is discussed which allows the training of several millions of pairs. The value of these contributions is confirmed by excellent results on the labeled faces in the wild benchmark. Keywords: pairwise support vector machines, interclass generalization, pairwise kernels, large scale problems
3 0.1928733 92 jmlr-2012-Positive Semidefinite Metric Learning Using Boosting-like Algorithms
4 0.18241052 8 jmlr-2012-A Primal-Dual Convergence Analysis of Boosting
5 0.16635664 71 jmlr-2012-Multi-Instance Learning with Any Hypothesis Class
Author: Sivan Sabato, Naftali Tishby
Abstract: In the supervised learning setting termed Multiple-Instance Learning (MIL), the examples are bags of instances, and the bag label is a function of the labels of its instances. Typically, this function is the Boolean OR. The learner observes a sample of bags and the bag labels, but not the instance labels that determine the bag labels. The learner is then required to emit a classification rule for bags based on the sample. MIL has numerous applications, and many heuristic algorithms have been used successfully on this problem, each adapted to specific settings or applications. In this work we provide a unified theoretical analysis for MIL, which holds for any underlying hypothesis class, regardless of a specific application or problem domain. We show that the sample complexity of MIL is only poly-logarithmically dependent on the size of the bag, for any underlying hypothesis class. In addition, we introduce a new PAC-learning algorithm for MIL, which uses a regular supervised learning algorithm as an oracle. We prove that efficient PAC-learning for MIL can be generated from any efficient non-MIL supervised learning algorithm that handles one-sided error. The computational complexity of the resulting algorithm is only polynomially dependent on the bag size. Keywords: multiple-instance learning, learning theory, sample complexity, PAC learning, supervised classification
6 0.15517662 98 jmlr-2012-Regularized Bundle Methods for Convex and Non-Convex Risks
7 0.15414108 77 jmlr-2012-Non-Sparse Multiple Kernel Fisher Discriminant Analysis
8 0.1491015 106 jmlr-2012-Sign Language Recognition using Sub-Units
9 0.14909033 44 jmlr-2012-Feature Selection via Dependence Maximization
10 0.14772031 88 jmlr-2012-PREA: Personalized Recommendation Algorithms Toolkit
11 0.14762211 65 jmlr-2012-MedLDA: Maximum Margin Supervised Topic Models
12 0.14725798 85 jmlr-2012-Optimal Distributed Online Prediction Using Mini-Batches
13 0.14663936 105 jmlr-2012-Selective Sampling and Active Learning from Single and Multiple Teachers
14 0.14663318 27 jmlr-2012-Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection
15 0.14595413 74 jmlr-2012-Multi Kernel Learning with Online-Batch Optimization
16 0.14434734 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models
17 0.14428683 97 jmlr-2012-Regularization Techniques for Learning with Matrices
18 0.14414948 115 jmlr-2012-Trading Regret for Efficiency: Online Convex Optimization with Long Term Constraints
19 0.14414234 100 jmlr-2012-Robust Kernel Density Estimation
20 0.1440179 43 jmlr-2012-Fast Approximation of Matrix Coherence and Statistical Leverage