jmlr jmlr2012 jmlr2012-62 knowledge-graph by maker-knowledge-mining

62 jmlr-2012-MULTIBOOST: A Multi-purpose Boosting Package


Source: pdf

Author: Djalel Benbouzid, Róbert Busa-Fekete, Norman Casagrande, François-David Collin, Balázs Kégl

Abstract: The MULTIBOOST package provides a fast C++ implementation of multi-class/multi-label/multi-task boosting algorithms. It is based on ADABOOST.MH but it also implements popular cascade classifiers and FILTERBOOST. The package contains common multi-class base learners (stumps, trees, products, Haar filters). Further base learners and strong learners following the boosting paradigm can be easily implemented in a flexible framework. Keywords: boosting, ADABOOST.MH, FILTERBOOST, cascade classifier

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Journal of Machine Learning Research 13 (2012) 549-553 Submitted 8/11; Published 3/12 MULTIBOOST: A Multi-purpose Boosting Package Djalel Benbouzid Róbert Busa-Fekete∗ DJALEL . [sent-1, score-0.051]

2 13-19 Bevenden Street London, N1 6AA, United Kingdom François-David Collin Balázs Kégl† FRADAV @ GMAIL . [sent-6, score-0.04]

3 COM Linear Accelerator Laboratory University of Paris-Sud, CNRS Orsay 91898, France Editor: Sören Sonnenburg Abstract The MULTIBOOST package provides a fast C++ implementation of multi-class/multi-label/multi-task boosting algorithms. [sent-9, score-0.217]

4 MH but it also implements popular cascade classifiers and FILTERBOOST. [sent-11, score-0.186]

5 The package contains common multi-class base learners (stumps, trees, products, Haar filters). [sent-12, score-0.377]

6 Further base learners and strong learners following the boosting paradigm can be easily implemented in a flexible framework. [sent-13, score-0.661]

7 MH has become the gold standard of multi-class boosting due to its simplicity and versatility. [sent-20, score-0.153]

8 Whereas binary ADABOOST with decision stumps is easy to code, multi-class ADABOOST. [sent-22, score-0.09]

9 MH and complex base learners are not straightforward to implement efficiently. [sent-23, score-0.344]

10 The MULTIBOOST software package is intended to fill this gap. [sent-24, score-0.064]

11 Its main boosting engine is based on the ADABOOST. [sent-25, score-0.153]

12 © 2012 Djalel Benbouzid, Róbert Busa-Fekete, Norman Casagrande, François-David Collin and Balázs Kégl. [sent-31, score-0.091]

13 The package includes common multi-class base learners (real and nominal valued decision stumps, trees, products (Kégl and Busa-Fekete, 2009), and Haar filters), but the flexible architecture makes it simple to add new base learners without interfering with the main boosting engine. [sent-37, score-1.143]

14 MULTIBOOST was designed in the object-oriented paradigm and coded in C++, so it is fast and it provides a flexible base for implementing further modules. [sent-38, score-0.198]

15 Section 2 describes the general architecture and the modules of MULTIBOOST. [sent-40, score-0.084]

16 The Architecture MULTIBOOST was implemented within the object-oriented paradigm using some design patterns. [sent-43, score-0.04]

17 It consists of several modules which can be changed or extended more or less independently (Figure 1). [sent-44, score-0.03]

18 For instance, an advanced user can implement a data-type/base-learner pair without any need to modify the other modules. [sent-45, score-0.031]

19 1 Strong Learners The strong learner calls the base learners iteratively, stores the learned base classifiers and their coefficients, and manages the weights of the training instances. [sent-47, score-0.527]
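To make the role of the strong learner concrete, here is a minimal sketch of such an "outer" boosting loop in C++. The names (WeakHypothesis, baseLearner, boost) are hypothetical illustrations, not the actual MULTIBOOST interfaces; the weight update follows the usual ADABOOST.MH-style exponential re-weighting.

#include <cmath>
#include <functional>
#include <vector>

using WeightMatrix = std::vector<std::vector<double>>;  // w[i][l]: weight of (instance i, label l)
using LabelMatrix  = std::vector<std::vector<int>>;     // y[i][l] in {-1, +1}

struct WeakHypothesis {
    double alpha = 0.0;                    // coefficient of the base classifier
    std::vector<std::vector<int>> votes;   // votes[i][l] in {-1, +1} on the training set
};

// The "outer" loop: call the base learner on the current weights, store the
// returned base classifier and its coefficient, then re-weight the
// (instance, label) pairs and renormalize.
std::vector<WeakHypothesis> boost(
        WeightMatrix w,
        const LabelMatrix& y,
        int numIterations,
        const std::function<WeakHypothesis(const WeightMatrix&, const LabelMatrix&)>& baseLearner) {
    std::vector<WeakHypothesis> model;
    for (int t = 0; t < numIterations; ++t) {
        WeakHypothesis ht = baseLearner(w, y);           // train a base classifier
        model.push_back(ht);                             // store it with its coefficient
        double z = 0.0;                                  // normalization constant
        for (std::size_t i = 0; i < w.size(); ++i)
            for (std::size_t l = 0; l < w[i].size(); ++l) {
                w[i][l] *= std::exp(-ht.alpha * ht.votes[i][l] * y[i][l]);
                z += w[i][l];
            }
        for (auto& row : w)
            for (auto& v : row) v /= z;                  // keep the weights a distribution
    }
    return model;
}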

20 The name originally comes from the boosting (PAC learning) literature. [sent-49, score-0.153]

21 Here, we use it in a broader sense to mean the “outer” loop of the boosting iteration. [sent-50, score-0.153]

22 MULTIBOOST saves the learned model in a human-readable XML format that allows one to resume a run after it was stopped or crashed. [sent-51, score-0.056]

23 MULTIBOOST implements the following strong learners: • ADABOOST. [sent-52, score-0.054]

24 MH (Schapire and Singer, 1999): a multi-class/multi-label/multi-task version of ADABOOST that learns a "flat" linear combination of vector-valued base classifiers. [sent-53, score-0.196]
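In the notation of Schapire and Singer (1999), this "flat" combination is the strong classifier

f^{(T)}(x, \ell) \;=\; \sum_{t=1}^{T} \alpha_t \, h_t(x, \ell),

where each vector-valued base classifier h_t(x, ·) votes on every label ℓ, and the sign of f^{(T)}(x, ℓ) is used to decide whether x carries label ℓ.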

25 • VJCASCADE (Viola and Jones, 2004): an algorithm that learns a cascade classifier tree by running ADABOOST at each node. [sent-55, score-0.216]

26 • SOFTCASCADE (Bourdev and Brandt, 2005): another cascade learner that starts with a set of base classifiers, reorders them, and augments them with rejection thresholds. [sent-56, score-0.451]
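A minimal sketch of the rejection-threshold idea, in the spirit of Bourdev and Brandt (2005); the function name and signature are hypothetical, not the SOFTCASCADE implementation itself:

#include <vector>

// Evaluate an ordered cascade on one instance: after each stage, compare the
// running score against that stage's rejection threshold and stop early if
// the instance can already be rejected.
bool passesCascade(const std::vector<double>& stageResponses,      // alpha_t * h_t(x), already ordered
                   const std::vector<double>& rejectionThresholds) // one threshold per stage
{
    double score = 0.0;
    for (std::size_t t = 0; t < stageResponses.size(); ++t) {
        score += stageResponses[t];
        if (score < rejectionThresholds[t])
            return false;   // rejected early: most negatives exit after a few stages
    }
    return true;            // survived all rejection thresholds
}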

27 2 Base Learners MULTIBOOST implements the following base learners. [sent-58, score-0.212]

28 • The STUMP learner is a one-decision two-leaf tree learned on real-valued features. [sent-59, score-0.211]

29 It is indexed by the feature it cuts and the threshold where it cuts the feature. [sent-60, score-0.102]
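As a rough illustration of this parameterization (hypothetical names, not the package's STUMP class):

#include <vector>

// A stump is fully determined by the feature it cuts, the cut threshold, and
// the vector-valued votes of its two leaves (one vote per class/label).
struct StumpSketch {
    int featureIndex = 0;
    double threshold = 0.0;
    std::vector<double> leftVote, rightVote;

    const std::vector<double>& classify(const std::vector<double>& x) const {
        return x[featureIndex] < threshold ? leftVote : rightVote;
    }
};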

30 • SELECTOR is a one-decision two-leaf tree learned on nominal features. [sent-61, score-0.11]

31 It is indexed by the feature it cuts and the value of the feature it selects. [sent-62, score-0.051]

32 • HAARSTUMP is a STUMP learned over a feature space generated using rectangular filters on images. [sent-64, score-0.03]

33 • TREE is a meta base learner that takes any base learner as input and learns a vector-valued multi-class decision tree using the input base learner as the basic cut. [sent-65, score-1.033]

34 • PRODUCT is another meta base learner that takes any base learner as input and learns a vector-valued multi-class decision product (Kégl and Busa-Fekete, 2009) using the input base learner as terms of the product. [sent-66, score-1.198]
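In formula form, a decision product over m base classifiers combines their vector-valued votes multiplicatively, label by label (following the product-of-base-classifiers idea of Kégl and Busa-Fekete, 2009):

h(x, \ell) \;=\; \prod_{j=1}^{m} h_j(x, \ell),

so with ±1-valued terms the product flips its vote on label ℓ whenever an odd number of terms vote −1.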

35 3 The Data Representation The multi-class data structure is a set of observation-label pairs, where each observation is a vector of feature values, and each label is a vector of binary class indicators. [sent-68, score-0.028]
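A concrete encoding consistent with this description (and with ADABOOST.MH) represents each label entry as

y_{i,\ell} \;=\;
\begin{cases}
+1 & \text{if instance } i \text{ belongs to class (or carries label) } \ell,\\
-1 & \text{otherwise.}
\end{cases}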

36 In binary classification, we also allow one single label that indicates the class dichotomy. [sent-69, score-0.028]

37 In addition, multi-task classification can be encoded by letting each label column represent a different task. [sent-71, score-0.028]

38 We implement a sparse data representation for both the observation matrix and the label matrix. [sent-72, score-0.059]

39 In general, base learners were implemented to work with their own data representation. [sent-73, score-0.313]

40 4 The Data Parser and the Output Information The training and test sets can be input in the attribute-relation file format (ARFF), in the SVMLIGHT format, or using a comma-separated text file. [sent-76, score-0.03]

41 We augmented the first two formats with initial label weighting, which is an important feature in the boosting framework (especially in the multi-class/multi-label setup). [sent-77, score-0.181]

42 In each iteration, MULTIBOOST can output several metrics (specified by the user), such as the 0-1 error, the Hamming loss, or the area under the ROC curve. [sent-78, score-0.025]
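As an illustration of one such metric, here is a minimal sketch of the multi-label Hamming loss (hypothetical signature, not the package's output-information module):

#include <cstddef>
#include <vector>

// Fraction of (instance, label) entries where the sign of the predicted score
// disagrees with the true +/-1 indicator.
double hammingLoss(const std::vector<std::vector<double>>& scores,  // f(x_i, l)
                   const std::vector<std::vector<int>>& y)          // y[i][l] in {-1, +1}
{
    std::size_t errors = 0, total = 0;
    for (std::size_t i = 0; i < y.size(); ++i)
        for (std::size_t l = 0; l < y[i].size(); ++l) {
            int predicted = scores[i][l] >= 0.0 ? +1 : -1;
            errors += (predicted != y[i][l]);
            total  += 1;
        }
    return total > 0 ? static_cast<double>(errors) / total : 0.0;
}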

43 New metrics can also be implemented without modifying other parts of the code. [sent-79, score-0.025]

44 5 LAZYBOOST and BANDITBOOST When the number of features is large, featurewise learners (STUMP, SELECTOR, and INDICATOR) can be accelerated by searching only a subset of the features in each iteration. [sent-81, score-0.155]

45 MULTIBOOST implements two options, namely, LAZYBOOST (Escudero et al. [sent-82, score-0.054]

46 , 2000), where features are sampled randomly, and BANDITBOOST (Busa-Fekete and Kégl, 2010), where the sampling is biased towards "good" features learned using a multi-armed bandit algorithm. [sent-83, score-0.086]
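A minimal sketch of the LAZYBOOST-style acceleration, i.e., drawing a uniform random subset of feature indices to search in a given iteration (hypothetical helper, not the package's code):

#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Draw subsetSize distinct feature indices uniformly at random; the base
// learner then searches only these features in the current iteration.
std::vector<int> sampleFeatureSubset(int numFeatures, int subsetSize, std::mt19937& rng) {
    std::vector<int> indices(numFeatures);
    std::iota(indices.begin(), indices.end(), 0);
    std::shuffle(indices.begin(), indices.end(), rng);
    indices.resize(std::min(subsetSize, numFeatures));
    return indices;
}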

47 4 It is available under the GPL licence at multiboost. [sent-86, score-0.051]

48 The website also provides documentation that contains detailed instructions and examples for using the package along with tutorials explaining how to implement new features. [sent-88, score-0.227]

49 The documentation also contains the pseudo-code of the multi-class base learners implemented in MULTIBOOST. [sent-89, score-0.369]

50 Challenges and Benchmarks We make available reproducible test results (validated test errors, learning curves) of MULTIBOOST on the web site as we produce them. [sent-91, score-0.026]

51 , 2006) was the best genre classifier out of 15 submissions and the second-best out of 10 submissions at recognizing artists in the MIREX 2005 international contest on music information extraction. [sent-94, score-0.226]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('oost', 0.649), ('ulti', 0.289), ('da', 0.226), ('gl', 0.185), ('tump', 0.181), ('base', 0.158), ('learners', 0.155), ('boosting', 0.153), ('learner', 0.135), ('cascade', 0.132), ('ilter', 0.12), ('schapire', 0.094), ('ascade', 0.09), ('benbouzid', 0.09), ('casagrande', 0.09), ('djalel', 0.09), ('stumps', 0.09), ('elector', 0.077), ('norman', 0.077), ('bradley', 0.077), ('meta', 0.07), ('bourdev', 0.07), ('viola', 0.07), ('package', 0.064), ('gmail', 0.062), ('aar', 0.06), ('accelerator', 0.06), ('asagrande', 0.06), ('azy', 0.06), ('collin', 0.06), ('ekete', 0.06), ('enbouzid', 0.06), ('escudero', 0.06), ('filterboost', 0.06), ('ndicator', 0.06), ('oft', 0.06), ('ollin', 0.06), ('submissions', 0.06), ('vjc', 0.06), ('wavii', 0.06), ('bandit', 0.056), ('documentation', 0.056), ('architecture', 0.054), ('implements', 0.054), ('licence', 0.051), ('brandt', 0.051), ('bal', 0.051), ('bert', 0.051), ('cnrs', 0.051), ('orsay', 0.051), ('cuts', 0.051), ('haar', 0.046), ('arff', 0.046), ('parser', 0.046), ('tree', 0.046), ('classi', 0.045), ('lters', 0.043), ('bergstra', 0.043), ('paradigm', 0.04), ('zs', 0.04), ('com', 0.039), ('learns', 0.038), ('exible', 0.038), ('jones', 0.038), ('music', 0.034), ('nominal', 0.034), ('france', 0.033), ('singer', 0.031), ('implement', 0.031), ('hamming', 0.03), ('website', 0.03), ('format', 0.03), ('modules', 0.03), ('learned', 0.03), ('label', 0.028), ('products', 0.027), ('ers', 0.026), ('manages', 0.026), ('roduct', 0.026), ('augments', 0.026), ('artists', 0.026), ('mirex', 0.026), ('reproducible', 0.026), ('ight', 0.026), ('resume', 0.026), ('vectorvalued', 0.026), ('metrics', 0.025), ('freund', 0.025), ('track', 0.024), ('soft', 0.024), ('fteen', 0.023), ('participated', 0.023), ('instructions', 0.023), ('xml', 0.023), ('nished', 0.023), ('contest', 0.023), ('eck', 0.023), ('ree', 0.023), ('genre', 0.023), ('tutorials', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999923 62 jmlr-2012-MULTIBOOST: A Multi-purpose Boosting Package

Author: Djalel Benbouzid, Róbert Busa-Fekete, Norman Casagrande, François-David Collin, Balázs Kégl

Abstract: The MULTIBOOST package provides a fast C++ implementation of multi-class/multi-label/multi-task boosting algorithms. It is based on ADABOOST.MH but it also implements popular cascade classifiers and FILTERBOOST. The package contains common multi-class base learners (stumps, trees, products, Haar filters). Further base learners and strong learners following the boosting paradigm can be easily implemented in a flexible framework. Keywords: boosting, ADABOOST.MH, FILTERBOOST, cascade classifier

2 0.38996172 92 jmlr-2012-Positive Semidefinite Metric Learning Using Boosting-like Algorithms

Author: Chunhua Shen, Junae Kim, Lei Wang, Anton van den Hengel

Abstract: The success of many machine learning and pattern recognition methods relies heavily upon the identification of an appropriate distance metric on the input data. It is often beneficial to learn such a metric from the input training data, instead of using a default one such as the Euclidean distance. In this work, we propose a boosting-based technique, termed BOOSTMETRIC, for learning a quadratic Mahalanobis distance metric. Learning a valid Mahalanobis distance metric requires enforcing the constraint that the matrix parameter to the metric remains positive semidefinite. Semidefinite programming is often used to enforce this constraint, but does not scale well and is not easy to implement. BOOSTMETRIC is instead based on the observation that any positive semidefinite matrix can be decomposed into a linear combination of trace-one rank-one matrices. BOOSTMETRIC thus uses rank-one positive semidefinite matrices as weak learners within an efficient and scalable boosting-based learning process. The resulting methods are easy to implement, efficient, and can accommodate various types of constraints. We extend traditional boosting algorithms in that its weak learner is a positive semidefinite matrix with trace and rank being one rather than a classifier or regressor. Experiments on various data sets demonstrate that the proposed algorithms compare favorably to those state-of-the-art methods in terms of classification accuracy and running time. Keywords: Mahalanobis distance, semidefinite programming, column generation, boosting, Lagrange duality, large margin nearest neighbor

3 0.066304088 8 jmlr-2012-A Primal-Dual Convergence Analysis of Boosting

Author: Matus Telgarsky

Abstract: Boosting combines weak learners into a predictor with low empirical risk. Its dual constructs a high entropy distribution upon which weak learners and training labels are uncorrelated. This manuscript studies this primal-dual relationship under a broad family of losses, including the exponential loss of AdaBoost and the logistic loss, revealing: • Weak learnability aids the whole loss family: for any ε > 0, O (ln(1/ε)) iterations suffice to produce a predictor with empirical risk ε-close to the infimum; • The circumstances granting the existence of an empirical risk minimizer may be characterized in terms of the primal and dual problems, yielding a new proof of the known rate O (ln(1/ε)); • Arbitrary instances may be decomposed into the above two, granting rate O (1/ε), with a matching lower bound provided for the logistic loss. Keywords: boosting, convex analysis, weak learnability, coordinate descent, maximum entropy

4 0.052035004 94 jmlr-2012-Query Strategies for Evading Convex-Inducing Classifiers

Author: Blaine Nelson, Benjamin I. P. Rubinstein, Ling Huang, Anthony D. Joseph, Steven J. Lee, Satish Rao, J. D. Tygar

Abstract: Classifiers are often used to detect miscreant activities. We study how an adversary can systematically query a classifier to elicit information that allows the attacker to evade detection while incurring a near-minimal cost of modifying their intended malfeasance. We generalize the theory of Lowd and Meek (2005) to the family of convex-inducing classifiers that partition their feature space into two sets, one of which is convex. We present query algorithms for this family that construct undetected instances of approximately minimal cost using only polynomially-many queries in the dimension of the space and in the level of approximation. Our results demonstrate that nearoptimal evasion can be accomplished for this family without reverse engineering the classifier’s decision boundary. We also consider general ℓ p costs and show that near-optimal evasion on the family of convex-inducing classifiers is generally efficient for both positive and negative convexity for all levels of approximation if p = 1. Keywords: query algorithms, evasion, reverse engineering, adversarial learning

5 0.040599354 7 jmlr-2012-A Multi-Stage Framework for Dantzig Selector and LASSO

Author: Ji Liu, Peter Wonka, Jieping Ye

Abstract: We consider the following sparse signal recovery (or feature selection) problem: given a design matrix X ∈ R^{n×m} (m ≫ n) and a noisy observation vector y ∈ R^n satisfying y = Xβ∗ + ε where ε is the noise vector following a Gaussian distribution N(0, σ^2 I), how to recover the signal (or parameter vector) β∗ when the signal is sparse? The Dantzig selector has been proposed for sparse signal recovery with strong theoretical guarantees. In this paper, we propose a multi-stage Dantzig selector method, which iteratively refines the target signal β∗. We show that if X obeys a certain condition, then with a large probability the difference between the solution β̂ estimated by the proposed method and the true solution β∗ measured in terms of the ℓ_p norm (p ≥ 1) is bounded as ∥β̂ − β∗∥_p ≤ (C(s − N)^{1/p} √(log m) + ∆) σ, where C is a constant, s is the number of nonzero entries in β∗, the risk of the oracle estimator ∆ is independent of m and is much smaller than the first term, and N is the number of entries of β∗ larger than a certain value in the order of O(σ √(log m)). The proposed method improves the estimation bound of the standard Dantzig selector approximately from C s^{1/p} √(log m) σ to C(s − N)^{1/p} √(log m) σ where the value N depends on the number of large entries in β∗. When N = s, the proposed algorithm achieves the oracle solution with a high probability, where the oracle solution is the projection of the observation vector y onto true features. In addition, with a large probability, the proposed method can select the same number of correct features under a milder condition than the Dantzig selector. Finally, we extend this multi-stage procedure to the LASSO case. Keywords: multi-stage, Dantzig selector, LASSO, sparse signal recovery

6 0.038982838 61 jmlr-2012-ML-Flex: A Flexible Toolbox for Performing Classification Analyses In Parallel

7 0.03807598 71 jmlr-2012-Multi-Instance Learning with Any Hypothesis Class

8 0.036429387 74 jmlr-2012-Multi Kernel Learning with Online-Batch Optimization

9 0.035083525 73 jmlr-2012-Multi-task Regression using Minimal Penalties

10 0.03484026 90 jmlr-2012-Pattern for Python

11 0.032806024 103 jmlr-2012-Sampling Methods for the Nyström Method

12 0.030565564 106 jmlr-2012-Sign Language Recognition using Sub-Units

13 0.030164892 72 jmlr-2012-Multi-Target Regression with Rule Ensembles

14 0.025872242 80 jmlr-2012-On Ranking and Generalization Bounds

15 0.023305729 20 jmlr-2012-Analysis of a Random Forests Model

16 0.022696711 82 jmlr-2012-On the Necessity of Irrelevant Variables

17 0.022669708 70 jmlr-2012-Multi-Assignment Clustering for Boolean Data

18 0.022402955 113 jmlr-2012-The huge Package for High-dimensional Undirected Graph Estimation in R

19 0.021533869 30 jmlr-2012-DARWIN: A Framework for Machine Learning and Computer Vision Research and Development

20 0.02121095 88 jmlr-2012-PREA: Personalized Recommendation Algorithms Toolkit


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.12), (1, -0.001), (2, 0.203), (3, 0.14), (4, -0.507), (5, 0.064), (6, -0.05), (7, 0.107), (8, -0.034), (9, 0.14), (10, 0.002), (11, 0.091), (12, 0.121), (13, 0.033), (14, -0.13), (15, -0.016), (16, -0.018), (17, -0.072), (18, -0.22), (19, 0.064), (20, 0.027), (21, -0.137), (22, 0.022), (23, 0.222), (24, -0.133), (25, 0.079), (26, 0.059), (27, 0.209), (28, 0.109), (29, 0.125), (30, -0.032), (31, -0.078), (32, 0.06), (33, -0.018), (34, 0.058), (35, -0.038), (36, 0.07), (37, 0.098), (38, 0.046), (39, 0.007), (40, -0.092), (41, 0.049), (42, 0.034), (43, 0.115), (44, -0.033), (45, 0.025), (46, -0.038), (47, 0.006), (48, -0.102), (49, -0.067)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9795562 62 jmlr-2012-MULTIBOOST: A Multi-purpose Boosting Package

Author: Djalel Benbouzid, Róbert Busa-Fekete, Norman Casagrande, François-David Collin, Balázs Kégl

Abstract: The MULTIBOOST package provides a fast C++ implementation of multi-class/multi-label/multi-task boosting algorithms. It is based on ADABOOST.MH but it also implements popular cascade classifiers and FILTERBOOST. The package contains common multi-class base learners (stumps, trees, products, Haar filters). Further base learners and strong learners following the boosting paradigm can be easily implemented in a flexible framework. Keywords: boosting, ADABOOST.MH, FILTERBOOST, cascade classifier

2 0.68728727 92 jmlr-2012-Positive Semidefinite Metric Learning Using Boosting-like Algorithms

Author: Chunhua Shen, Junae Kim, Lei Wang, Anton van den Hengel

Abstract: The success of many machine learning and pattern recognition methods relies heavily upon the identification of an appropriate distance metric on the input data. It is often beneficial to learn such a metric from the input training data, instead of using a default one such as the Euclidean distance. In this work, we propose a boosting-based technique, termed BOOSTMETRIC, for learning a quadratic Mahalanobis distance metric. Learning a valid Mahalanobis distance metric requires enforcing the constraint that the matrix parameter to the metric remains positive semidefinite. Semidefinite programming is often used to enforce this constraint, but does not scale well and is not easy to implement. BOOSTMETRIC is instead based on the observation that any positive semidefinite matrix can be decomposed into a linear combination of trace-one rank-one matrices. BOOSTMETRIC thus uses rank-one positive semidefinite matrices as weak learners within an efficient and scalable boosting-based learning process. The resulting methods are easy to implement, efficient, and can accommodate various types of constraints. We extend traditional boosting algorithms in that its weak learner is a positive semidefinite matrix with trace and rank being one rather than a classifier or regressor. Experiments on various data sets demonstrate that the proposed algorithms compare favorably to those state-of-the-art methods in terms of classification accuracy and running time. Keywords: Mahalanobis distance, semidefinite programming, column generation, boosting, Lagrange duality, large margin nearest neighbor

3 0.22214361 8 jmlr-2012-A Primal-Dual Convergence Analysis of Boosting

Author: Matus Telgarsky

Abstract: Boosting combines weak learners into a predictor with low empirical risk. Its dual constructs a high entropy distribution upon which weak learners and training labels are uncorrelated. This manuscript studies this primal-dual relationship under a broad family of losses, including the exponential loss of AdaBoost and the logistic loss, revealing: • Weak learnability aids the whole loss family: for any ε > 0, O (ln(1/ε)) iterations suffice to produce a predictor with empirical risk ε-close to the infimum; • The circumstances granting the existence of an empirical risk minimizer may be characterized in terms of the primal and dual problems, yielding a new proof of the known rate O (ln(1/ε)); • Arbitrary instances may be decomposed into the above two, granting rate O (1/ε), with a matching lower bound provided for the logistic loss. Keywords: boosting, convex analysis, weak learnability, coordinate descent, maximum entropy

4 0.17592633 103 jmlr-2012-Sampling Methods for the Nyström Method

Author: Sanjiv Kumar, Mehryar Mohri, Ameet Talwalkar

Abstract: The Nyström method is an efficient technique to generate low-rank matrix approximations and is used in several large-scale learning applications. A key aspect of this method is the procedure according to which columns are sampled from the original matrix. In this work, we explore the efficacy of a variety of fixed and adaptive sampling schemes. We also propose a family of ensemble-based sampling algorithms for the Nyström method. We report results of extensive experiments that provide a detailed comparison of various fixed and adaptive sampling techniques, and demonstrate the performance improvement associated with the ensemble Nyström method when used in conjunction with either fixed or adaptive sampling schemes. Corroborating these empirical findings, we present a theoretical analysis of the Nyström method, providing novel error bounds guaranteeing a better convergence rate of the ensemble Nyström method in comparison to the standard Nyström method. Keywords: low-rank approximation, Nyström method, ensemble methods, large-scale learning

5 0.16959696 61 jmlr-2012-ML-Flex: A Flexible Toolbox for Performing Classification Analyses In Parallel

Author: Stephen R. Piccolo, Lewis J. Frey

Abstract: Motivated by a need to classify high-dimensional, heterogeneous data from the bioinformatics domain, we developed ML-Flex, a machine-learning toolbox that enables users to perform two-class and multi-class classification analyses in a systematic yet flexible manner. ML-Flex was written in Java but is capable of interfacing with third-party packages written in other programming languages. It can handle multiple input-data formats and supports a variety of customizations. ML-Flex provides implementations of various validation strategies, which can be executed in parallel across multiple computing cores, processors, and nodes. Additionally, ML-Flex supports aggregating evidence across multiple algorithms and data sets via ensemble learning. This open-source software package is freely available from http://mlflex.sourceforge.net. Keywords: toolbox, classification, parallel, ensemble, reproducible research

6 0.1515646 90 jmlr-2012-Pattern for Python

7 0.1402861 71 jmlr-2012-Multi-Instance Learning with Any Hypothesis Class

8 0.13146672 94 jmlr-2012-Query Strategies for Evading Convex-Inducing Classifiers

9 0.11987242 70 jmlr-2012-Multi-Assignment Clustering for Boolean Data

10 0.11670054 106 jmlr-2012-Sign Language Recognition using Sub-Units

11 0.097713314 30 jmlr-2012-DARWIN: A Framework for Machine Learning and Computer Vision Research and Development

12 0.096975103 105 jmlr-2012-Selective Sampling and Active Learning from Single and Multiple Teachers

13 0.095522128 13 jmlr-2012-Active Learning via Perfect Selective Classification

14 0.09266942 72 jmlr-2012-Multi-Target Regression with Rule Ensembles

15 0.091077983 113 jmlr-2012-The huge Package for High-dimensional Undirected Graph Estimation in R

16 0.091077819 86 jmlr-2012-Optimistic Bayesian Sampling in Contextual-Bandit Problems

17 0.087613061 36 jmlr-2012-Efficient Methods for Robust Classification Under Uncertainty in Kernel Matrices

18 0.085023887 26 jmlr-2012-Coherence Functions with Applications in Large-Margin Classification Methods

19 0.078159459 63 jmlr-2012-Mal-ID: Automatic Malware Detection Using Common Segment Analysis and Meta-Features

20 0.077287585 69 jmlr-2012-Mixability is Bayes Risk Curvature Relative to Log Loss


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(7, 0.025), (21, 0.031), (26, 0.018), (27, 0.025), (29, 0.023), (49, 0.014), (56, 0.04), (57, 0.017), (59, 0.537), (69, 0.035), (75, 0.047), (92, 0.023), (96, 0.051)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.82957006 62 jmlr-2012-MULTIBOOST: A Multi-purpose Boosting Package

Author: Djalel Benbouzid, Róbert Busa-Fekete, Norman Casagrande, François-David Collin, Balázs Kégl

Abstract: The MULTIBOOST package provides a fast C++ implementation of multi-class/multi-label/multi-task boosting algorithms. It is based on ADABOOST.MH but it also implements popular cascade classifiers and FILTERBOOST. The package contains common multi-class base learners (stumps, trees, products, Haar filters). Further base learners and strong learners following the boosting paradigm can be easily implemented in a flexible framework. Keywords: boosting, ADABOOST.MH, FILTERBOOST, cascade classifier

2 0.549573 89 jmlr-2012-Pairwise Support Vector Machines and their Application to Large Scale Problems

Author: Carl Brunner, Andreas Fischer, Klaus Luig, Thorsten Thies

Abstract: Pairwise classification is the task to predict whether the examples a, b of a pair (a, b) belong to the same class or to different classes. In particular, interclass generalization problems can be treated in this way. In pairwise classification, the order of the two input examples should not affect the classification result. To achieve this, particular kernels as well as the use of symmetric training sets in the framework of support vector machines were suggested. The paper discusses both approaches in a general way and establishes a strong connection between them. In addition, an efficient implementation is discussed which allows the training of several millions of pairs. The value of these contributions is confirmed by excellent results on the labeled faces in the wild benchmark. Keywords: pairwise support vector machines, interclass generalization, pairwise kernels, large scale problems

3 0.1928733 92 jmlr-2012-Positive Semidefinite Metric Learning Using Boosting-like Algorithms

Author: Chunhua Shen, Junae Kim, Lei Wang, Anton van den Hengel

Abstract: The success of many machine learning and pattern recognition methods relies heavily upon the identification of an appropriate distance metric on the input data. It is often beneficial to learn such a metric from the input training data, instead of using a default one such as the Euclidean distance. In this work, we propose a boosting-based technique, termed BOOSTMETRIC, for learning a quadratic Mahalanobis distance metric. Learning a valid Mahalanobis distance metric requires enforcing the constraint that the matrix parameter to the metric remains positive semidefinite. Semidefinite programming is often used to enforce this constraint, but does not scale well and is not easy to implement. BOOSTMETRIC is instead based on the observation that any positive semidefinite matrix can be decomposed into a linear combination of trace-one rank-one matrices. BOOSTMETRIC thus uses rank-one positive semidefinite matrices as weak learners within an efficient and scalable boosting-based learning process. The resulting methods are easy to implement, efficient, and can accommodate various types of constraints. We extend traditional boosting algorithms in that its weak learner is a positive semidefinite matrix with trace and rank being one rather than a classifier or regressor. Experiments on various data sets demonstrate that the proposed algorithms compare favorably to those state-of-the-art methods in terms of classification accuracy and running time. Keywords: Mahalanobis distance, semidefinite programming, column generation, boosting, Lagrange duality, large margin nearest neighbor

4 0.18241052 8 jmlr-2012-A Primal-Dual Convergence Analysis of Boosting

Author: Matus Telgarsky

Abstract: Boosting combines weak learners into a predictor with low empirical risk. Its dual constructs a high entropy distribution upon which weak learners and training labels are uncorrelated. This manuscript studies this primal-dual relationship under a broad family of losses, including the exponential loss of AdaBoost and the logistic loss, revealing: • Weak learnability aids the whole loss family: for any ε > 0, O (ln(1/ε)) iterations suffice to produce a predictor with empirical risk ε-close to the infimum; • The circumstances granting the existence of an empirical risk minimizer may be characterized in terms of the primal and dual problems, yielding a new proof of the known rate O (ln(1/ε)); • Arbitrary instances may be decomposed into the above two, granting rate O (1/ε), with a matching lower bound provided for the logistic loss. Keywords: boosting, convex analysis, weak learnability, coordinate descent, maximum entropy

5 0.16635664 71 jmlr-2012-Multi-Instance Learning with Any Hypothesis Class

Author: Sivan Sabato, Naftali Tishby

Abstract: In the supervised learning setting termed Multiple-Instance Learning (MIL), the examples are bags of instances, and the bag label is a function of the labels of its instances. Typically, this function is the Boolean OR. The learner observes a sample of bags and the bag labels, but not the instance labels that determine the bag labels. The learner is then required to emit a classification rule for bags based on the sample. MIL has numerous applications, and many heuristic algorithms have been used successfully on this problem, each adapted to specific settings or applications. In this work we provide a unified theoretical analysis for MIL, which holds for any underlying hypothesis class, regardless of a specific application or problem domain. We show that the sample complexity of MIL is only poly-logarithmically dependent on the size of the bag, for any underlying hypothesis class. In addition, we introduce a new PAC-learning algorithm for MIL, which uses a regular supervised learning algorithm as an oracle. We prove that efficient PAC-learning for MIL can be generated from any efficient non-MIL supervised learning algorithm that handles one-sided error. The computational complexity of the resulting algorithm is only polynomially dependent on the bag size. Keywords: multiple-instance learning, learning theory, sample complexity, PAC learning, supervised classification

6 0.15517662 98 jmlr-2012-Regularized Bundle Methods for Convex and Non-Convex Risks

7 0.15414108 77 jmlr-2012-Non-Sparse Multiple Kernel Fisher Discriminant Analysis

8 0.1491015 106 jmlr-2012-Sign Language Recognition using Sub-Units

9 0.14909033 44 jmlr-2012-Feature Selection via Dependence Maximization

10 0.14772031 88 jmlr-2012-PREA: Personalized Recommendation Algorithms Toolkit

11 0.14762211 65 jmlr-2012-MedLDA: Maximum Margin Supervised Topic Models

12 0.14725798 85 jmlr-2012-Optimal Distributed Online Prediction Using Mini-Batches

13 0.14663936 105 jmlr-2012-Selective Sampling and Active Learning from Single and Multiple Teachers

14 0.14663318 27 jmlr-2012-Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection

15 0.14595413 74 jmlr-2012-Multi Kernel Learning with Online-Batch Optimization

16 0.14434734 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models

17 0.14428683 97 jmlr-2012-Regularization Techniques for Learning with Matrices

18 0.14414948 115 jmlr-2012-Trading Regret for Efficiency: Online Convex Optimization with Long Term Constraints

19 0.14414234 100 jmlr-2012-Robust Kernel Density Estimation

20 0.1440179 43 jmlr-2012-Fast Approximation of Matrix Coherence and Statistical Leverage