jmlr jmlr2008 jmlr2008-25 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Gérard Biau, Luc Devroye, Gábor Lugosi
Abstract: In the last years of his life, Leo Breiman promoted random forests for use in classification. He suggested using averaging as a means of obtaining good discrimination rules. The base classifiers used for averaging are simple and randomized, often based on random samples from the data. He left a few questions unanswered regarding the consistency of such rules. In this paper, we give a number of theorems that establish the universal consistency of averaging rules. We also show that some popular classifiers, including one suggested by Breiman, are not universally consistent. Keywords: random forests, classification trees, consistency, bagging This paper is dedicated to the memory of Leo Breiman.
Reference: text
sentIndex sentText sentNum sentScore
1 Journal of Machine Learning Research 9 (2008) 2015-2033 Submitted 1/08; Revised 5/08; Published 9/08 Consistency of Random Forests and Other Averaging Classifiers Gérard Biau GERARD . [sent-1, score-0.046]
2 CA School of Computer Science McGill University Montreal, Canada H3A 2K6 Gábor Lugosi LUGOSI @ UPF. [sent-5, score-0.037]
3 ES ICREA and Department of Economics Pompeu Fabra University Ramon Trias Fargas 25-27 08005 Barcelona, Spain Editor: Peter Bartlett Abstract In the last years of his life, Leo Breiman promoted random forests for use in classification. [sent-6, score-0.398]
4 He suggested using averaging as a means of obtaining good discrimination rules. [sent-7, score-0.061]
5 The base classifiers used for averaging are simple and randomized, often based on random samples from the data. [sent-8, score-0.146]
6 He left a few questions unanswered regarding the consistency of such rules. [sent-9, score-0.144]
7 In this paper, we give a number of theorems that establish the universal consistency of averaging rules. [sent-10, score-0.182]
8 We also show that some popular classifiers, including one suggested by Breiman, are not universally consistent. [sent-11, score-0.023]
9 Keywords: random forests, classification trees, consistency, bagging This paper is dedicated to the memory of Leo Breiman. [sent-12, score-0.097]
10 Introduction Ensemble methods, popular in machine learning, are learning algorithms that construct a set of many individual classifiers (called base learners) and combine them to classify new data points by taking a weighted or unweighted vote of their predictions. [sent-14, score-0.151]
11 It is now well-known that ensembles are often much more accurate than the individual classifiers that make them up. [sent-15, score-0.035]
12 The success of ensemble algorithms on many benchmark data sets has raised considerable interest in understanding why such methods succeed and identifying circumstances in which they can be expected to produce good results. [sent-16, score-0.07]
13 These methods differ in the way the base learner is fit and combined. [sent-17, score-0.053]
14 For example, bagging (Breiman, 1996) proceeds by generating bootstrap samples from the original data set, constructing a classifier from each bootstrap sample, and voting to combine. [sent-18, score-0.281]
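To make the bootstrap-and-vote step concrete, here is a minimal Python sketch of bagging as just described; the base-learning routine base_fit, the number of bootstrap rounds, and the 0/1 label encoding are illustrative assumptions rather than details taken from the text.

```python
import numpy as np

def bagging_predict(base_fit, X_train, y_train, X_test, n_rounds=25, seed=0):
    """Bagging sketch: fit one base classifier per bootstrap sample, then vote."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    votes = np.zeros(len(X_test))
    for _ in range(n_rounds):
        idx = rng.integers(0, n, size=n)            # bootstrap sample, drawn with replacement
        clf = base_fit(X_train[idx], y_train[idx])  # user-supplied fitting routine
        votes += clf(X_test)                        # clf returns 0/1 predictions per test point
    return (votes / n_rounds >= 0.5).astype(int)    # unweighted majority vote

```

Any base rule of the form base_fit(X, y) -> predict can be plugged in, which is all that the later discussion of bagging requires.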
15 In boosting (Freund and Schapire, 1996) and arcing algorithms (Breiman, 1998) the successive classifiers are constructed by giving increased weight to those points that have been frequently misclassified, and the classifiers are combined using weighted voting. [sent-19, score-0.046]
16 On the other hand, random split selection (Dietterich, 2000) [sent-20, score-0.149]
17 grows trees on the original data set. [sent-21, score-0.037]
18 For a fixed number S, at each node, S best splits (in terms of minimizing deviance) are found and the actual split is randomly and uniformly selected from them. [sent-22, score-0.053]
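As an illustration of the split-selection rule just described, the sketch below keeps the S candidate splits with the smallest deviance and picks one uniformly at random; the deviance function and the representation of candidate splits are assumptions, not objects defined in the text.

```python
import random

def random_split_selection(candidate_splits, deviance, S, rng=random):
    """Keep the S candidate splits with smallest deviance, then pick one uniformly at random."""
    best = sorted(candidate_splits, key=deviance)[:S]
    return rng.choice(best)

```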
19 For a comprehensive review of ensemble methods, we refer the reader to Dietterich (2000a) and the references therein. [sent-23, score-0.051]
20 Breiman (2001) provides a general framework for tree ensembles called “random forests”. [sent-24, score-0.136]
21 Each tree depends on the values of a random vector sampled independently and with the same distribution for all trees. [sent-25, score-0.133]
22 Thus, a random forest is a classifier that consists of many decision trees and outputs the class that is the mode of the classes output by individual trees. [sent-26, score-0.379]
23 Algorithms for inducing a random forest were first developed by Breiman and Cutler, and “Random Forests” is their trademark. [sent-27, score-0.342]
24 edu/users/breiman/RandomForests provides a collection of downloadable technical reports, and gives an overview of random forests as well as comments on the features of the method. [sent-31, score-0.426]
25 Random forests have been shown to give excellent performance on a number of practical problems. [sent-32, score-0.347]
26 They work fast, generally exhibit a substantial performance improvement over single tree classifiers such as CART, and yield generalization error rates that compare favorably to the best statistical and machine learning methods. [sent-33, score-0.101]
27 In fact, random forests are among the most accurate general-purpose classifiers available (see, for example, Breiman, 2001). [sent-34, score-0.379]
28 Different random forests differ in how randomness is introduced in the tree building process, ranging from extreme random splitting strategies (Breiman, 2000; Cutler and Zhao, 2001) to more involved data-dependent strategies (Amit and Geman, 1997; Breiman, 2001; Dietterich, 2000). [sent-35, score-0.529]
29 As a matter of fact, the statistical mechanism of random forests is not yet fully understood and is still under active investigation. [sent-36, score-0.379]
30 Unlike single trees, where consistency is proved by letting the number of observations in each terminal node become large (Devroye, Györfi, and Lugosi, 1996, Chapter 20), random forests are generally built to have a small number of cases in each terminal node. [sent-37, score-0.62]
31 Although the mechanism of random forest algorithms appears simple, it is difficult to analyze and remains largely unknown. [sent-38, score-0.342]
32 Some attempts to investigate the driving force behind consistency of random forests are by Breiman (2000, 2004) and Lin and Jeon (2006), who establish a connection between random forests and adaptive nearest neighbor methods. [sent-39, score-0.922]
33 Meinshausen (2006) proved consistency of certain random forests in the context of so-called quantile regression. [sent-40, score-0.539]
34 In this paper we offer consistency theorems for various versions of random forests and other randomized ensemble classifiers. [sent-41, score-0.693]
35 In Section 2 we introduce a general framework for studying classifiers based on averaging randomized base classifiers. [sent-42, score-0.256]
36 We prove a simple but useful proposition showing that averaged classifiers are consistent whenever the base classifiers are. [sent-43, score-0.181]
37 In Section 3 we prove consistency of two simple random forest classifiers, the purely random forest (suggested by Breiman as a starting point for study) and the scale-invariant random forest classifiers. [sent-44, score-1.192]
38 In Section 4 it is shown that averaging may convert inconsistent rules into consistent ones. [sent-45, score-0.132]
39 In Section 5 we briefly investigate consistency of bagging rules. [sent-46, score-0.206]
40 We show that, in general, bagging preserves consistency of the base rule and it may even create consistent rules from inconsistent ones. [sent-47, score-0.31]
41 In particular, we show that if the bootstrap samples are sufficiently small, the bagged version of the 1-nearest neighbor classifier is consistent. [sent-48, score-0.079]
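The following sketch illustrates the bagged 1-nearest-neighbor rule mentioned here; the subsample size k, the number of bagging rounds, and the Euclidean metric are illustrative choices, with k much smaller than n corresponding to the regime in which consistency is claimed.

```python
import numpy as np

def bagged_1nn_predict(X_train, y_train, x, k, n_rounds=100, seed=0):
    """Bagged 1-NN sketch: majority vote of 1-NN predictions over small bootstrap samples."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    votes = 0
    for _ in range(n_rounds):
        idx = rng.integers(0, n, size=k)                  # small bootstrap sample (k << n)
        dists = np.linalg.norm(X_train[idx] - x, axis=1)  # distances from x to the sampled points
        votes += y_train[idx][np.argmin(dists)]           # label of the nearest sampled point
    return int(votes / n_rounds >= 0.5)                   # majority vote over the rounds

```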
42 Finally, in Section 6 we consider random forest classifiers based on randomized, greedily grown tree classifiers. [sent-49, score-0.443]
43 We argue that some greedy random forest classifiers, including Breiman’s random forest classifier, are inconsistent and suggest a consistent greedy random forest classifier. [sent-50, score-1.097]
44 The training data D_n = ((X_1, Y_1), . . . , (X_n, Y_n)) consist of independent, identically distributed pairs of random variables such that X (the so-called feature vector) takes its values in R^d while Y (the label) is a binary {0, 1}-valued random variable. [sent-58, score-0.064]
45 A classifier g_n is a binary-valued function of X and D_n whose probability of error is defined by L(g_n) = P_{(X,Y)}{ g_n(X, D_n) ≠ Y }, where P_{(X,Y)} denotes probability with respect to the pair (X, Y) (i.e., conditionally on the data D_n). [sent-66, score-0.641]
46 A sequence {g_n} of classifiers is consistent for a certain distribution of (X, Y) if L(g_n) → L∗ in probability, where L∗ denotes the Bayes risk, the smallest probability of error achievable by any classifier. [sent-72, score-0.033]
47 In this paper we investigate classifiers that calculate their decisions by taking a majority vote over randomized classifiers. [sent-73, score-0.328]
48 A randomized classifier may use a random variable Z to calculate its decision. [sent-74, score-0.217]
49 A randomized classifier is an arbitrary function of the form gn (X, Z, Dn ), which we abbreviate by gn (X, Z). [sent-76, score-1.424]
50 The probability of error of g_n becomes L(g_n) = P_{(X,Y),Z}{ g_n(X, Z, D_n) ≠ Y } = P{ g_n(X, Z, D_n) ≠ Y | D_n }. [sent-77, score-0.641]
51 The definition of consistency remains the same by augmenting the probability space appropriately to include the randomization. [sent-78, score-0.121]
52 Given any randomized classifier, one may calculate the classifier for various draws of the randomizing variable Z. [sent-79, score-0.365]
53 It is then a natural idea to define an averaged classifier by taking a majority vote among the obtained random classifiers. [sent-80, score-0.244]
54 Suppose Z_1, . . . , Z_m are identically distributed draws of the randomizing variable, having the same distribution as Z. [sent-84, score-0.206]
55 Assume, in addition, that Z_1, . . . , Z_m are independent, conditionally on X, Y, and D_n. [sent-88, score-0.032]
56 Given Z^m = (Z_1, . . . , Z_m), one may define the corresponding voting classifier by g_n^{(m)}(x, Z^m, D_n) = 1 if (1/m) ∑_{j=1}^{m} g_n(x, Z_j, D_n) ≥ 1/2, and 0 otherwise. [sent-92, score-1.386]
57 We also consider the averaged classifier ḡ_n(x) = 1 if E_Z g_n(x, Z) ≥ 1/2, and 0 otherwise. (Here P_Z and E_Z denote probability and expectation with respect to the randomizing variable Z, that is, conditionally on X, Y, and D_n.) [sent-94, score-0.186]
58 ḡ_n may be interpreted as an idealized version of the voting classifier g_n^{(m)}, which draws many independent copies of the randomizing variable Z and takes a majority vote over the resulting classifiers. [sent-95, score-1.004]
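A minimal sketch of the voting classifier g_n^{(m)} defined above; base_classifier stands in for the randomized rule g_n and draw_Z for a sampler of the randomizing variable, both of which are placeholders rather than objects defined in the paper.

```python
import numpy as np

def voting_classifier(base_classifier, draw_Z, x, data, m, seed=0):
    """g_n^{(m)}: majority vote over m independent draws of the randomizing variable Z."""
    rng = np.random.default_rng(seed)
    decisions = [base_classifier(x, draw_Z(rng), data) for _ in range(m)]  # each decision is 0 or 1
    # As m grows, the empirical average approaches E_Z g_n(x, Z), so this
    # approximates the averaged classifier discussed in the text.
    return int(np.mean(decisions) >= 0.5)

```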
59 Our first result states that consistency of a randomized classifier is preserved by averaging. [sent-96, score-0.263]
60 Proposition 1 Assume that the sequence {g_n} of randomized classifiers is consistent for a certain distribution of (X, Y). [sent-97, score-0.175]
61 Then the voting classifier g_n^{(m)} (for any value of m) and the averaged classifier ḡ_n are also consistent. [sent-98, score-1.455]
62 In fact, since P{ g_n(X, Z) ≠ Y | X = x } ≥ P{ g∗(X) ≠ Y | X = x } for all x ∈ R^d, consistency of {g_n} means that for µ-almost all x, P{ g_n(X, Z) ≠ Y | X = x } → P{ g∗(X) ≠ Y | X = x } = min(η(x), 1 − η(x)). [sent-100, score-0.046]
63 (Here η(x) = P{Y = 1 | X = x} and g∗ is the Bayes classifier; fix an x with η(x) > 1/2, the case η(x) < 1/2 being symmetric.) Then P{ g_n(X, Z) ≠ Y | X = x } = (2η(x) − 1) P{ g_n(x, Z) = 0 } + 1 − η(x), and by consistency we have P{ g_n(x, Z) = 0 } → 0. [sent-103, score-0.121]
64 To prove consistency of the voting classifier g_n^{(m)}, it suffices to show that P{ g_n^{(m)}(x, Z^m) = 0 } → 0 for µ-almost all x for which η(x) > 1/2. [sent-104, score-0.866]
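One way to check this claim (a sketch under the stated assumption η(x) > 1/2, not necessarily the paper's own argument) is Markov's inequality applied to the fraction of votes for label 0, with the probability taken with respect to Z^m, conditionally on x and D_n:

```latex
\mathbf{P}\bigl\{ g_n^{(m)}(x, Z^m) = 0 \bigr\}
  \le \mathbf{P}\Bigl\{ \tfrac{1}{m} \sum_{j=1}^{m} \bigl(1 - g_n(x, Z_j)\bigr) \ge \tfrac12 \Bigr\}
  \le 2\, \mathbf{E}_Z\bigl[ 1 - g_n(x, Z) \bigr]
  = 2\, \mathbf{P}_Z\bigl\{ g_n(x, Z) = 0 \bigr\} \;\longrightarrow\; 0 .
```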
65 Consistency of the averaged classifier is proved by a similar argument. [sent-106, score-0.091]
66 Random Forests Random forests, introduced by Breiman, are averaged classifiers in the sense defined in Section 2. [sent-108, score-0.069]
67 Formally, a random forest with m trees is a classifier consisting of a collection of randomized base tree classifiers g_n(x, Z_1), . . . , g_n(x, Z_m). [sent-109, score-1.343]
68 Here Z_1, . . . , Z_m are identically distributed random vectors, independent conditionally on X, Y, and D_n. [sent-115, score-0.09]
69 The randomizing variable is typically used to determine how the successive cuts are performed when building the tree, such as the selection of the node and the coordinate to split, as well as the position of the split. [sent-116, score-0.278]
70 The random forest classifier takes a majority vote among the random tree classifiers. [sent-117, score-0.618]
71 If m is large, the random forest classifier is well approximated by the averaged classifier ḡ_n(x) = 1 if E_Z g_n(x, Z) ≥ 1/2, and 0 otherwise. [sent-118, score-1.693]
72 For brevity, we state most results of this paper for the averaged classifier ḡ_n only, though by Proposition 1 various results remain true for the voting classifier g_n^{(m)} as well. [sent-119, score-0.173]
73 In this section we analyze a simple random forest already considered by Breiman (2000), which we call the purely random forest. [sent-120, score-0.419]
74 The random tree classifier gn (x, Z) is constructed as follows. [sent-121, score-0.774]
75 All nodes of the tree are associated with rectangular cells such that, at each step of the construction of the tree, the collection of cells associated with the leaves of the tree (i.e., the external nodes) forms a partition of [0, 1]^d. [sent-123, score-0.337]
76 The split variable J is then selected uniformly at random from the d candidates x^(1), . . . , x^(d). [sent-128, score-0.086]
77 Finally, the selected cell is split along the randomly chosen variable at a random location, chosen according to a uniform random variable on the length of the chosen side of the selected cell. [sent-132, score-0.192]
78 The randomized classifier gn (x, Z) takes a majority vote among all Yi for which the corresponding feature vector Xi falls in the same cell of the random partition as x. [sent-134, score-1.077]
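A minimal sketch of one purely random tree as described above. The feature space is taken to be [0, 1]^d, the leaf to be split is chosen uniformly at random among the current leaves (a reasonable reading of the construction, stated here as an assumption), ties and empty cells are broken arbitrarily, and all function and variable names are illustrative.

```python
import numpy as np

def purely_random_tree_predict(X_train, y_train, x, k, seed=0):
    """One purely random tree: k label-independent uniform splits of [0, 1]^d,
    then a majority vote among the training labels in the cell containing x."""
    rng = np.random.default_rng(seed)
    d = X_train.shape[1]
    leaves = [np.array([[0.0, 1.0]] * d)]       # each leaf is a box given by (low, high) per coordinate
    for _ in range(k):
        i = rng.integers(len(leaves))           # leaf chosen uniformly at random (assumption)
        cell = leaves.pop(i)
        j = rng.integers(d)                     # split coordinate, uniform over the d candidates
        lo, hi = cell[j]
        cut = rng.uniform(lo, hi)               # split location, uniform on the chosen side
        left, right = cell.copy(), cell.copy()
        left[j], right[j] = [lo, cut], [cut, hi]
        leaves.extend([left, right])
    for cell in leaves:                         # locate the cell containing x and vote
        if np.all((x >= cell[:, 0]) & (x <= cell[:, 1])):
            inside = np.all((X_train >= cell[:, 0]) & (X_train <= cell[:, 1]), axis=1)
            labels = y_train[inside]
            return int(labels.mean() >= 0.5) if labels.size else 0
    return 0

```

A purely random forest in the sense above would average many such trees over independent draws of the random splits; the consistency result stated below lets k grow with n while k/n → 0, for instance k on the order of √n (an illustrative choice).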
79 The purely random forest classifier is a radically simplified version of random forest classifiers used in practice. [sent-136, score-0.729]
80 The main simplification lies in the fact that the recursive cell splits do not depend on the labels Y_1, . . . , Y_n. [sent-137, score-0.073]
81 The next theorem mainly serves as an illustration of how the consistency problem of random forest classifiers may be attacked. [sent-141, score-0.463]
82 More involved versions of random forest classifiers are discussed in subsequent sections. [sent-142, score-0.342]
83 Then the purely random forest classifier ḡ_n is consistent whenever k → ∞ and k/n → 0 as n → ∞. [sent-144, score-1.061]
84 Proof By Proposition 1 it suffices to prove consistency of the randomized base tree classifier g_n. [sent-145, score-0.417]
85 To this end, we recall a general consistency theorem for partitioning classifiers proved in Devroye, Györfi, and Lugosi (1996, Theorem 6.1). [sent-146, score-0.143]
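The sufficient conditions of that partitioning-rule theorem can be summarized as follows (a paraphrase rather than a quotation): if A_n(X) denotes the cell of the partition containing X and N_n(X) the number of training points falling in that cell, consistency holds whenever

```latex
\operatorname{diam}\bigl(A_n(X)\bigr) \to 0 \quad \text{in probability}
\qquad \text{and} \qquad
N_n(X) \to \infty \quad \text{in probability.}
```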
86 Observe that the partition has k + 1 rectangular cells, say A_1, . . . , A_{k+1}. [sent-151, score-0.078]
87 Since these points (X, X_1, . . . , X_n) are independent and identically distributed, fixing the set S of their values (but not the order of the points) and Z, the conditional probability that X falls in the i-th cell equals N_i/(n + 1). [sent-166, score-0.109]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 25 jmlr-2008-Consistency of Random Forests and Other Averaging Classifiers
2 0.15334232 68 jmlr-2008-Nearly Uniform Validation Improves Compression-Based Error Bounds
Author: Eric Bax
Abstract: This paper develops bounds on out-of-sample error rates for support vector machines (SVMs). The bounds are based on the numbers of support vectors in the SVMs rather than on VC dimension. The bounds developed here improve on support vector counting bounds derived using Littlestone and Warmuth’s compression-based bounding technique. Keywords: compression, error bound, support vector machine, nearly uniform
3 0.06794852 77 jmlr-2008-Probabilistic Characterization of Random Decision Trees
Author: Amit Dhurandhar, Alin Dobra
Abstract: In this paper we use the methodology introduced by Dhurandhar and Dobra (2009) for analyzing the error of classifiers and the model selection measures, to analyze decision tree algorithms. The methodology consists of obtaining parametric expressions for the moments of the generalization error (GE) for the classification model of interest, followed by plotting these expressions for interpretability. The major challenge in applying the methodology to decision trees, the main theme of this work, is customizing the generic expressions for the moments of GE to this particular classification algorithm. The specific contributions we make in this paper are: (a) we primarily characterize a subclass of decision trees namely, Random decision trees, (b) we discuss how the analysis extends to other decision tree algorithms and (c) in order to extend the analysis to certain model selection measures, we generalize the relationships between the moments of GE and moments of the model selection measures given in (Dhurandhar and Dobra, 2009) to randomized classification algorithms. An empirical comparison of the proposed method with Monte Carlo and distribution free bounds obtained using Breiman’s formula, depicts the advantages of the method in terms of running time and accuracy. It thus showcases the use of the deployed methodology as an exploratory tool to study learning algorithms. Keywords: moments, generalization error, decision trees
4 0.057420198 11 jmlr-2008-Aggregation of SVM Classifiers Using Sobolev Spaces
Author: Sébastien Loustau
Abstract: This paper investigates statistical performances of Support Vector Machines (SVM) and considers the problem of adaptation to the margin parameter and to complexity. In particular we provide a classifier with no tuning parameter. It is a combination of SVM classifiers. Our contribution is two-fold: (1) we propose learning rates for SVM using Sobolev spaces and build a numerically realizable aggregate that converges with same rate; (2) we present practical experiments of this method of aggregation for SVM using both Sobolev spaces and Gaussian kernels. Keywords: classification, support vector machines, learning rates, approximation, aggregation of classifiers
Author: Shann-Ching Chen, Geoffrey J. Gordon, Robert F. Murphy
Abstract: In structured classification problems, there is a direct conflict between expressive models and efficient inference: while graphical models such as Markov random fields or factor graphs can represent arbitrary dependences among instance labels, the cost of inference via belief propagation in these models grows rapidly as the graph structure becomes more complicated. One important source of complexity in belief propagation is the need to marginalize large factors to compute messages. This operation takes time exponential in the number of variables in the factor, and can limit the expressiveness of the models we can use. In this paper, we study a new class of potential functions, which we call decomposable k-way potentials, and provide efficient algorithms for computing messages from these potentials during belief propagation. We believe these new potentials provide a good balance between expressive power and efficient inference in practical structured classification problems. We discuss three instances of decomposable potentials: the associative Markov network potential, the nested junction tree, and a new type of potential which we call the voting potential. We use these potentials to classify images of protein subcellular location patterns in groups of cells. Classifying subcellular location patterns can help us answer many important questions in computational biology, including questions about how various treatments affect the synthesis and behavior of proteins and networks of proteins within a cell. Our new representation and algorithm lead to substantial improvements in both inference speed and classification accuracy. Keywords: factor graphs, approximate inference algorithms, structured classification, protein subcellular location patterns, location proteomics
6 0.055232868 89 jmlr-2008-Support Vector Machinery for Infinite Ensemble Learning
7 0.04671799 33 jmlr-2008-Evidence Contrary to the Statistical View of Boosting
8 0.042747982 32 jmlr-2008-Estimating the Confidence Interval for Prediction Errors of Support Vector Machine Classifiers
9 0.038887717 92 jmlr-2008-Universal Multi-Task Kernels
10 0.038329139 81 jmlr-2008-Regularization on Graphs with Function-adapted Diffusion Processes
11 0.037363492 13 jmlr-2008-An Error Bound Based on a Worst Likely Assignment
12 0.036348544 3 jmlr-2008-A Moment Bound for Multi-hinge Classifiers
13 0.035817966 52 jmlr-2008-Learning from Multiple Sources
14 0.034862831 30 jmlr-2008-Discriminative Learning of Max-Sum Classifiers
15 0.032908674 87 jmlr-2008-Stationary Features and Cat Detection
16 0.031330999 9 jmlr-2008-Active Learning by Spherical Subdivision
17 0.028181577 14 jmlr-2008-An Extension on "Statistical Comparisons of Classifiers over Multiple Data Sets" for all Pairwise Comparisons
18 0.02801762 88 jmlr-2008-Structural Learning of Chain Graphs via Decomposition
19 0.027829239 95 jmlr-2008-Value Function Based Reinforcement Learning in Changing Markovian Environments
20 0.027420241 39 jmlr-2008-Gradient Tree Boosting for Training Conditional Random Fields
simIndex simValue paperId paperTitle
same-paper 1 0.96751708 25 jmlr-2008-Consistency of Random Forests and Other Averaging Classifiers
2 0.63544101 68 jmlr-2008-Nearly Uniform Validation Improves Compression-Based Error Bounds
4 0.32959464 77 jmlr-2008-Probabilistic Characterization of Random Decision Trees
5 0.28881311 30 jmlr-2008-Discriminative Learning of Max-Sum Classifiers
Author: Vojtěch Franc, Bogdan Savchynskyy
Abstract: The max-sum classifier predicts n-tuple of labels from n-tuple of observable variables by maximizing a sum of quality functions defined over neighbouring pairs of labels and observable variables. Predicting labels as MAP assignments of a Random Markov Field is a particular example of the max-sum classifier. Learning parameters of the max-sum classifier is a challenging problem because even computing the response of such classifier is NP-complete in general. Estimating parameters using the Maximum Likelihood approach is feasible only for a subclass of max-sum classifiers with an acyclic structure of neighbouring pairs. Recently, the discriminative methods represented by the perceptron and the Support Vector Machines, originally designed for binary linear classifiers, have been extended for learning some subclasses of the max-sum classifier. Besides the max-sum classifiers with the acyclic neighbouring structure, it has been shown that the discriminative learning is possible even with arbitrary neighbouring structure provided the quality functions fulfill some additional constraints. In this article, we extend the discriminative approach to other three classes of max-sum classifiers with an arbitrary neighbourhood structure. We derive learning algorithms for two subclasses of max-sum classifiers whose response can be computed in polynomial time: (i) the max-sum classifiers with supermodular quality functions and (ii) the max-sum classifiers whose response can be computed exactly by a linear programming relaxation. Moreover, we show that the learning problem can be approximately solved even for a general max-sum classifier. Keywords: max-xum classifier, hidden Markov networks, support vector machines
6 0.27762103 81 jmlr-2008-Regularization on Graphs with Function-adapted Diffusion Processes
7 0.25722858 11 jmlr-2008-Aggregation of SVM Classifiers Using Sobolev Spaces
8 0.24352458 9 jmlr-2008-Active Learning by Spherical Subdivision
9 0.23641632 89 jmlr-2008-Support Vector Machinery for Infinite Ensemble Learning
10 0.20154165 32 jmlr-2008-Estimating the Confidence Interval for Prediction Errors of Support Vector Machine Classifiers
11 0.19729583 14 jmlr-2008-An Extension on "Statistical Comparisons of Classifiers over Multiple Data Sets" for all Pairwise Comparisons
12 0.19062182 21 jmlr-2008-Classification with a Reject Option using a Hinge Loss
13 0.18060251 13 jmlr-2008-An Error Bound Based on a Worst Likely Assignment
14 0.17458907 52 jmlr-2008-Learning from Multiple Sources
15 0.17386837 72 jmlr-2008-On the Size and Recovery of Submatrices of Ones in a Random Binary Matrix
16 0.16876985 92 jmlr-2008-Universal Multi-Task Kernels
17 0.16847068 87 jmlr-2008-Stationary Features and Cat Detection
18 0.15904523 51 jmlr-2008-Learning Similarity with Operator-valued Large-margin Classifiers
19 0.15551782 46 jmlr-2008-LIBLINEAR: A Library for Large Linear Classification (Machine Learning Open Source Software Paper)
20 0.1547721 33 jmlr-2008-Evidence Contrary to the Statistical View of Boosting
simIndex simValue paperId paperTitle
same-paper 1 0.99415565 25 jmlr-2008-Consistency of Random Forests and Other Averaging Classifiers
2 0.97859031 49 jmlr-2008-Learning Control Knowledge for Forward Search Planning
Author: Sungwook Yoon, Alan Fern, Robert Givan
Abstract: A number of today’s state-of-the-art planners are based on forward state-space search. The impressive performance can be attributed to progress in computing domain independent heuristics that perform well across many domains. However, it is easy to find domains where such heuristics provide poor guidance, leading to planning failure. Motivated by such failures, the focus of this paper is to investigate mechanisms for learning domain-specific knowledge to better control forward search in a given domain. While there has been a large body of work on inductive learning of control knowledge for AI planning, there is a void of work aimed at forward-state-space search. One reason for this may be that it is challenging to specify a knowledge representation for compactly representing important concepts across a wide range of domains. One of the main contributions of this work is to introduce a novel feature space for representing such control knowledge. The key idea is to define features in terms of information computed via relaxed plan extraction, which has been a major source of success for non-learning planners. This gives a new way of leveraging relaxed planning techniques in the context of learning. Using this feature space, we describe three forms of control knowledge—reactive policies (decision list rules and measures of progress) and linear heuristics—and show how to learn them and incorporate them into forward state-space search. Our empirical results show that our approaches are able to surpass state-of-the-art nonlearning planners across a wide range of planning competition domains. Keywords: planning, machine learning, knowledge representation, search
3 0.95620722 1 jmlr-2008-A Bahadur Representation of the Linear Support Vector Machine
Author: Ja-Yong Koo, Yoonkyung Lee, Yuwon Kim, Changyi Park
Abstract: The support vector machine has been successful in a variety of applications. Also on the theoretical front, statistical properties of the support vector machine have been studied quite extensively with a particular attention to its Bayes risk consistency under some conditions. In this paper, we study somewhat basic statistical properties of the support vector machine yet to be investigated, namely the asymptotic behavior of the coefficients of the linear support vector machine. A Bahadur type representation of the coefficients is established under appropriate conditions, and their asymptotic normality and statistical variability are derived on the basis of the representation. These asymptotic results do not only help further our understanding of the support vector machine, but also they can be useful for related statistical inferences. Keywords: asymptotic normality, Bahadur representation, classification, convexity lemma, Radon transform
4 0.91936934 96 jmlr-2008-Visualizing Data using t-SNE
Author: Laurens van der Maaten, Geoffrey Hinton
Abstract: We present a new technique called “t-SNE” that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map. The technique is a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map. t-SNE is better than existing techniques at creating a single map that reveals structure at many different scales. This is particularly important for high-dimensional data that lie on several different, but related, low-dimensional manifolds, such as images of objects from multiple classes seen from multiple viewpoints. For visualizing the structure of very large data sets, we show how t-SNE can use random walks on neighborhood graphs to allow the implicit structure of all of the data to influence the way in which a subset of the data is displayed. We illustrate the performance of t-SNE on a wide variety of data sets and compare it with many other non-parametric visualization techniques, including Sammon mapping, Isomap, and Locally Linear Embedding. The visualizations produced by t-SNE are significantly better than those produced by the other techniques on almost all of the data sets. Keywords: visualization, dimensionality reduction, manifold learning, embedding algorithms, multidimensional scaling
5 0.69871259 53 jmlr-2008-Learning to Combine Motor Primitives Via Greedy Additive Regression
Author: Manu Chhabra, Robert A. Jacobs
Abstract: The computational complexities arising in motor control can be ameliorated through the use of a library of motor synergies. We present a new model, referred to as the Greedy Additive Regression (GAR) model, for learning a library of torque sequences, and for learning the coefficients of a linear combination of sequences minimizing a cost function. From the perspective of numerical optimization, the GAR model is interesting because it creates a library of “local features”—each sequence in the library is a solution to a single training task—and learns to combine these sequences using a local optimization procedure, namely, additive regression. We speculate that learners with local representational primitives and local optimization procedures will show good performance on nonlinear tasks. The GAR model is also interesting from the perspective of motor control because it outperforms several competing models. Results using a simulated two-joint arm suggest that the GAR model consistently shows excellent performance in the sense that it rapidly learns to perform novel, complex motor tasks. Moreover, its library is overcomplete and sparse, meaning that only a small fraction of the stored torque sequences are used when learning a new movement. The library is also robust in the sense that, after an initial training period, nearly all novel movements can be learned as additive combinations of sequences in the library, and in the sense that it shows good generalization when an arm’s dynamics are altered between training and test conditions, such as when a payload is added to the arm. Lastly, the GAR model works well regardless of whether motor tasks are specified in joint space or Cartesian space. We conclude that learning techniques using local primitives and optimization procedures are viable and potentially important methods for motor control and possibly other domains, and that these techniques deserve further examination by the artificial intelligence and cognitive science
6 0.65951127 89 jmlr-2008-Support Vector Machinery for Infinite Ensemble Learning
7 0.64474571 9 jmlr-2008-Active Learning by Spherical Subdivision
8 0.63724744 3 jmlr-2008-A Moment Bound for Multi-hinge Classifiers
9 0.63458079 21 jmlr-2008-Classification with a Reject Option using a Hinge Loss
10 0.62933135 43 jmlr-2008-Hit Miss Networks with Applications to Instance Selection
11 0.6221056 57 jmlr-2008-Manifold Learning: The Price of Normalization
12 0.6082015 68 jmlr-2008-Nearly Uniform Validation Improves Compression-Based Error Bounds
13 0.59895766 19 jmlr-2008-Bouligand Derivatives and Robustness of Support Vector Machines for Regression
14 0.59244078 8 jmlr-2008-Accelerated Neural Evolution through Cooperatively Coevolved Synapses
15 0.58471495 50 jmlr-2008-Learning Reliable Classifiers From Small or Incomplete Data Sets: The Naive Credal Classifier 2
16 0.58063817 71 jmlr-2008-On the Equivalence of Linear Dimensionality-Reducing Transformations
17 0.57608289 31 jmlr-2008-Dynamic Hierarchical Markov Random Fields for Integrated Web Data Extraction
19 0.57175928 14 jmlr-2008-An Extension on "Statistical Comparisons of Classifiers over Multiple Data Sets" for all Pairwise Comparisons
20 0.57106167 48 jmlr-2008-Learning Bounded Treewidth Bayesian Networks