jmlr jmlr2008 jmlr2008-25 knowledge-graph by maker-knowledge-mining

25 jmlr-2008-Consistency of Random Forests and Other Averaging Classifiers


Source: pdf

Author: Gérard Biau, Luc Devroye, Gábor Lugosi

Abstract: In the last years of his life, Leo Breiman promoted random forests for use in classification. He suggested using averaging as a means of obtaining good discrimination rules. The base classifiers used for averaging are simple and randomized, often based on random samples from the data. He left a few questions unanswered regarding the consistency of such rules. In this paper, we give a number of theorems that establish the universal consistency of averaging rules. We also show that some popular classifiers, including one suggested by Breiman, are not universally consistent. Keywords: random forests, classification trees, consistency, bagging This paper is dedicated to the memory of Leo Breiman.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Journal of Machine Learning Research 9 (2008) 2015-2033 Submitted 1/08; Revised 5/08; Published 9/08 Consistency of Random Forests and Other Averaging Classifiers Gérard Biau GERARD . [sent-1, score-0.046]

2 CA School of Computer Science McGill University Montreal, Canada H3A 2K6 Gábor Lugosi LUGOSI @ UPF. [sent-5, score-0.037]

3 ES ICREA and Department of Economics Pompeu Fabra University Ramon Trias Fargas 25-27 08005 Barcelona, Spain Editor: Peter Bartlett Abstract In the last years of his life, Leo Breiman promoted random forests for use in classification. [sent-6, score-0.398]

4 He suggested using averaging as a means of obtaining good discrimination rules. [sent-7, score-0.061]

5 The base classifiers used for averaging are simple and randomized, often based on random samples from the data. [sent-8, score-0.146]

6 He left a few questions unanswered regarding the consistency of such rules. [sent-9, score-0.144]

7 In this paper, we give a number of theorems that establish the universal consistency of averaging rules. [sent-10, score-0.182]

8 We also show that some popular classifiers, including one suggested by Breiman, are not universally consistent. [sent-11, score-0.023]

9 Keywords: random forests, classification trees, consistency, bagging This paper is dedicated to the memory of Leo Breiman. [sent-12, score-0.097]

10 Introduction Ensemble methods, popular in machine learning, are learning algorithms that construct a set of many individual classifiers (called base learners) and combine them to classify new data points by taking a weighted or unweighted vote of their predictions. [sent-14, score-0.151]

11 It is now well-known that ensembles are often much more accurate than the individual classifiers that make them up. [sent-15, score-0.035]

12 The success of ensemble algorithms on many benchmark data sets has raised considerable interest in understanding why such methods succeed and identifying circumstances in which they can be expected to produce good results. [sent-16, score-0.07]

13 These methods differ in the way the base learner is fit and combined. [sent-17, score-0.053]

14 For example, bagging (Breiman, 1996) proceeds by generating bootstrap samples from the original data set, constructing a classifier from each bootstrap sample, and voting to combine. [sent-18, score-0.281]
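To make that recipe concrete, here is a minimal sketch of bagging in Python/NumPy. It is an illustration only, not the paper's construction; the interface (NumPy arrays X, y and a fit_base function that returns a {0,1}-valued prediction rule) is my own convention.

import numpy as np

def bagging(X, y, fit_base, n_estimators=25, rng=None):
    """Bagging: fit one base classifier per bootstrap sample, combine by majority vote."""
    rng = np.random.default_rng(rng)
    n = len(y)
    samples = (rng.integers(0, n, size=n) for _ in range(n_estimators))  # sampling with replacement
    base = [fit_base(X[idx], y[idx]) for idx in samples]
    def predict(x):
        # unweighted vote of the bootstrap classifiers; labels are assumed to be 0 or 1
        return int(np.mean([g(x) for g in base]) >= 0.5)
    return predict

The bagged 1-nearest-neighbour rule discussed in Section 5 is the special case in which fit_base returns a 1-NN rule trained on the (possibly small) bootstrap sample; a sketch of that case appears further below.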

15 In boosting (Freund and Schapire, 1996) and arcing algorithms (Breiman, 1998) the successive classifiers are constructed by giving increased weight to those points that have been frequently misclassified, and the classifiers are combined using weighted voting. [sent-19, score-0.046]

16 On the other hand, random split selection (Dietterich, 2000) [sent-20, score-0.149]

17 grows trees on the original data set. [sent-21, score-0.037]

18 For a fixed number S, at each node, S best splits (in terms of minimizing deviance) are found and the actual split is randomly and uniformly selected from them. [sent-22, score-0.053]
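As a small illustrative sketch of that selection rule (my own simplification, with a generic score array standing in for deviance): keep the S best-scoring candidate splits and pick one of them uniformly at random.

import numpy as np

def choose_split(candidate_splits, scores, S, rng=None):
    """Pick uniformly among the S best-scoring candidate splits (lower score = better)."""
    rng = np.random.default_rng(rng)
    best = np.argsort(scores)[:S]              # indices of the S best splits
    return candidate_splits[int(rng.choice(best))]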

19 For a comprehensive review of ensemble methods, we refer the reader to Dietterich (2000a) and the references therein. [sent-23, score-0.051]

20 Breiman (2001) provides a general framework for tree ensembles called “random forests”. [sent-24, score-0.136]

21 Each tree depends on the values of a random vector sampled independently and with the same distribution for all trees. [sent-25, score-0.133]

22 Thus, a random forest is a classifier that consists of many decision trees and outputs the class that is the mode of the classes output by individual trees. [sent-26, score-0.379]

23 Algorithms for inducing a random forest were first developed by Breiman and Cutler, and “Random Forests” is their trademark. [sent-27, score-0.342]

24 edu/users/breiman/RandomForests provides a collection of downloadable technical reports, and gives an overview of random forests as well as comments on the features of the method. [sent-31, score-0.426]

25 Random forests have been shown to give excellent performance on a number of practical problems. [sent-32, score-0.347]

26 They work fast, generally exhibit a substantial performance improvement over single tree classifiers such as CART, and yield generalization error rates that compare favorably to the best statistical and machine learning methods. [sent-33, score-0.101]

27 In fact, random forests are among the most accurate general-purpose classifiers available (see, for example, Breiman, 2001). [sent-34, score-0.379]

28 Different random forests differ in how randomness is introduced in the tree building process, ranging from extreme random splitting strategies (Breiman, 2000; Cutler and Zhao, 2001) to more involved data-dependent strategies (Amit and Geman, 1997; Breiman, 2001; Dietterich, 2000). [sent-35, score-0.529]

29 As a matter of fact, the statistical mechanism of random forests is not yet fully understood and is still under active investigation. [sent-36, score-0.379]

30 Unlike single trees, where consistency is proved by letting the number of observations in each terminal node become large (Devroye, Győrfi, and Lugosi, 1996, Chapter 20), random forests are generally built to have a small number of cases in each terminal node. [sent-37, score-0.62]

31 Although the mechanism of random forest algorithms appears simple, it is difficult to analyze and remains largely unknown. [sent-38, score-0.342]

32 Some attempts to investigate the driving force behind consistency of random forests are by Breiman (2000, 2004) and Lin and Jeon (2006), who establish a connection between random forests and adaptive nearest neighbor methods. [sent-39, score-0.922]

33 Meinshausen (2006) proved consistency of certain random forests in the context of so-called quantile regression. [sent-40, score-0.539]

34 In this paper we offer consistency theorems for various versions of random forests and other randomized ensemble classifiers. [sent-41, score-0.693]

35 In Section 2 we introduce a general framework for studying classifiers based on averaging randomized base classifiers. [sent-42, score-0.256]

36 We prove a simple but useful proposition showing that averaged classifiers are consistent whenever the base classifiers are. [sent-43, score-0.181]

37 In Section 3 we prove consistency of two simple random forest classifiers, the purely random forest (suggested by Breiman as a starting point for study) and the scale-invariant random forest classifiers. [sent-44, score-1.192]

38 In Section 4 it is shown that averaging may convert inconsistent rules into consistent ones. [sent-45, score-0.132]

39 In Section 5 we briefly investigate consistency of bagging rules. [sent-46, score-0.206]

40 We show that, in general, bagging preserves consistency of the base rule and it may even create consistent rules from inconsistent ones. [sent-47, score-0.31]

41 In particular, we show that if the bootstrap samples are sufficiently small, the bagged version of the 1-nearest neighbor classifier is consistent. [sent-48, score-0.079]
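The flavour of that result can be illustrated with a short sketch (again under my own conventions, not the paper's exact construction): each round draws a bootstrap sample of size k and applies the 1-nearest-neighbour rule, and the rounds are combined by majority vote; the consistency statement concerns the regime where k grows with n but k/n → 0.

import numpy as np

def one_nn(Xs, ys, x):
    """Label of the nearest training point (1-nearest-neighbour rule)."""
    return ys[np.argmin(np.sum((Xs - x) ** 2, axis=1))]

def bagged_one_nn(X, y, x, k, n_rounds=100, rng=None):
    """Majority vote of 1-NN rules, each trained on a bootstrap sample of size k << n."""
    rng = np.random.default_rng(rng)
    n = len(y)
    votes = []
    for _ in range(n_rounds):
        idx = rng.integers(0, n, size=k)   # small bootstrap sample
        votes.append(one_nn(X[idx], y[idx], x))
    return int(np.mean(votes) >= 0.5)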

42 Finally, in Section 6 we consider random forest classifiers based on randomized, greedily grown tree classifiers. [sent-49, score-0.443]

43 We argue that some greedy random forest classifiers, including Breiman’s random forest classifier, are inconsistent and suggest a consistent greedy random forest classifier. [sent-50, score-1.097]

44 Here (X, Y), (X1, Y1), . . . , (Xn, Yn) are i.i.d. pairs of random variables such that X (the so-called feature vector) takes its values in Rd while Y (the label) is a binary {0, 1}-valued random variable; Dn denotes the training data (X1, Y1), . . . , (Xn, Yn). [sent-58, score-0.064]

45 A classifier gn is a binary-valued function of X and Dn whose probability of error is defined by L(gn) = P(X,Y){gn(X, Dn) ≠ Y}, where P(X,Y) denotes probability with respect to the pair (X,Y) (i.e., conditionally on Dn). [sent-66, score-0.641]

46 A sequence {gn } of classifiers is consistent for a certain distribution of (X,Y ) if L(gn ) → L∗ in probability. [sent-72, score-0.033]

47 In this paper we investigate classifiers that calculate their decisions by taking a majority vote over randomized classifiers. [sent-73, score-0.328]

48 A randomized classifier may use a random variable Z to calculate its decision. [sent-74, score-0.217]

49 A randomized classifier is an arbitrary function of the form gn (X, Z, Dn ), which we abbreviate by gn (X, Z). [sent-76, score-1.424]

50 The probability of error of gn becomes L(gn) = P(X,Y),Z{gn(X, Z, Dn) ≠ Y} = P{gn(X, Z, Dn) ≠ Y | Dn}. [sent-77, score-0.641]

51 The definition of consistency remains the same by augmenting the probability space appropriately to include the randomization. [sent-78, score-0.121]

52 Given any randomized classifier, one may calculate the classifier for various draws of the randomizing variable Z. [sent-79, score-0.365]

53 It is then a natural idea to define an averaged classifier by taking a majority vote among the obtained random classifiers. [sent-80, score-0.244]

54 Z1, . . . , Zm are identically distributed draws of the randomizing variable, having the same distribution as Z. [sent-84, score-0.206]

55 Z1, . . . , Zm are independent, conditionally on X, Y, and Dn. [sent-88, score-0.032]

56 Given Z^m = (Z1, . . . , Zm), one may define the corresponding voting classifier by gn^(m)(x, Z^m, Dn) = 1 if (1/m) ∑_{j=1}^{m} gn(x, Zj, Dn) ≥ 1/2, and 0 otherwise. [sent-92, score-1.386]
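In code, the voting classifier can be sketched as follows (a hedged illustration: base_classifier(x, z) stands for gn(x, Z, Dn) with the data held fixed, and the uniform draw below is only a stand-in for the distribution of Z).

import numpy as np

def voting_classifier(base_classifier, m, rng=None):
    """g_n^(m): majority vote of the base classifier over m independent draws of Z."""
    rng = np.random.default_rng(rng)
    def predict(x):
        zs = rng.random(m)                            # i.i.d. stand-ins for Z_1, ..., Z_m
        votes = [base_classifier(x, z) for z in zs]   # each vote is 0 or 1
        return int(np.mean(votes) >= 0.5)             # 1 iff the average vote is at least 1/2
    return predict

Letting m → ∞ replaces the empirical average by EZ gn(x, Z), which is exactly the averaged classifier ḡn used below.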

57 (Here PZ and EZ denote probability and expectation with respect to the randomizing variable Z, that is, conditionally on X, Y, and Dn. [sent-94, score-0.186]

58 ) The averaged classifier ḡn may be interpreted as an idealized version of the voting classifier gn^(m) that draws many independent copies of the randomizing variable Z and takes a majority vote over the resulting classifiers. [sent-95, score-1.004]

59 Our first result states that consistency of a randomized classifier is preserved by averaging. [sent-96, score-0.263]

60 Proposition 1 Assume that the sequence {gn} of randomized classifiers is consistent for a certain distribution of (X,Y). [sent-97, score-0.175]

61 Then the voting classifier gn^(m) (for any value of m) and the averaged classifier ḡn are also consistent. [sent-98, score-1.455]

62 In fact, since P{gn(X, Z) ≠ Y | X = x} ≥ P{g∗(X) ≠ Y | X = x} for all x ∈ Rd, consistency of {gn} means that for µ-almost all x, P{gn(X, Z) ≠ Y | X = x} → P{g∗(X) ≠ Y | X = x} = min(η(x), 1 − η(x)). [sent-100, score-0.121]

63 Then, for x such that η(x) > 1/2, P{gn(X, Z) ≠ Y | X = x} = (2η(x) − 1)P{gn(x, Z) = 0} + 1 − η(x), and by consistency we have P{gn(x, Z) = 0} → 0. [sent-103, score-0.121]
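Spelled out (the extracted text skips the intermediate step; the only ingredients are P{Y = 1 | X = x} = η(x) and the fact that gn(x, Z) takes values in {0, 1}):

\begin{align*}
P\{g_n(X,Z) \neq Y \mid X = x\}
  &= P\{g_n(x,Z) = 0\}\,\eta(x) + P\{g_n(x,Z) = 1\}\,\bigl(1-\eta(x)\bigr) \\
  &= P\{g_n(x,Z) = 0\}\,\eta(x) + \bigl(1 - P\{g_n(x,Z) = 0\}\bigr)\bigl(1-\eta(x)\bigr) \\
  &= \bigl(2\eta(x)-1\bigr)\,P\{g_n(x,Z) = 0\} + 1 - \eta(x).
\end{align*}

Since min(η(x), 1 − η(x)) = 1 − η(x) when η(x) > 1/2, the convergence of the left-hand side forces (2η(x) − 1)P{gn(x, Z) = 0} → 0, hence P{gn(x, Z) = 0} → 0.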

64 To prove consistency of the voting classifier gn^(m), it suffices to show that P{gn^(m)(x, Z^m) = 0} → 0 for µ-almost all x for which η(x) > 1/2. [sent-104, score-0.866]

65 Consistency of the averaged classifier is proved by a similar argument. [sent-106, score-0.091]

66 Random Forests Random forests, introduced by Breiman, are averaged classifiers in the sense defined in Section 2. [sent-108, score-0.069]

67 Formally, a random forest with m trees is a classifier consisting of a collection of randomized base tree classifiers gn(x, Z1), . . . , gn(x, Zm). [sent-109, score-1.343]

68 Z1, . . . , Zm are identically distributed random vectors, independent conditionally on X, Y, and Dn. [sent-115, score-0.09]

69 The randomizing variable is typically used to determine how the successive cuts are performed when building the tree, such as the selection of the node and the coordinate to split, as well as the position of the split. [sent-116, score-0.278]

70 The random forest classifier takes a majority vote among the random tree classifiers. [sent-117, score-0.618]

71 If m is large, the random forest classifier is well approximated by the averaged classifier ḡn(x) = 1{EZ gn(x, Z) ≥ 1/2}. [sent-118, score-1.693]

72 For brevity, we state most results of this paper for the averaged classifier ḡn only, though by Proposition 1 various results remain true for the voting classifier gn^(m) as well. [sent-119, score-0.173]

73 In this section we analyze a simple random forest already considered by Breiman (2000), which we call the purely random forest. [sent-120, score-0.419]

74 The random tree classifier gn (x, Z) is constructed as follows. [sent-121, score-0.774]

75 All nodes of the tree are associated with rectangular cells such that at each step of the construction of the tree, the collection of cells associated with the leaves of the tree (i.e., the external nodes) forms a partition of the feature space. [sent-123, score-0.337]

76 The split variable J is then selected uniformly at random from the d candidates x^(1), . . . , x^(d). [sent-128, score-0.086]

77 Finally, the selected cell is split along the randomly chosen variable at a random location, chosen according to a uniform random variable on the length of the chosen side of the selected cell. [sent-132, score-0.192]

78 The randomized classifier gn (x, Z) takes a majority vote among all Yi for which the corresponding feature vector Xi falls in the same cell of the random partition as x. [sent-134, score-1.077]
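Putting the construction together, here is a rough sketch of a single purely random tree in Python/NumPy. It is based on my reading of the sentences above; the choice of [0,1]^d as the root cell, the uniform choice of which leaf to split, the tie-breaking convention, and the default label for empty cells are all assumptions on my part, since the extracted text does not spell them out.

import numpy as np

def purely_random_tree(X, y, x, k, rng=None):
    """Classify x with one purely random tree built from k random splits (k + 1 leaf cells)."""
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    cells = [np.array([[0.0, 1.0]] * d)]             # root cell assumed to be [0,1]^d; rows are [low, high]
    for _ in range(k):
        cell = cells.pop(rng.integers(len(cells)))   # leaf to split, chosen uniformly (assumption)
        j = rng.integers(d)                          # split coordinate, uniform over the d candidates
        s = rng.uniform(cell[j, 0], cell[j, 1])      # split point, uniform on the chosen side
        left, right = cell.copy(), cell.copy()
        left[j, 1], right[j, 0] = s, s
        cells += [left, right]
    for cell in cells:                               # locate the leaf cell containing x
        if np.all((x >= cell[:, 0]) & (x <= cell[:, 1])):
            inside = np.all((X >= cell[:, 0]) & (X <= cell[:, 1]), axis=1)
            # majority vote among the labels Y_i whose feature vectors X_i fall in the same cell as x
            return int(y[inside].mean() >= 0.5) if inside.any() else 0
    return 0                                         # x outside the root cell: arbitrary default

The purely random forest then votes (or averages) over many independent trees of this kind, as in the voting-classifier sketch above; the parameter k counts the random splits, matching the k + 1 cells used later in the proof.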

79 The purely random forest classifier is a radically simplified version of random forest classifiers used in practice. [sent-136, score-0.729]

80 The main simplification lies in the fact that recursive cell splits do not depend on the labels Y1, . . . , Yn. [sent-137, score-0.073]

81 The next theorem mainly serves as an illustration of how the consistency problem of random forest classifiers may be attacked. [sent-141, score-0.463]

82 More involved versions of random forest classifiers are discussed in subsequent sections. [sent-142, score-0.342]

83 Then the purely random forest classifier ḡn is consistent whenever k → ∞ and k/n → 0 as n → ∞. [sent-144, score-1.061]

84 Proof By Proposition 1 it suffices to prove consistency of the randomized base tree classifier gn. [sent-145, score-0.053]

85 To this end, we recall a general consistency theorem for partitioning classifiers proved in (Devroye, Győrfi, and Lugosi, 1996, Theorem 6. [sent-146, score-0.143]

86 Observe that the partition has k + 1 rectangular cells, say A1, . . . , Ak+1. [sent-151, score-0.078]

87 Since these points are independent and identically distributed, fixing the set S (but not the order of the points) and Z, the conditional probability that X falls in the i-th cell equals Ni /(n + 1). [sent-166, score-0.109]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('gn', 0.641), ('forests', 0.347), ('forest', 0.31), ('dn', 0.23), ('breiman', 0.195), ('randomized', 0.142), ('randomizing', 0.134), ('consistency', 0.121), ('er', 0.116), ('ers', 0.115), ('classi', 0.106), ('voting', 0.104), ('tree', 0.101), ('vote', 0.098), ('lugosi', 0.087), ('devroye', 0.082), ('biau', 0.081), ('zm', 0.077), ('nn', 0.073), ('averaged', 0.069), ('luc', 0.068), ('bagging', 0.065), ('averaging', 0.061), ('bootstrap', 0.056), ('evroye', 0.054), ('iau', 0.054), ('mcgill', 0.054), ('orests', 0.054), ('ugosi', 0.054), ('cell', 0.054), ('base', 0.053), ('ez', 0.052), ('ensemble', 0.051), ('draws', 0.046), ('cutler', 0.046), ('rard', 0.046), ('purely', 0.045), ('majority', 0.045), ('rectangular', 0.042), ('onsistency', 0.041), ('inconsistent', 0.038), ('bor', 0.037), ('terminal', 0.037), ('trees', 0.037), ('partition', 0.036), ('ensembles', 0.035), ('gy', 0.035), ('leo', 0.035), ('split', 0.034), ('cells', 0.033), ('consistent', 0.033), ('andom', 0.033), ('random', 0.032), ('rd', 0.032), ('conditionally', 0.032), ('falling', 0.031), ('paris', 0.031), ('brevity', 0.031), ('dietterich', 0.031), ('ni', 0.03), ('falls', 0.029), ('collection', 0.027), ('identically', 0.026), ('proposition', 0.026), ('letting', 0.024), ('xn', 0.024), ('calculate', 0.023), ('ramon', 0.023), ('pierre', 0.023), ('upmc', 0.023), ('montreal', 0.023), ('arcing', 0.023), ('bagged', 0.023), ('barcelona', 0.023), ('diam', 0.023), ('driving', 0.023), ('limm', 0.023), ('unanswered', 0.023), ('universally', 0.023), ('successive', 0.023), ('proved', 0.022), ('concreteness', 0.02), ('downloadable', 0.02), ('economics', 0.02), ('idealized', 0.02), ('investigate', 0.02), ('variable', 0.02), ('splits', 0.019), ('curie', 0.019), ('marie', 0.019), ('promoted', 0.019), ('cart', 0.019), ('bo', 0.019), ('borel', 0.019), ('deviance', 0.019), ('geman', 0.019), ('succeed', 0.019), ('quantile', 0.017), ('life', 0.017), ('randomness', 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000005 25 jmlr-2008-Consistency of Random Forests and Other Averaging Classifiers

Author: Gérard Biau, Luc Devroye, Gábor Lugosi

Abstract: In the last years of his life, Leo Breiman promoted random forests for use in classification. He suggested using averaging as a means of obtaining good discrimination rules. The base classifiers used for averaging are simple and randomized, often based on random samples from the data. He left a few questions unanswered regarding the consistency of such rules. In this paper, we give a number of theorems that establish the universal consistency of averaging rules. We also show that some popular classifiers, including one suggested by Breiman, are not universally consistent. Keywords: random forests, classification trees, consistency, bagging This paper is dedicated to the memory of Leo Breiman.

2 0.15334232 68 jmlr-2008-Nearly Uniform Validation Improves Compression-Based Error Bounds

Author: Eric Bax

Abstract: This paper develops bounds on out-of-sample error rates for support vector machines (SVMs). The bounds are based on the numbers of support vectors in the SVMs rather than on VC dimension. The bounds developed here improve on support vector counting bounds derived using Littlestone and Warmuth’s compression-based bounding technique. Keywords: compression, error bound, support vector machine, nearly uniform

3 0.06794852 77 jmlr-2008-Probabilistic Characterization of Random Decision Trees

Author: Amit Dhurandhar, Alin Dobra

Abstract: In this paper we use the methodology introduced by Dhurandhar and Dobra (2009) for analyzing the error of classifiers and the model selection measures, to analyze decision tree algorithms. The methodology consists of obtaining parametric expressions for the moments of the generalization error (GE) for the classification model of interest, followed by plotting these expressions for interpretability. The major challenge in applying the methodology to decision trees, the main theme of this work, is customizing the generic expressions for the moments of GE to this particular classification algorithm. The specific contributions we make in this paper are: (a) we primarily characterize a subclass of decision trees namely, Random decision trees, (b) we discuss how the analysis extends to other decision tree algorithms and (c) in order to extend the analysis to certain model selection measures, we generalize the relationships between the moments of GE and moments of the model selection measures given in (Dhurandhar and Dobra, 2009) to randomized classification algorithms. An empirical comparison of the proposed method with Monte Carlo and distribution free bounds obtained using Breiman’s formula, depicts the advantages of the method in terms of running time and accuracy. It thus showcases the use of the deployed methodology as an exploratory tool to study learning algorithms. Keywords: moments, generalization error, decision trees

4 0.057420198 11 jmlr-2008-Aggregation of SVM Classifiers Using Sobolev Spaces

Author: Sébastien Loustau

Abstract: This paper investigates statistical performances of Support Vector Machines (SVM) and considers the problem of adaptation to the margin parameter and to complexity. In particular we provide a classifier with no tuning parameter. It is a combination of SVM classifiers. Our contribution is two-fold: (1) we propose learning rates for SVM using Sobolev spaces and build a numerically realizable aggregate that converges with same rate; (2) we present practical experiments of this method of aggregation for SVM using both Sobolev spaces and Gaussian kernels. Keywords: classification, support vector machines, learning rates, approximation, aggregation of classifiers

5 0.057106964 41 jmlr-2008-Graphical Models for Structured Classification, with an Application to Interpreting Images of Protein Subcellular Location Patterns

Author: Shann-Ching Chen, Geoffrey J. Gordon, Robert F. Murphy

Abstract: In structured classification problems, there is a direct conflict between expressive models and efficient inference: while graphical models such as Markov random fields or factor graphs can represent arbitrary dependences among instance labels, the cost of inference via belief propagation in these models grows rapidly as the graph structure becomes more complicated. One important source of complexity in belief propagation is the need to marginalize large factors to compute messages. This operation takes time exponential in the number of variables in the factor, and can limit the expressiveness of the models we can use. In this paper, we study a new class of potential functions, which we call decomposable k-way potentials, and provide efficient algorithms for computing messages from these potentials during belief propagation. We believe these new potentials provide a good balance between expressive power and efficient inference in practical structured classification problems. We discuss three instances of decomposable potentials: the associative Markov network potential, the nested junction tree, and a new type of potential which we call the voting potential. We use these potentials to classify images of protein subcellular location patterns in groups of cells. Classifying subcellular location patterns can help us answer many important questions in computational biology, including questions about how various treatments affect the synthesis and behavior of proteins and networks of proteins within a cell. Our new representation and algorithm lead to substantial improvements in both inference speed and classification accuracy. Keywords: factor graphs, approximate inference algorithms, structured classification, protein subcellular location patterns, location proteomics

6 0.055232868 89 jmlr-2008-Support Vector Machinery for Infinite Ensemble Learning

7 0.04671799 33 jmlr-2008-Evidence Contrary to the Statistical View of Boosting

8 0.042747982 32 jmlr-2008-Estimating the Confidence Interval for Prediction Errors of Support Vector Machine Classifiers

9 0.038887717 92 jmlr-2008-Universal Multi-Task Kernels

10 0.038329139 81 jmlr-2008-Regularization on Graphs with Function-adapted Diffusion Processes

11 0.037363492 13 jmlr-2008-An Error Bound Based on a Worst Likely Assignment

12 0.036348544 3 jmlr-2008-A Moment Bound for Multi-hinge Classifiers

13 0.035817966 52 jmlr-2008-Learning from Multiple Sources

14 0.034862831 30 jmlr-2008-Discriminative Learning of Max-Sum Classifiers

15 0.032908674 87 jmlr-2008-Stationary Features and Cat Detection

16 0.031330999 9 jmlr-2008-Active Learning by Spherical Subdivision

17 0.028181577 14 jmlr-2008-An Extension on "Statistical Comparisons of Classifiers over Multiple Data Sets" for all Pairwise Comparisons

18 0.02801762 88 jmlr-2008-Structural Learning of Chain Graphs via Decomposition

19 0.027829239 95 jmlr-2008-Value Function Based Reinforcement Learning in Changing Markovian Environments

20 0.027420241 39 jmlr-2008-Gradient Tree Boosting for Training Conditional Random Fields


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.14), (1, -0.029), (2, 0.128), (3, -0.113), (4, -0.033), (5, -0.013), (6, 0.083), (7, -0.118), (8, 0.003), (9, -0.192), (10, 0.054), (11, -0.005), (12, -0.087), (13, 0.091), (14, -0.214), (15, 0.022), (16, 0.294), (17, -0.005), (18, -0.176), (19, 0.04), (20, -0.078), (21, -0.073), (22, 0.186), (23, -0.151), (24, 0.12), (25, -0.106), (26, -0.017), (27, 0.031), (28, -0.043), (29, 0.003), (30, 0.036), (31, -0.222), (32, -0.109), (33, 0.076), (34, 0.226), (35, 0.096), (36, 0.001), (37, 0.104), (38, 0.017), (39, -0.044), (40, 0.025), (41, -0.103), (42, -0.06), (43, -0.159), (44, 0.071), (45, -0.167), (46, -0.102), (47, 0.017), (48, 0.031), (49, -0.049)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96751708 25 jmlr-2008-Consistency of Random Forests and Other Averaging Classifiers

Author: Gérard Biau, Luc Devroye, Gábor Lugosi

Abstract: In the last years of his life, Leo Breiman promoted random forests for use in classification. He suggested using averaging as a means of obtaining good discrimination rules. The base classifiers used for averaging are simple and randomized, often based on random samples from the data. He left a few questions unanswered regarding the consistency of such rules. In this paper, we give a number of theorems that establish the universal consistency of averaging rules. We also show that some popular classifiers, including one suggested by Breiman, are not universally consistent. Keywords: random forests, classification trees, consistency, bagging This paper is dedicated to the memory of Leo Breiman.

2 0.63544101 68 jmlr-2008-Nearly Uniform Validation Improves Compression-Based Error Bounds

Author: Eric Bax

Abstract: This paper develops bounds on out-of-sample error rates for support vector machines (SVMs). The bounds are based on the numbers of support vectors in the SVMs rather than on VC dimension. The bounds developed here improve on support vector counting bounds derived using Littlestone and Warmuth’s compression-based bounding technique. Keywords: compression, error bound, support vector machine, nearly uniform

3 0.33642811 41 jmlr-2008-Graphical Models for Structured Classification, with an Application to Interpreting Images of Protein Subcellular Location Patterns

Author: Shann-Ching Chen, Geoffrey J. Gordon, Robert F. Murphy

Abstract: In structured classification problems, there is a direct conflict between expressive models and efficient inference: while graphical models such as Markov random fields or factor graphs can represent arbitrary dependences among instance labels, the cost of inference via belief propagation in these models grows rapidly as the graph structure becomes more complicated. One important source of complexity in belief propagation is the need to marginalize large factors to compute messages. This operation takes time exponential in the number of variables in the factor, and can limit the expressiveness of the models we can use. In this paper, we study a new class of potential functions, which we call decomposable k-way potentials, and provide efficient algorithms for computing messages from these potentials during belief propagation. We believe these new potentials provide a good balance between expressive power and efficient inference in practical structured classification problems. We discuss three instances of decomposable potentials: the associative Markov network potential, the nested junction tree, and a new type of potential which we call the voting potential. We use these potentials to classify images of protein subcellular location patterns in groups of cells. Classifying subcellular location patterns can help us answer many important questions in computational biology, including questions about how various treatments affect the synthesis and behavior of proteins and networks of proteins within a cell. Our new representation and algorithm lead to substantial improvements in both inference speed and classification accuracy. Keywords: factor graphs, approximate inference algorithms, structured classification, protein subcellular location patterns, location proteomics

4 0.32959464 77 jmlr-2008-Probabilistic Characterization of Random Decision Trees

Author: Amit Dhurandhar, Alin Dobra

Abstract: In this paper we use the methodology introduced by Dhurandhar and Dobra (2009) for analyzing the error of classifiers and the model selection measures, to analyze decision tree algorithms. The methodology consists of obtaining parametric expressions for the moments of the generalization error (GE) for the classification model of interest, followed by plotting these expressions for interpretability. The major challenge in applying the methodology to decision trees, the main theme of this work, is customizing the generic expressions for the moments of GE to this particular classification algorithm. The specific contributions we make in this paper are: (a) we primarily characterize a subclass of decision trees namely, Random decision trees, (b) we discuss how the analysis extends to other decision tree algorithms and (c) in order to extend the analysis to certain model selection measures, we generalize the relationships between the moments of GE and moments of the model selection measures given in (Dhurandhar and Dobra, 2009) to randomized classification algorithms. An empirical comparison of the proposed method with Monte Carlo and distribution free bounds obtained using Breiman’s formula, depicts the advantages of the method in terms of running time and accuracy. It thus showcases the use of the deployed methodology as an exploratory tool to study learning algorithms. Keywords: moments, generalization error, decision trees

5 0.28881311 30 jmlr-2008-Discriminative Learning of Max-Sum Classifiers

Author: Vojtěch Franc, Bogdan Savchynskyy

Abstract: The max-sum classifier predicts n-tuple of labels from n-tuple of observable variables by maximizing a sum of quality functions defined over neighbouring pairs of labels and observable variables. Predicting labels as MAP assignments of a Random Markov Field is a particular example of the max-sum classifier. Learning parameters of the max-sum classifier is a challenging problem because even computing the response of such classifier is NP-complete in general. Estimating parameters using the Maximum Likelihood approach is feasible only for a subclass of max-sum classifiers with an acyclic structure of neighbouring pairs. Recently, the discriminative methods represented by the perceptron and the Support Vector Machines, originally designed for binary linear classifiers, have been extended for learning some subclasses of the max-sum classifier. Besides the max-sum classifiers with the acyclic neighbouring structure, it has been shown that the discriminative learning is possible even with arbitrary neighbouring structure provided the quality functions fulfill some additional constraints. In this article, we extend the discriminative approach to other three classes of max-sum classifiers with an arbitrary neighbourhood structure. We derive learning algorithms for two subclasses of max-sum classifiers whose response can be computed in polynomial time: (i) the max-sum classifiers with supermodular quality functions and (ii) the max-sum classifiers whose response can be computed exactly by a linear programming relaxation. Moreover, we show that the learning problem can be approximately solved even for a general max-sum classifier. Keywords: max-xum classifier, hidden Markov networks, support vector machines

6 0.27762103 81 jmlr-2008-Regularization on Graphs with Function-adapted Diffusion Processes

7 0.25722858 11 jmlr-2008-Aggregation of SVM Classifiers Using Sobolev Spaces

8 0.24352458 9 jmlr-2008-Active Learning by Spherical Subdivision

9 0.23641632 89 jmlr-2008-Support Vector Machinery for Infinite Ensemble Learning

10 0.20154165 32 jmlr-2008-Estimating the Confidence Interval for Prediction Errors of Support Vector Machine Classifiers

11 0.19729583 14 jmlr-2008-An Extension on "Statistical Comparisons of Classifiers over Multiple Data Sets" for all Pairwise Comparisons

12 0.19062182 21 jmlr-2008-Classification with a Reject Option using a Hinge Loss

13 0.18060251 13 jmlr-2008-An Error Bound Based on a Worst Likely Assignment

14 0.17458907 52 jmlr-2008-Learning from Multiple Sources

15 0.17386837 72 jmlr-2008-On the Size and Recovery of Submatrices of Ones in a Random Binary Matrix

16 0.16876985 92 jmlr-2008-Universal Multi-Task Kernels

17 0.16847068 87 jmlr-2008-Stationary Features and Cat Detection

18 0.15904523 51 jmlr-2008-Learning Similarity with Operator-valued Large-margin Classifiers

19 0.15551782 46 jmlr-2008-LIBLINEAR: A Library for Large Linear Classification    (Machine Learning Open Source Software Paper)

20 0.1547721 33 jmlr-2008-Evidence Contrary to the Statistical View of Boosting


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.02), (5, 0.011), (40, 0.015), (54, 0.029), (58, 0.023), (66, 0.041), (76, 0.013), (88, 0.669), (92, 0.039), (94, 0.021), (99, 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99415565 25 jmlr-2008-Consistency of Random Forests and Other Averaging Classifiers

Author: Gérard Biau, Luc Devroye, Gábor Lugosi

Abstract: In the last years of his life, Leo Breiman promoted random forests for use in classification. He suggested using averaging as a means of obtaining good discrimination rules. The base classifiers used for averaging are simple and randomized, often based on random samples from the data. He left a few questions unanswered regarding the consistency of such rules. In this paper, we give a number of theorems that establish the universal consistency of averaging rules. We also show that some popular classifiers, including one suggested by Breiman, are not universally consistent. Keywords: random forests, classification trees, consistency, bagging This paper is dedicated to the memory of Leo Breiman.

2 0.97859031 49 jmlr-2008-Learning Control Knowledge for Forward Search Planning

Author: Sungwook Yoon, Alan Fern, Robert Givan

Abstract: A number of today’s state-of-the-art planners are based on forward state-space search. The impressive performance can be attributed to progress in computing domain independent heuristics that perform well across many domains. However, it is easy to find domains where such heuristics provide poor guidance, leading to planning failure. Motivated by such failures, the focus of this paper is to investigate mechanisms for learning domain-specific knowledge to better control forward search in a given domain. While there has been a large body of work on inductive learning of control knowledge for AI planning, there is a void of work aimed at forward-state-space search. One reason for this may be that it is challenging to specify a knowledge representation for compactly representing important concepts across a wide range of domains. One of the main contributions of this work is to introduce a novel feature space for representing such control knowledge. The key idea is to define features in terms of information computed via relaxed plan extraction, which has been a major source of success for non-learning planners. This gives a new way of leveraging relaxed planning techniques in the context of learning. Using this feature space, we describe three forms of control knowledge—reactive policies (decision list rules and measures of progress) and linear heuristics—and show how to learn them and incorporate them into forward state-space search. Our empirical results show that our approaches are able to surpass state-of-the-art nonlearning planners across a wide range of planning competition domains. Keywords: planning, machine learning, knowledge representation, search

3 0.95620722 1 jmlr-2008-A Bahadur Representation of the Linear Support Vector Machine

Author: Ja-Yong Koo, Yoonkyung Lee, Yuwon Kim, Changyi Park

Abstract: The support vector machine has been successful in a variety of applications. Also on the theoretical front, statistical properties of the support vector machine have been studied quite extensively with a particular attention to its Bayes risk consistency under some conditions. In this paper, we study somewhat basic statistical properties of the support vector machine yet to be investigated, namely the asymptotic behavior of the coefficients of the linear support vector machine. A Bahadur type representation of the coefficients is established under appropriate conditions, and their asymptotic normality and statistical variability are derived on the basis of the representation. These asymptotic results do not only help further our understanding of the support vector machine, but also they can be useful for related statistical inferences. Keywords: asymptotic normality, Bahadur representation, classification, convexity lemma, Radon transform

4 0.91936934 96 jmlr-2008-Visualizing Data using t-SNE

Author: Laurens van der Maaten, Geoffrey Hinton

Abstract: We present a new technique called “t-SNE” that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map. The technique is a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map. t-SNE is better than existing techniques at creating a single map that reveals structure at many different scales. This is particularly important for high-dimensional data that lie on several different, but related, low-dimensional manifolds, such as images of objects from multiple classes seen from multiple viewpoints. For visualizing the structure of very large data sets, we show how t-SNE can use random walks on neighborhood graphs to allow the implicit structure of all of the data to influence the way in which a subset of the data is displayed. We illustrate the performance of t-SNE on a wide variety of data sets and compare it with many other non-parametric visualization techniques, including Sammon mapping, Isomap, and Locally Linear Embedding. The visualizations produced by t-SNE are significantly better than those produced by the other techniques on almost all of the data sets. Keywords: visualization, dimensionality reduction, manifold learning, embedding algorithms, multidimensional scaling

5 0.69871259 53 jmlr-2008-Learning to Combine Motor Primitives Via Greedy Additive Regression

Author: Manu Chhabra, Robert A. Jacobs

Abstract: The computational complexities arising in motor control can be ameliorated through the use of a library of motor synergies. We present a new model, referred to as the Greedy Additive Regression (GAR) model, for learning a library of torque sequences, and for learning the coefficients of a linear combination of sequences minimizing a cost function. From the perspective of numerical optimization, the GAR model is interesting because it creates a library of “local features”—each sequence in the library is a solution to a single training task—and learns to combine these sequences using a local optimization procedure, namely, additive regression. We speculate that learners with local representational primitives and local optimization procedures will show good performance on nonlinear tasks. The GAR model is also interesting from the perspective of motor control because it outperforms several competing models. Results using a simulated two-joint arm suggest that the GAR model consistently shows excellent performance in the sense that it rapidly learns to perform novel, complex motor tasks. Moreover, its library is overcomplete and sparse, meaning that only a small fraction of the stored torque sequences are used when learning a new movement. The library is also robust in the sense that, after an initial training period, nearly all novel movements can be learned as additive combinations of sequences in the library, and in the sense that it shows good generalization when an arm’s dynamics are altered between training and test conditions, such as when a payload is added to the arm. Lastly, the GAR model works well regardless of whether motor tasks are specified in joint space or Cartesian space. We conclude that learning techniques using local primitives and optimization procedures are viable and potentially important methods for motor control and possibly other domains, and that these techniques deserve further examination by the artificial intelligence and cognitive science

6 0.65951127 89 jmlr-2008-Support Vector Machinery for Infinite Ensemble Learning

7 0.64474571 9 jmlr-2008-Active Learning by Spherical Subdivision

8 0.63724744 3 jmlr-2008-A Moment Bound for Multi-hinge Classifiers

9 0.63458079 21 jmlr-2008-Classification with a Reject Option using a Hinge Loss

10 0.62933135 43 jmlr-2008-Hit Miss Networks with Applications to Instance Selection

11 0.6221056 57 jmlr-2008-Manifold Learning: The Price of Normalization

12 0.6082015 68 jmlr-2008-Nearly Uniform Validation Improves Compression-Based Error Bounds

13 0.59895766 19 jmlr-2008-Bouligand Derivatives and Robustness of Support Vector Machines for Regression

14 0.59244078 8 jmlr-2008-Accelerated Neural Evolution through Cooperatively Coevolved Synapses

15 0.58471495 50 jmlr-2008-Learning Reliable Classifiers From Small or Incomplete Data Sets: The Naive Credal Classifier 2

16 0.58063817 71 jmlr-2008-On the Equivalence of Linear Dimensionality-Reducing Transformations

17 0.57608289 31 jmlr-2008-Dynamic Hierarchical Markov Random Fields for Integrated Web Data Extraction

18 0.57306075 41 jmlr-2008-Graphical Models for Structured Classification, with an Application to Interpreting Images of Protein Subcellular Location Patterns

19 0.57175928 14 jmlr-2008-An Extension on "Statistical Comparisons of Classifiers over Multiple Data Sets" for all Pairwise Comparisons

20 0.57106167 48 jmlr-2008-Learning Bounded Treewidth Bayesian Networks