nips nips2001 nips2001-8 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: T. Zhang
Abstract: Greedy approximation algorithms have been frequently used to obtain sparse solutions to learning problems. In this paper, we present a general greedy algorithm for solving a class of convex optimization problems. We derive a bound on the rate of approximation for this algorithm, and show that our algorithm includes a number of earlier studies as special cases.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Greedy approximation algorithms have been frequently used to obtain sparse solutions to learning problems. [sent-5, score-0.424]
2 In this paper, we present a general greedy algorithm for solving a class of convex optimization problems. [sent-6, score-0.772]
3 We derive a bound on the rate of approximation for this algorithm, and show that our algorithm includes a number of earlier studies as special cases. [sent-7, score-0.45]
4 1 Introduction The goal of machine learning is to obtain a certain input/output functional relationship from a set of training examples. [sent-8, score-0.19]
5 In order to do so, we need to start with a model of the functional relationship. [sent-9, score-0.125]
6 In practice, it is often desirable to find the simplest model that can explain the data. [sent-10, score-0.078]
7 In addition, the philosophy of Occam’s Razor implies that the simplest solution is likely to be the best solution among all possible solutions. In this paper, we are interested in composite models that can be expressed as linear combinations of basic models. [sent-12, score-0.505]
8 In this framework, it is natural to measure the simplicity of a composite model by the number of its basic model components. [sent-13, score-0.269]
9 Since a composite model in our framework corresponds to a linear weight over the basic model space, our measure of model simplicity corresponds to the sparsity of the linear weight representation. [sent-14, score-0.593]
10 In this paper, we are interested in achieving sparsity through a greedy optimization algorithm which we propose in the next section. [sent-15, score-0.56]
11 This algorithm is closely related to a number of previous works. [sent-16, score-0.195]
12 The basic idea originated in [5], where Jones observed that if a target vector in a Hilbert space is a convex combination of a library of basic vectors, then using greedy approximation one can achieve an error rate of O(1/k) with k basic library vectors. [sent-17, score-1.049]
13 The idea has been refined in [1] to analyze the approximation property of sigmoidal functions including neural networks. [sent-18, score-0.249]
14 The above methods can be regarded as greedy sparse algorithms for functional approximation, which is the noise-free case of regression problems. [sent-19, score-0.982]
15 A similar greedy algorithm can also be used to solve general regression problems under noisy conditions [6]. [sent-20, score-0.737]
16 In addition to regression, greedy approximation can also be applied to classification problems. [sent-21, score-0.512]
17 The resulting algorithm is closely related to boosting [2] under the additive model point of view [3]. [sent-22, score-0.568]
18 This paper shows how to generalize the method in [5, 1] for analyzing greedy algorithms (in their case, for functional approximation problems) and apply it to boosting. [sent-23, score-0.637]
19 Our method can also be used to obtain sparse kernel representations for regression problems. [sent-25, score-0.528]
20 Such a sparse representation is what support vector regression machines try to achieve. [sent-26, score-0.449]
21 In this regard, the method given in this paper complements some recently proposed greedy kernel methods for Gaussian processes such as [9, 10]. [sent-27, score-0.425]
22 The proposed greedy approximation method can also be applied to other prediction problems with different loss functions. [sent-28, score-0.739]
23 For example, in density estimation, the goal is to find a model that has the smallest negative log-likelihood. [sent-29, score-0.13]
24 Similar approximation bounds can be directly obtained under the general framework proposed in this paper. [sent-31, score-0.322]
25 Section 2 formalizes the general class of problems considered in this paper, and proposes a greedy algorithm to solve the formulation. [sent-33, score-0.546]
26 The convergence rate of the algorithm is investigated in Section 3. [sent-34, score-0.202]
27 2 General Algorithm In machine learning, our goal is often to predict an unobserved output value based on an observed input vector. [sent-37, score-0.078]
28 This requires us to estimate a functional relationship from a set of example input/output pairs. [sent-38, score-0.125]
29 Usually the quality of the predictor can be measured by a loss function that is problem dependent. [sent-39, score-0.286]
30 This family of models can be regarded as additive models in statistics [4]. [sent-41, score-0.266]
31 Formally, each basic model can be regarded as a vector in a linear functional space. [sent-42, score-0.358]
32 Our problem in its most general form can thus be described as finding a vector in the convex hull of the basic models that minimizes a functional measuring the quality of that vector. [sent-43, score-0.642]
33 This functional plays the role of the loss function for learning problems. [sent-44, score-0.311]
34 More formally, we consider a linear vector space containing a set of basic vectors, together with its convex hull, i.e., the set of convex combinations of basic vectors (non-negative coefficients summing to one); we use N to denote the set of positive integers. [sent-47, score-0.377]
35 We consider the following optimization problem on this convex hull; in this paper, we assume that the objective is a differentiable convex function on it. We propose the following algorithm to approximately solve (1). [sent-48, score-0.401]
36 Problem (1) is to minimize the convex objective over the convex hull of the basic vectors [displayed equation omitted]. Algorithm 2.1 proceeds greedily: at each step, find a basic vector and a step size that minimize the objective when mixed with the current iterate, and let the new iterate be that convex combination [pseudocode omitted]. For simplicity, we assume that the minimization in each step of Algorithm 2.1 is performed exactly. [sent-51, score-0.046]
37 However, due to the space limitation, we shall not consider this generalization. [sent-54, score-0.056]
38 For convenience, we introduce the following quantity [definition omitted]. [sent-55, score-0.046]
39 In the next section, we show that under appropriate regularity conditions the objective value of the iterate converges to the optimum over the convex hull as the number of steps grows, where the iterate is computed from Algorithm 2.1. [sent-57, score-0.115]
40 In addition, the convergence rate can be bounded as O(1/k), where k is the number of greedy steps. [sent-59, score-0.156]
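For concreteness, here is a minimal Python sketch of a greedy scheme of this kind. It is not the paper's exact pseudocode (whose displayed form is omitted above): it keeps an explicit convex combination of basic vectors and, at each step, searches a finite dictionary and a grid of step sizes for the pair that most decreases a differentiable convex objective. The finite dictionary, the grid line search, and the squared-loss demonstration at the end are illustrative assumptions.

```python
import numpy as np

def greedy_convex_min(loss, dictionary, n_steps, step_grid=np.linspace(0.0, 1.0, 51)):
    """Greedily minimize a convex loss over the convex hull of the rows of `dictionary`.

    loss       : callable mapping a vector to a scalar (assumed convex and differentiable)
    dictionary : (m, d) array whose rows are the basic vectors
    n_steps    : number of greedy steps; the result uses at most n_steps + 1 basic vectors
    step_grid  : candidate step sizes in [0, 1] (a grid line search, for simplicity)
    """
    m, _ = dictionary.shape
    v = dictionary[0].copy()            # start from an arbitrary basic vector
    weights = np.zeros(m)
    weights[0] = 1.0
    for _ in range(n_steps):
        best = None
        for j in range(m):              # search over basic vectors
            g = dictionary[j]
            for alpha in step_grid:     # search over step sizes
                val = loss((1.0 - alpha) * v + alpha * g)
                if best is None or val < best[0]:
                    best = (val, j, alpha)
        _, j, alpha = best
        v = (1.0 - alpha) * v + alpha * dictionary[j]
        weights *= (1.0 - alpha)        # keep the convex-combination weights explicit
        weights[j] += alpha
    return v, weights

# Illustrative use: approximate a target that lies in the convex hull, under squared loss.
rng = np.random.default_rng(0)
D = rng.normal(size=(50, 20))
target = D[:5].mean(axis=0)
squared_loss = lambda v: 0.5 * np.sum((v - target) ** 2)
v_k, w = greedy_convex_min(squared_loss, D, n_steps=30)
print(squared_loss(v_k), np.count_nonzero(w))
```

When the objective allows it, an exact one-dimensional line search or a gradient-based (Frank-Wolfe style) choice of the basic vector can replace the nested grid search without changing the overall scheme.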
41 3 Approximation bound Given any convex function, we have the following proposition, which is a direct consequence of the definition of convexity. [sent-61, score-0.343]
42 In convex analysis, the gradient can be replaced by the concept of a subgradient, which we do not consider in this paper for simplicity. [sent-62, score-0.267]
43 Proposition 3.1 Consider a convex function f; for any vectors u and v we have f(u) ≥ f(v) + ∇f(v) · (u − v). The following lemma is the main theoretical result of the paper, which bounds the performance of each greedy solution step in Algorithm 2.1. [sent-65, score-0.92]
44 For all vectors in the convex hull, we have the following inequality [equation omitted]. Proof. [sent-70, score-0.175]
45 [proof equations omitted] Now, by setting the step size appropriately in the above inequality, we obtain the lemma. [sent-74, score-0.115]
46 Using the above lemma, it is easy to obtain the following theorem by induction. [sent-77, score-0.446]
47 Theorem 3.1 Algorithm 2.1 approximately solves (1), and the rate of convergence is of order 1/k in the number of greedy steps; under an additional boundedness condition, a refined bound also holds [equations omitted]. [sent-82, score-0.164]
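Restated loosely (a sketch rather than the paper's exact statement; the constant M is an assumed stand-in for the curvature-dependent constant whose precise form is omitted above), the guarantee has the shape:

```latex
% v_k: the iterate after k steps of Algorithm 2.1; conv(S): the convex hull of the basic vectors.
% M: an assumed constant depending on the curvature of f over conv(S).
f(v_k) \;-\; \inf_{v \in \mathrm{conv}(S)} f(v) \;\le\; \frac{M}{k} \;=\; O\!\left(\tfrac{1}{k}\right).
```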
48 We show that the general formulation considered in this paper includes some previous formulations as special cases. [sent-85, score-0.276]
49 4.1 Regression In regression, we would like to approximate the output as a function of the input so that the expected loss of the predictor [sent-88, score-0.186]
50 is small, where we use the squared loss for simplicity (this choice is obviously not crucial in our framework). [sent-89, score-0.24]
51 The expectation is over the input and the output, which often corresponds to the empirical distribution of the training pairs. [sent-90, score-0.097]
52 Given a set of basis functions, we may consider the following regression formulation, slightly different from (1) [equation omitted]. [sent-93, score-0.257]
53 where a positive regularization parameter is used to control the size of the weight vector. [sent-98, score-0.086]
54 The above formulation can be readily converted into (1) by considering the set of basic vectors given in (2) [definition omitted]. [sent-99, score-0.185]
55 We may start Algorithm 2.1 from a simple initial point; the relevant quantity can then be bounded, which implies that the sparse solution produced by Algorithm 2.1, [sent-103, score-0.375]
56 represented as a weight vector over the basis functions, satisfies the following inequality [equation omitted]. [sent-104, score-0.047]
57 This leads to the original functional approximation results in [1, 5] and their generalization in [6]. [sent-107, score-0.265]
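The following hedged Python sketch specializes the earlier greedy scheme to the squared loss. The columns of Phi stand in for the basis functions, the scaling by a parameter B is an assumption standing in for the set of basic vectors in (2) (so the convex hull corresponds to weight vectors of l1-norm at most B), and the basic vector is selected through the gradient, which is a simplification of the joint search in Algorithm 2.1.

```python
import numpy as np

def greedy_squared_loss_regression(Phi, y, B=10.0, n_steps=50):
    """Greedy sparse regression: fit y with a weight vector of l1-norm at most B.

    Phi : (n, m) matrix of m basis functions evaluated at n training inputs
    y   : (n,) vector of training targets
    B   : assumed scale; the basic vectors are the columns of +/- B * Phi, so their
          convex hull corresponds to weight vectors with l1-norm bounded by B
    """
    n, m = Phi.shape
    w = np.zeros(m)                        # weight vector over the basis functions
    f = Phi @ w                            # current predictions (all zeros)
    for _ in range(n_steps):
        residual = f - y                   # gradient of 0.5 * ||f - y||^2 with respect to f
        corr = Phi.T @ residual
        j = int(np.argmax(np.abs(corr)))   # basis function most correlated with the residual
        s = -B * np.sign(corr[j])          # signed, scaled atom: +/- B * phi_j
        g = s * Phi[:, j]
        d = g - f                          # exact line search on the quadratic objective
        alpha = float(np.clip(-(residual @ d) / (d @ d + 1e-12), 0.0, 1.0))
        f = (1.0 - alpha) * f + alpha * g
        w *= (1.0 - alpha)
        w[j] += alpha * s
    return w

# Illustrative use on synthetic data with a sparse ground truth.
rng = np.random.default_rng(0)
Phi = rng.normal(size=(200, 100))
w_true = np.zeros(100)
w_true[:3] = [2.0, -1.5, 1.0]
y = Phi @ w_true + 0.1 * rng.normal(size=200)
w_hat = greedy_squared_loss_regression(Phi, y)
print(np.count_nonzero(np.abs(w_hat) > 1e-3), 0.5 * np.mean((Phi @ w_hat - y) ** 2))
```

With the exact line search used here, a decrease of the squared loss at the kind of O(1/k) rate discussed above is the behavior one would expect; the variable names and the synthetic data are only for illustration.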
58 The sparse regression algorithm studied in this section can also be applied to kernel methods. [sent-108, score-0.549]
59 In this case, the basis predictors are kernel functions centered at the training data points, [sent-109, score-0.058]
60 and this clearly corresponds to a special case of predictors of the form (2). [sent-110, score-0.242]
61 A sparse kernel representation can be obtained easily from Algorithm 2.1. [sent-111, score-0.272]
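As an assumed illustration of this special case, the hypothetical greedy_squared_loss_regression sketch above can be reused directly by taking the design matrix to be a kernel (Gram) matrix, so that each basis function is a kernel function centered at a training input and the learned weights form a sparse kernel expansion:

```python
import numpy as np

# Hypothetical kernel variant of the regression sketch above: the design matrix is a
# Gram matrix, so each "basis function" is an RBF kernel centered at one training input
# and the learned weight vector directly gives a sparse kernel expansion.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 1))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.normal(size=200)
K = np.exp(-0.5 * (X - X.T) ** 2)              # (n, n) Gram matrix; unit bandwidth assumed
w_hat = greedy_squared_loss_regression(K, y, B=5.0, n_steps=40)
print(np.count_nonzero(np.abs(w_hat) > 1e-3))  # number of kernel centers actually used
```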
62 Our sparse kernel regression formulation is related to Gaussian processes, where greedy style algorithms have also been proposed [9, 10]. [sent-113, score-0.963]
63 The bound given here is comparable to the bound given in [10], where a sparse approximation rate of a similar form was obtained. [sent-114, score-0.573]
64 In fact, even in many other popular methods, such as logistic regression and support vector machines, some kind of convex formulation has to be employed. [sent-118, score-0.618]
65 Although it is possible for us to analyze their formulations, in this section we only consider the following form of loss, which is closely related to Adaboost [2]: the exponential loss given in (3) [equation omitted]. [sent-119, score-0.295]
66 Again, we consider a set of basis predictors, which are often called weak learners in the boosting literature. [sent-121, score-0.482]
67 We would like to find a strong learner as a convex combination of weak learners to approximately minimize the above loss [equation omitted]. [sent-122, score-0.47]
68 Equations (4) and (5) [omitted] give the resulting objective; this can be written as formulation (1) with a suitable choice of basic vectors, and using simple algebra it is easy to verify the condition needed to apply the approximation bound. [sent-127, score-0.115]
69 [displayed equations omitted] [sent-129, score-0.115]
70 Theorem 3.1 then implies that the sparse solution, represented as a weight vector over the weak learners, satisfies the following inequality (6) [equation omitted]. [sent-135, score-0.382]
71 Now we consider the special situation in which there exists a combination of weak learners satisfying the margin condition (7) for all data points [equation omitted]. [sent-137, score-0.084]
72 This condition will be satisfied in the large-margin linearly separable case, where there exist a weight vector and a positive margin such that the condition holds for all data points. Now, under (7), we obtain from (6) that [equation omitted] [sent-138, score-0.258]
73 Fixing any target accuracy, we can choose the number of greedy steps appropriately to obtain inequality (8) [equation omitted]. [sent-139, score-0.065]
74 This implies that the misclassification error rate decays exponentially. [sent-140, score-0.171]
75 The exponential decay of misclassification error is the original motivation of Adaboost [2]. [sent-141, score-0.174]
76 Boosting was later viewed as greedy approximation in the additive model framework [3]. [sent-142, score-0.715]
77 From the learning theory perspective, the good generalization ability of boosting is related to its tendency to improve the misclassification error under a positive margin [8]. [sent-143, score-0.41]
78 From this point of view, inequality (8) gives a much more explicit margin error bound (which decreases exponentially) than a related result in [8]. [sent-144, score-0.431]
79 In the framework of additive models, Adaboost corresponds to the exponential loss (3) analyzed in this section. [sent-145, score-0.513]
80 As pointed out in [3], other loss functions can also be used. [sent-146, score-0.186]
81 Using our analysis, we may also obtain sparse approximation bounds for these different loss functions. [sent-147, score-0.685]
82 However, it is also easy to observe that they will not lead to the exponential decay of classification error in the separable case. [sent-148, score-0.298]
83 Although the exponential loss in (3) is attractive for separable problems due to the exponential decay of margin error, it is very sensitive to outliers in the non-separable case. [sent-149, score-0.576]
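A hedged Python sketch of this exponential-loss case: at each round the greedy step picks a weak learner and a step size that reduce the average exponential loss of a convex combination of weak-learner outputs. The decision-stump dictionary, the grid line search, and the {-1, +1} label convention are illustrative assumptions; the sketch follows the additive-model view of boosting discussed above rather than AdaBoost's exact reweighting updates.

```python
import numpy as np

def greedy_exp_loss_boosting(weak_outputs, y, n_steps=30, step_grid=np.linspace(0.0, 1.0, 51)):
    """Greedily minimize the exponential loss over the convex hull of weak-learner outputs.

    weak_outputs : (m, n) array; row j holds the {-1, +1} predictions of weak learner j
    y            : (n,) array of {-1, +1} labels
    Returns the convex-combination weights and the resulting training error.
    """
    m, n = weak_outputs.shape
    weights = np.zeros(m)
    f = np.zeros(n)                                   # current combined prediction
    exp_loss = lambda h: np.mean(np.exp(-y * h))
    for _ in range(n_steps):
        best = None
        for j in range(m):                            # search over weak learners
            g = weak_outputs[j]
            for alpha in step_grid:                   # grid line search over the step size
                val = exp_loss((1.0 - alpha) * f + alpha * g)
                if best is None or val < best[0]:
                    best = (val, j, alpha)
        _, j, alpha = best
        f = (1.0 - alpha) * f + alpha * weak_outputs[j]
        weights *= (1.0 - alpha)
        weights[j] += alpha
    return weights, float(np.mean(np.sign(f) != y))

# Illustrative use: threshold "stumps" on one-dimensional data.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=300)
y = np.where(x + 0.1 * rng.normal(size=300) > 0.0, 1, -1)
stumps = np.array([np.where(x > t, 1, -1) for t in np.linspace(-1.0, 1.0, 41)])
w, train_err = greedy_exp_loss_boosting(stumps, y)
print(train_err, np.count_nonzero(w))
```

Since the exponential loss upper-bounds the misclassification indicator, driving this loss toward zero in the separable case also drives the training misclassification error toward zero, which is the behavior described around inequality (8).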
84 We shall mention that an interesting aspect of boosting is the concept of adaptive resampling or sample reweighting. [sent-150, score-0.353]
85 Although this idea has dominated the interpretation of boosting algorithms, it has been argued in [3] that adaptive resampling is only a computational by-product. [sent-151, score-0.347]
86 The idea corresponds to a Newton-step approximation in the sparse greedy solution computed by Algorithm 2.1, [sent-152, score-0.888]
87 under the additive model framework which we consider here. [sent-153, score-0.203]
88 Our analysis further confirms that the greedy sparse solution of an additive model in (1), rather than reweighting itself, is the key component of boosting. [sent-154, score-0.825]
89 In our framework, it is also much easier to relate the idea of boosting to the greedy function approximation method outlined in [1, 5]. [sent-155, score-0.854]
90 4.3 Mixture density estimation In mixture density estimation, the output of the model is a probability density function over the input vector. [sent-157, score-0.415]
91 The following negative log-likelihood is commonly used as the loss function [equation omitted], where the model is a probability density function. [sent-158, score-0.316]
92 Again, we consider a set of basis predictors, which are often called mixture components. [sent-159, score-0.518]
93 We would like to find a mixture probability density model as a convex combination of mixture components that approximately minimizes the negative log-likelihood [equation omitted]. [sent-160, score-0.364]
94 The quantities needed by Algorithm 2.1 can be computed explicitly for this loss [equations omitted]. An approximation bound can now be directly obtained from Theorem 3.1. [sent-165, score-0.216]
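A hedged Python sketch of the mixture-density case, assuming a fixed dictionary of candidate component densities (here, unit-variance Gaussians with preset means) and the empirical negative log-likelihood as the objective; because the greedy step only mixes the current iterate with a candidate component, the mixing weights stay in the probability simplex and the iterate remains a valid density.

```python
import numpy as np

def greedy_mixture_density(component_pdfs, n_steps=50, step_grid=np.linspace(0.0, 1.0, 51)):
    """Greedily fit a mixture density by minimizing the empirical negative log-likelihood.

    component_pdfs : (m, n) array; row j holds the density of candidate component j
                     evaluated at the n data points
    Returns the mixing weights (a point in the probability simplex) and the final loss.
    """
    m, _ = component_pdfs.shape
    weights = np.zeros(m)
    weights[0] = 1.0
    p = component_pdfs[0].copy()              # current mixture density at the data points
    neg_ll = lambda q: -np.mean(np.log(q + 1e-300))
    for _ in range(n_steps):
        best = None
        for j in range(m):                    # search over candidate components
            for alpha in step_grid:           # and over step sizes
                val = neg_ll((1.0 - alpha) * p + alpha * component_pdfs[j])
                if best is None or val < best[0]:
                    best = (val, j, alpha)
        _, j, alpha = best
        p = (1.0 - alpha) * p + alpha * component_pdfs[j]
        weights *= (1.0 - alpha)
        weights[j] += alpha
    return weights, neg_ll(p)

# Illustrative use: data from two Gaussians, dictionary of unit-variance Gaussians.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 100)])
means = np.linspace(-5.0, 5.0, 21)
pdfs = np.array([np.exp(-0.5 * (data - mu) ** 2) / np.sqrt(2.0 * np.pi) for mu in means])
w, nll = greedy_mixture_density(pdfs)
print(nll, means[w > 1e-3])
```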
95 5 Conclusion This paper studies a formalization of a general class of prediction problems in machine learning, where the goal is to approximate the best model as a convex combination of a family of basic models. [sent-168, score-0.56]
96 The quality of the approximation can be measured by a loss function which we want to minimize. [sent-169, score-0.373]
97 We proposed a greedy algorithm to solve the problem, and we have shown that for a variety of loss functions, a convergence rate of O(1/k) can be achieved using a convex combination of k basic models. [sent-170, score-1.184]
98 We have illustrated the consequences of this general algorithm in regression, classification and density estimation, and related the resulting algorithms to previous methods. [sent-171, score-0.284]
99 Universal approximation bounds for superpositions of a sigmoidal function. [sent-175, score-0.274]
100 A simple lemma on greedy approximation in Hilbert space and convergence rates for projection pursuit regression and neural network training. [sent-201, score-0.914]
wordName wordTfidf (topN-words)
[('greedy', 0.372), ('convex', 0.267), ('boosting', 0.23), ('sparse', 0.219), ('regression', 0.191), ('loss', 0.186), ('inequality', 0.175), ('lemma', 0.157), ('additive', 0.143), ('predictors', 0.142), ('approximation', 0.14), ('functional', 0.125), ('basic', 0.119), ('hg', 0.115), ('cation', 0.1), ('library', 0.097), ('spc', 0.097), ('composite', 0.096), ('adaboost', 0.091), ('misclassi', 0.091), ('density', 0.089), ('algorithm', 0.086), ('classi', 0.085), ('wee', 0.084), ('proposition', 0.079), ('formulations', 0.077), ('margin', 0.076), ('bound', 0.076), ('bounds', 0.075), ('regarded', 0.075), ('separable', 0.075), ('hull', 0.071), ('learners', 0.071), ('mixture', 0.07), ('resampling', 0.067), ('sun', 0.067), ('implies', 0.067), ('formulation', 0.066), ('exponential', 0.066), ('decay', 0.066), ('obtain', 0.065), ('tong', 0.064), ('rate', 0.062), ('related', 0.062), ('pc', 0.061), ('nd', 0.061), ('framework', 0.06), ('satis', 0.059), ('sigmoidal', 0.059), ('corresponds', 0.058), ('ar', 0.056), ('shall', 0.056), ('sparsity', 0.054), ('convergence', 0.054), ('simplicity', 0.054), ('kernel', 0.053), ('hilbert', 0.053), ('predictor', 0.053), ('sq', 0.053), ('hastie', 0.051), ('idea', 0.05), ('limitation', 0.05), ('solution', 0.049), ('easy', 0.049), ('freund', 0.048), ('interested', 0.048), ('family', 0.048), ('approximately', 0.048), ('weight', 0.047), ('closely', 0.047), ('general', 0.047), ('quality', 0.047), ('minimize', 0.046), ('robert', 0.046), ('quantity', 0.046), ('theorem', 0.045), ('peter', 0.044), ('logistic', 0.044), ('includes', 0.044), ('lee', 0.043), ('exists', 0.042), ('special', 0.042), ('error', 0.042), ('newton', 0.042), ('reweighting', 0.042), ('subgradient', 0.042), ('tzhang', 0.042), ('yorktown', 0.042), ('problems', 0.041), ('negative', 0.041), ('bounded', 0.04), ('often', 0.039), ('simplest', 0.039), ('estimation', 0.039), ('vector', 0.039), ('combination', 0.038), ('eighteenth', 0.038), ('occam', 0.038), ('philosophy', 0.038), ('yoav', 0.038)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 8 nips-2001-A General Greedy Approximation Algorithm with Applications
Author: T. Zhang
Abstract: Greedy approximation algorithms have been frequently used to obtain sparse solutions to learning problems. In this paper, we present a general greedy algorithm for solving a class of convex optimization problems. We derive a bound on the rate of approximation for this algorithm, and show that our algorithm includes a number of earlier studies as special cases.
2 0.37801287 81 nips-2001-Generalization Performance of Some Learning Problems in Hilbert Functional Spaces
Author: T. Zhang
Abstract: We investigate the generalization performance of some learning problems in Hilbert functional Spaces. We introduce a notion of convergence of the estimated functional predictor to the best underlying predictor, and obtain an estimate on the rate of the convergence. This estimate allows us to derive generalization bounds on some learning formulations.
3 0.26016915 137 nips-2001-On the Convergence of Leveraging
Author: Gunnar Rätsch, Sebastian Mika, Manfred K. Warmuth
Abstract: We give an unified convergence analysis of ensemble learning methods including e.g. AdaBoost, Logistic Regression and the Least-SquareBoost algorithm for regression. These methods have in common that they iteratively call a base learning algorithm which returns hypotheses that are then linearly combined. We show that these methods are related to the Gauss-Southwell method known from numerical optimization and state non-asymptotical convergence results for all these methods. Our analysis includes -norm regularized cost functions leading to a clean and general way to regularize ensemble learning.
4 0.2216938 139 nips-2001-Online Learning with Kernels
Author: Jyrki Kivinen, Alex J. Smola, Robert C. Williamson
Abstract: We consider online learning in a Reproducing Kernel Hilbert Space. Our method is computationally efficient and leads to simple algorithms. In particular we derive update equations for classification, regression, and novelty detection. The inclusion of the ν-trick allows us to give a robust parameterization. Moreover, unlike in batch learning, where the ν-trick only applies to the ε-insensitive loss function, we are able to derive general trimmed-mean types of estimators such as for Huber’s robust loss.
5 0.2044199 45 nips-2001-Boosting and Maximum Likelihood for Exponential Models
Author: Guy Lebanon, John D. Lafferty
Abstract: We derive an equivalence between AdaBoost and the dual of a convex optimization problem, showing that the only difference between minimizing the exponential loss used by AdaBoost and maximum likelihood for exponential models is that the latter requires the model to be normalized to form a conditional probability distribution over labels. In addition to establishing a simple and easily understood connection between the two methods, this framework enables us to derive new regularization procedures for boosting that directly correspond to penalized maximum likelihood. Experiments on UCI datasets support our theoretical analysis and give additional insight into the relationship between boosting and logistic regression.
6 0.17969246 62 nips-2001-Duality, Geometry, and Support Vector Regression
7 0.17081137 138 nips-2001-On the Generalization Ability of On-Line Learning Algorithms
8 0.16259389 77 nips-2001-Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade
9 0.15337233 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family
10 0.15264194 167 nips-2001-Semi-supervised MarginBoost
11 0.15242842 143 nips-2001-PAC Generalization Bounds for Co-training
12 0.1490345 129 nips-2001-Multiplicative Updates for Classification by Mixture Models
13 0.12694982 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior
14 0.12661752 105 nips-2001-Kernel Machines and Boolean Functions
15 0.12660585 22 nips-2001-A kernel method for multi-labelled classification
16 0.11589137 38 nips-2001-Asymptotic Universality for Learning Curves of Support Vector Machines
17 0.11514585 69 nips-2001-Escaping the Convex Hull with Extrapolated Vector Machines
18 0.10810811 13 nips-2001-A Natural Policy Gradient
19 0.10635613 1 nips-2001-(Not) Bounding the True Error
20 0.10567056 21 nips-2001-A Variational Approach to Learning Curves
topicId topicWeight
[(0, -0.343), (1, 0.179), (2, 0.062), (3, 0.324), (4, 0.158), (5, -0.144), (6, 0.071), (7, 0.031), (8, 0.03), (9, -0.16), (10, 0.003), (11, 0.182), (12, -0.004), (13, 0.003), (14, -0.054), (15, -0.003), (16, -0.045), (17, -0.048), (18, -0.085), (19, 0.152), (20, 0.144), (21, -0.01), (22, -0.025), (23, 0.031), (24, 0.03), (25, 0.057), (26, -0.013), (27, 0.053), (28, -0.023), (29, 0.015), (30, 0.073), (31, -0.041), (32, 0.007), (33, 0.069), (34, 0.048), (35, -0.013), (36, 0.071), (37, -0.066), (38, 0.009), (39, 0.021), (40, -0.074), (41, -0.077), (42, -0.069), (43, -0.102), (44, -0.034), (45, -0.027), (46, -0.127), (47, 0.084), (48, -0.107), (49, 0.046)]
simIndex simValue paperId paperTitle
same-paper 1 0.97451699 8 nips-2001-A General Greedy Approximation Algorithm with Applications
Author: T. Zhang
Abstract: Greedy approximation algorithms have been frequently used to obtain sparse solutions to learning problems. In this paper, we present a general greedy algorithm for solving a class of convex optimization problems. We derive a bound on the rate of approximation for this algorithm, and show that our algorithm includes a number of earlier studies as special cases.
2 0.8415888 81 nips-2001-Generalization Performance of Some Learning Problems in Hilbert Functional Spaces
Author: T. Zhang
Abstract: We investigate the generalization performance of some learning problems in Hilbert functional Spaces. We introduce a notion of convergence of the estimated functional predictor to the best underlying predictor, and obtain an estimate on the rate of the convergence. This estimate allows us to derive generalization bounds on some learning formulations.
3 0.76725054 137 nips-2001-On the Convergence of Leveraging
Author: Gunnar Rätsch, Sebastian Mika, Manfred K. Warmuth
Abstract: We give an unified convergence analysis of ensemble learning methods including e.g. AdaBoost, Logistic Regression and the Least-SquareBoost algorithm for regression. These methods have in common that they iteratively call a base learning algorithm which returns hypotheses that are then linearly combined. We show that these methods are related to the Gauss-Southwell method known from numerical optimization and state non-asymptotical convergence results for all these methods. Our analysis includes -norm regularized cost functions leading to a clean and general way to regularize ensemble learning.
4 0.69993228 45 nips-2001-Boosting and Maximum Likelihood for Exponential Models
Author: Guy Lebanon, John D. Lafferty
Abstract: We derive an equivalence between AdaBoost and the dual of a convex optimization problem, showing that the only difference between minimizing the exponential loss used by AdaBoost and maximum likelihood for exponential models is that the latter requires the model to be normalized to form a conditional probability distribution over labels. In addition to establishing a simple and easily understood connection between the two methods, this framework enables us to derive new regularization procedures for boosting that directly correspond to penalized maximum likelihood. Experiments on UCI datasets support our theoretical analysis and give additional insight into the relationship between boosting and logistic regression.
5 0.66463727 139 nips-2001-Online Learning with Kernels
Author: Jyrki Kivinen, Alex J. Smola, Robert C. Williamson
Abstract: We consider online learning in a Reproducing Kernel Hilbert Space. Our method is computationally efficient and leads to simple algorithms. In particular we derive update equations for classification, regression, and novelty detection. The inclusion of the ν-trick allows us to give a robust parameterization. Moreover, unlike in batch learning, where the ν-trick only applies to the ε-insensitive loss function, we are able to derive general trimmed-mean types of estimators such as for Huber’s robust loss.
6 0.66165513 62 nips-2001-Duality, Geometry, and Support Vector Regression
7 0.60097903 138 nips-2001-On the Generalization Ability of On-Line Learning Algorithms
8 0.59657699 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family
9 0.56811816 143 nips-2001-PAC Generalization Bounds for Co-training
10 0.53814387 129 nips-2001-Multiplicative Updates for Classification by Mixture Models
11 0.51401484 31 nips-2001-Algorithmic Luckiness
12 0.51385355 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior
13 0.50258189 133 nips-2001-On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes
14 0.47957739 104 nips-2001-Kernel Logistic Regression and the Import Vector Machine
15 0.45392269 69 nips-2001-Escaping the Convex Hull with Extrapolated Vector Machines
16 0.44558349 180 nips-2001-The Concave-Convex Procedure (CCCP)
17 0.43786469 159 nips-2001-Reducing multiclass to binary by coupling probability estimates
18 0.42817891 22 nips-2001-A kernel method for multi-labelled classification
19 0.41610906 114 nips-2001-Learning from Infinite Data in Finite Time
20 0.40990236 167 nips-2001-Semi-supervised MarginBoost
topicId topicWeight
[(14, 0.085), (17, 0.018), (19, 0.038), (27, 0.17), (30, 0.077), (36, 0.095), (38, 0.022), (59, 0.026), (72, 0.109), (79, 0.052), (83, 0.023), (91, 0.116), (92, 0.111)]
simIndex simValue paperId paperTitle
same-paper 1 0.91700149 8 nips-2001-A General Greedy Approximation Algorithm with Applications
Author: T. Zhang
Abstract: Greedy approximation algorithms have been frequently used to obtain sparse solutions to learning problems. In this paper, we present a general greedy algorithm for solving a class of convex optimization problems. We derive a bound on the rate of approximation for this algorithm, and show that our algorithm includes a number of earlier studies as special cases.
2 0.9149729 81 nips-2001-Generalization Performance of Some Learning Problems in Hilbert Functional Spaces
Author: T. Zhang
Abstract: We investigate the generalization performance of some learning problems in Hilbert functional Spaces. We introduce a notion of convergence of the estimated functional predictor to the best underlying predictor, and obtain an estimate on the rate of the convergence. This estimate allows us to derive generalization bounds on some learning formulations.
3 0.89468628 76 nips-2001-Fast Parameter Estimation Using Green's Functions
Author: K. Wong, F. Li
Abstract: We propose a method for the fast estimation of hyperparameters in large networks, based on the linear response relation in the cavity method, and an empirical measurement of the Green's function. Simulation results show that it is efficient and precise, when compared with cross-validation and other techniques which require matrix inversion.
4 0.87842625 193 nips-2001-Unsupervised Learning of Human Motion Models
Author: Yang Song, Luis Goncalves, Pietro Perona
Abstract: This paper presents an unsupervised learning algorithm that can derive the probabilistic dependence structure of parts of an object (a moving human body in our examples) automatically from unlabeled data. The distinguished part of this work is that it is based on unlabeled data, i.e., the training features include both useful foreground parts and background clutter and the correspondence between the parts and detected features are unknown. We use decomposable triangulated graphs to depict the probabilistic independence of parts, but the unsupervised technique is not limited to this type of graph. In the new approach, labeling of the data (part assignments) is taken as hidden variables and the EM algorithm is applied. A greedy algorithm is developed to select parts and to search for the optimal structure based on the differential entropy of these variables. The success of our algorithm is demonstrated by applying it to generate models of human motion automatically from unlabeled real image sequences.
5 0.85292697 92 nips-2001-Incorporating Invariances in Non-Linear Support Vector Machines
Author: Olivier Chapelle, Bernhard Schölkopf
Abstract: The choice of an SVM kernel corresponds to the choice of a representation of the data in a feature space and, to improve performance, it should therefore incorporate prior knowledge such as known transformation invariances. We propose a technique which extends earlier work and aims at incorporating invariances in nonlinear kernels. We show on a digit recognition task that the proposed approach is superior to the Virtual Support Vector method, which previously had been the method of choice.
6 0.85208589 13 nips-2001-A Natural Policy Gradient
7 0.84851378 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior
8 0.84671873 137 nips-2001-On the Convergence of Leveraging
9 0.84608042 138 nips-2001-On the Generalization Ability of On-Line Learning Algorithms
10 0.84032381 197 nips-2001-Why Neuronal Dynamics Should Control Synaptic Learning Rules
11 0.83956534 38 nips-2001-Asymptotic Universality for Learning Curves of Support Vector Machines
12 0.83560896 58 nips-2001-Covariance Kernels from Bayesian Generative Models
13 0.83507895 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family
14 0.83506942 28 nips-2001-Adaptive Nearest Neighbor Classification Using Support Vector Machines
15 0.83476663 143 nips-2001-PAC Generalization Bounds for Co-training
16 0.83381021 69 nips-2001-Escaping the Convex Hull with Extrapolated Vector Machines
17 0.82975078 88 nips-2001-Grouping and dimensionality reduction by locally linear embedding
18 0.82965595 60 nips-2001-Discriminative Direction for Kernel Classifiers
19 0.82882106 77 nips-2001-Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade