nips nips2001 nips2001-81 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: T. Zhang
Abstract: We investigate the generalization performance of some learning problems in Hilbert functional Spaces. We introduce a notion of convergence of the estimated functional predictor to the best underlying predictor, and obtain an estimate on the rate of the convergence. This estimate allows us to derive generalization bounds on some learning formulations.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We investigate the generalization performance of some learning problems in Hilbert functional Spaces. [sent-5, score-0.257]
2 We introduce a notion of convergence of the estimated functional predictor to the best underlying predictor, and obtain an estimate on the rate of the convergence. [sent-6, score-0.806]
3 This estimate allows us to derive generalization bounds on some learning formulations. [sent-7, score-0.21]
4 1 Introduction In machine learning, our goal is often to predict an unobserved output value based on an observed input vector. [sent-8, score-0.105]
5 This requires us to estimate a functional relationship from a set of example input-output pairs. [sent-9, score-0.129]
6 Usually the quality of the predictor can be measured by a loss function that is problem dependent. [sent-10, score-0.367]
7 In machine learning, we assume that the data are drawn from an underlying distribution which is not known, and the goal is to make the expected true loss of the predictor, given below, as small as possible. [sent-11, score-0.438]
8 In order to estimate a good predictor from a set of training data randomly drawn from the underlying distribution, it is necessary to start with a model of the functional relationship. [sent-13, score-0.535]
9 In this paper, we consider models that are subsets of some Hilbert functional space. [sent-14, score-0.17]
10 Using the norm of the Hilbert space, we consider models in the set of functions whose norm is at most a given radius, where the radius is a parameter that can be used to control the size of the underlying model family. [sent-15, score-0.051]
11 We would like to find the best model in this set, which is given by (1); by introducing a non-negative Lagrangian multiplier, we may rewrite the problem as (2), and we shall only consider this equivalent formulation in this paper. [sent-16, score-0.204]
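The displays for (1) and (2) did not survive the extraction. A plausible reconstruction, under assumed notation that is not itself recoverable from the dump ($H$ for the Hilbert space, $\|\cdot\|_H$ for its norm, $L$ for the loss, $D$ for the joint distribution of an input-output pair $(X, Y)$, $A$ for the radius of the model family, and $\lambda$ for the Lagrangian multiplier), is:

$$ f_A \;=\; \mathop{\arg\min}_{f \in H,\; \|f\|_H \le A} \; \mathbb{E}_{(X,Y)\sim D}\, L\big(f(X),\,Y\big) \qquad (1) $$

$$ f_* \;=\; \mathop{\arg\min}_{f \in H} \; \Big\{ \mathbb{E}_{(X,Y)\sim D}\, L\big(f(X),\,Y\big) \;+\; \lambda\,\|f\|_H^2 \Big\} \qquad (2) $$

Whether the penalty enters as $\lambda\|f\|_H^2$ or as $\lambda\|f\|_H$ cannot be determined from the dump; the squared-norm form is used throughout these reconstructions because it is the one most consistent with the kernel-method and squared-loss comparisons made later in the paper.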
12 In addition, for technical reasons, we also assume that the loss is a convex function of the predicted value. [sent-17, score-0.151]
13 Given training examples, we consider the following estimation method, given in (3), to approximate the optimal predictor. [sent-19, score-0.154]
14 The goal of this paper is to show that, as the number of training examples grows, the estimated predictor converges to the optimal predictor in probability under appropriate regularity conditions. [sent-20, score-0.507]
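Display (3) was likewise lost. Under the same assumed notation, with training examples $(X_1, Y_1), \ldots, (X_n, Y_n)$ drawn independently from $D$, a plausible form of the estimator is

$$ \hat f \;=\; \mathop{\arg\min}_{f \in H} \; \Big\{ \frac{1}{n}\sum_{i=1}^{n} L\big(f(X_i),\,Y_i\big) \;+\; \lambda\,\|f\|_H^2 \Big\}, \qquad (3) $$

and the convergence claim in the preceding sentence reads $\|\hat f - f_*\|_H \to 0$ in probability as $n \to \infty$.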
15 Furthermore, we obtain an estimate on the rate of convergence. [sent-21, score-0.11]
16 Consequences of this result in some specific learning formulations are examined. [sent-22, score-0.136]
17 2 Convergence of the estimated predictor. Assume that the input belongs to a given set. [sent-23, score-0.415]
18 We make the reasonable assumption that each function in the Hilbert space is pointwise continuous under the norm topology; that is, convergence in the Hilbert-space norm implies convergence of the function values at every input point. [sent-24, score-0.087]
19 The condition implies that each data point can be regarded as a bounded linear functional on the Hilbert space, namely the functional that evaluates a function at that point. [sent-26, score-0.411]
20 For notational simplicity, we shall identify each input with its representer in the Hilbert space, defined so that evaluating any function at the input equals the inner product of that function with the representer. [sent-28, score-0.107]
22 It is clear that the representer can be regarded as a representing feature vector of the input in the Hilbert space. [sent-33, score-0.119]
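The role of the pointwise-continuity assumption is that evaluation at any input $x$ becomes a bounded linear functional on $H$, so by the Riesz representation theorem there is an element of $H$ that represents it. Writing $\phi(x)$ for this representer (assumed notation), the identity the text appeals to is

$$ f(x) \;=\; \big\langle f,\, \phi(x) \big\rangle_H \quad \text{for all } f \in H, \qquad \|\phi(x)\|_H \;=\; \sup_{\|f\|_H \le 1} |f(x)|, $$

which is what allows $\phi(x)$ to be treated as a feature vector for $x$ in the remainder of the analysis.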
23 Since the evaluation of the predictor can now be considered as a linear functional using the feature-space representation of the input, we can use the idea from [6] to analyze the convergence behavior of the estimated predictor in the Hilbert space. [sent-38, score-0.23]
24 Following [6], using the linear representation, we differentiate (2) at the optimal solution, which leads to the first-order condition (4), where the derivative of the loss is taken with respect to the predicted value if the loss is smooth, and denotes a subgradient (see [4]) otherwise. [sent-39, score-0.303]
25 Since we have assumed that the loss is a convex function of the predicted value, its derivative (or subgradient) is monotone in that argument. [sent-40, score-0.104]
26 This implies the following inequality, stated in (5). [sent-41, score-0.163]
27 In (5), we have already bounded the convergence of the estimated predictor to the optimal predictor in terms of the convergence of the empirical expectation of a random vector to its mean. [sent-44, score-0.264]
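The first-order condition (4) and the bound (5) were garbled in extraction. A plausible reconstruction in the style of [6], writing $L_1'(p, y)$ for the derivative (or a subgradient) of the loss with respect to its first argument, is as follows. Setting the derivative of (2) to zero at $f_*$, and of (3) to zero at $\hat f$, gives

$$ \mathbb{E}_{(X,Y)\sim D}\, L_1'\big(f_*(X),\,Y\big)\,\phi(X) \;+\; 2\lambda f_* \;=\; 0, \qquad \frac{1}{n}\sum_{i=1}^{n} L_1'\big(\hat f(X_i),\,Y_i\big)\,\phi(X_i) \;+\; 2\lambda \hat f \;=\; 0. \qquad (4) $$

Subtracting the two conditions, taking the inner product with $\hat f - f_*$, and using the monotonicity of $L_1'$ implied by convexity yields a bound of the form

$$ \big\| \hat f - f_* \big\|_H \;\le\; \frac{1}{2\lambda}\, \Big\| \frac{1}{n}\sum_{i=1}^{n} \xi_i \;-\; \mathbb{E}\,\xi \Big\|_H, \qquad \xi_i \;=\; L_1'\big(f_*(X_i),\,Y_i\big)\,\phi(X_i). \qquad (5) $$

The exact constants cannot be confirmed from the dump; what matters is that the estimation error is controlled by the deviation of an empirical mean of Hilbert-space-valued random vectors from its expectation.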
28 For example, if its variance can be bounded, then we may use the Chebyshev inequality to obtain a probability bound. [sent-46, score-0.477]
29 In this paper, we are interested in obtaining an exponential probability bound. [sent-47, score-0.062]
30 In order to do so, similar to the analysis in [6], we use the following form of concentration inequality, which can be found in [5], page 95. [sent-48, score-0.436]
31 Theorem 2.1 ([5]): Let the summands be zero-mean independent random vectors in a Hilbert space; if there exists a constant such that a moment condition holds for all natural numbers, then an exponential tail bound holds for the norm of their average, for every deviation level. [sent-52, score-0.049]
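The statement of the quoted theorem lost all of its symbols. Inequalities of this kind are Bernstein/Yurinskii-type bounds for sums of independent Hilbert-space-valued random vectors; one standard form, offered here as a stand-in rather than a quotation of the exact statement in [5], is the following. If $\xi_1, \ldots, \xi_n$ are independent zero-mean random vectors in a Hilbert space and there exist constants $b, M > 0$ such that

$$ \frac{1}{n}\sum_{i=1}^{n} \mathbb{E}\,\|\xi_i\|^{\ell} \;\le\; \frac{\ell!}{2}\, b^2\, M^{\ell - 2} \qquad \text{for all integers } \ell \ge 2, $$

then for all $\delta > 0$,

$$ P\left( \Big\| \frac{1}{n}\sum_{i=1}^{n} \xi_i \Big\| \ge \delta \right) \;\le\; 2 \exp\!\left( - \frac{n\,\delta^2}{2\,(b^2 + M\delta)} \right). $$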
32 We may now use the following form of Jensen's inequality to bound the moments of the zero-mean random vector. [sent-55, score-0.866]
33 From inequality (5) and Theorem 2.1, we obtain Theorem 2.2: if there exists a constant such that the moment condition holds for all natural numbers, then the corresponding convergence bound follows. [sent-57, score-0.049]
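The form of Jensen's inequality used here is not recoverable. A standard way to bound the moments of the centered vector $\xi - \mathbb{E}\xi$ by moments of $\xi$ itself, which is all that is needed to feed inequality (5) into Theorem 2.1, is

$$ \mathbb{E}\,\big\| \xi - \mathbb{E}\xi \big\|^{\ell} \;\le\; 2^{\ell - 1}\Big( \mathbb{E}\|\xi\|^{\ell} + \|\mathbb{E}\xi\|^{\ell} \Big) \;\le\; 2^{\ell}\, \mathbb{E}\|\xi\|^{\ell}, $$

using the triangle inequality, convexity of $t \mapsto t^{\ell}$, and $\|\mathbb{E}\xi\| \le \mathbb{E}\|\xi\|$ (Jensen).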
34 Although Theorem 2.2 is quite general, the quantities on the right-hand side of the bound depend on the optimal predictor, which itself needs to be estimated. [sent-59, score-0.67]
35 In order to obtain a bound that does not require any knowledge of the true distribution, we may impose the following assumptions: both the derivative of the loss and the norm of the feature vector are bounded. [sent-60, score-0.405]
36 With this observation, we obtain the following result. [sent-61, score-0.152]
37 Also assume that the loss satisfies the stated boundedness condition; then we obtain Corollary 2.1. [sent-64, score-0.217]
38 Corollary 2.1 compares the performance of the computed function with that of the optimal predictor in (1). [sent-66, score-0.453]
39 This style of analysis has been extensively used in the literature. [sent-67, score-0.172]
40 In order to compare with their results, we can rewrite Theorem 3.1 in another form. [sent-69, score-0.163]
41 In this form, the bound holds with probability of at least 1 minus a prescribed confidence level. [sent-70, score-0.137]
42 Assume the stated conditions; then, with probability of at least 1 minus the confidence level, the bound holds, and it is clear that the right-hand side of the above inequality does not depend on the unobserved optimal function. [sent-73, score-0.609]
43 Using this inequality and (4), we obtain the stated bound. [sent-78, score-0.366]
44 We consider the following type of Huber's robust loss function, defined in (6); it is continuously differentiable, and it is not hard to check the piecewise form of its derivative. [sent-79, score-0.212]
45 3.1 Regression: We study some consequences of Corollary 2.1. [sent-80, score-0.065]
46 Corollary 2.1 bounds the convergence rate of the estimated predictor to the best predictor. [sent-81, score-0.638]
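For the regression case, a concrete instance of the estimator (3) is the squared loss with a reproducing kernel, in which case (3) reduces to kernel ridge regression. The sketch below is purely illustrative; the Gaussian kernel, the regularization value, and the closed-form solution via the representer theorem are standard choices, not details taken from the paper.

import numpy as np

def gaussian_kernel(X1, X2, bandwidth=1.0):
    # k(x, x') = exp(-||x - x'||^2 / (2 * bandwidth^2))
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def kernel_ridge_fit(X, y, lam=0.1, bandwidth=1.0):
    # Minimize (1/n) * sum_i (f(X_i) - y_i)^2 + lam * ||f||_H^2.
    # By the representer theorem f(x) = sum_i alpha_i k(X_i, x), and the
    # minimizer satisfies (K + n * lam * I) alpha = y.
    n = len(y)
    K = gaussian_kernel(X, X, bandwidth)
    return np.linalg.solve(K + n * lam * np.eye(n), y)

def kernel_ridge_predict(alpha, X_train, X_test, bandwidth=1.0):
    # f(x) = sum_i alpha_i k(X_i, x) evaluated at each test point.
    return gaussian_kernel(X_test, X_train, bandwidth) @ alpha

# Tiny synthetic usage example.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
alpha = kernel_ridge_fit(X, y, lam=0.01, bandwidth=0.5)
y_hat = kernel_ridge_predict(alpha, X, X[:5], bandwidth=0.5)

As the sample size grows with the regularization parameter held fixed, the fitted function approaches the regularized population minimizer, which is the kind of statement Corollary 2.1 quantifies.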
47 Note that the constant in their bound depends on the pseudodimension, which can be infinite for the problems considered in this paper. [sent-83, score-0.04]
48 It is possible to employ their analysis using some covering number bounds for general Hilbert spaces. [sent-84, score-0.246]
49 However, such an analysis would have led to a result of the following form for our problems. [sent-85, score-0.262]
50 It is also interesting to compare Theorem 3.1 with the result in [7], since the right-hand side includes an extra term. [sent-88, score-0.09]
51 Using the analysis in this paper, we may obtain a similar result from (7), which leads to an average bound of the stated form; it is clear that the resulting term in our paper is not as good as the one from [7]. [sent-89, score-0.564]
52 However, the analysis in this paper leads to probability bounds, while the leave-one-out analysis in [7] only gives average bounds. [sent-90, score-0.326]
53 It is also worth mentioning that it is possible to refine the analysis presented in this section to obtain a probability bound which, when averaged, gives a bound with the correct rate, rather than the slower rate in the current analysis. [sent-91, score-0.485]
54 However, due to the space limitation, we shall skip this more elaborate derivation. [sent-92, score-0.446]
55 In addition to the above style of bounds, it is also interesting to compare the generalization performance of the computed function to the empirical error of the computed function. [sent-93, score-0.275]
56 Assume the stated conditions; then, with probability of at least 1 minus the confidence level, the bound of Theorem 3.2 holds. [sent-98, score-0.137]
57 Unlike Theorem 3.1, Theorem 3.2 contains a term which relies on the unknown optimal predictor. [sent-100, score-0.446]
58 From the preceding analysis, we know that this term does not affect the performance of the estimated function when compared with the performance of the optimal predictor. [sent-102, score-0.131]
59 In order to compare with the bound in [1], obtained from an algorithmic stability point of view, we make an additional uniform boundedness assumption. [sent-103, score-0.462]
60 Note that this assumption is also required in [1]. [sent-104, score-0.043]
61 Combining this with Theorem 3.2, we have, with probability of at least 1 minus the confidence level, the corresponding bound. [sent-106, score-0.137]
62 Given a continuous model, we consider the following prediction rule: predict one class if the output of the model is non-negative, and predict the other class otherwise. [sent-108, score-0.189]
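With labels taken to be $\pm 1$ (an assumption; the label encoding is not visible in the dump), the prediction rule just described is

$$ \hat y(x) \;=\; \begin{cases} +1, & f(x) \ge 0, \\ -1, & f(x) < 0, \end{cases} $$

so a classification mistake on $(x, y)$ implies $y\,f(x) \le 0$, and the classification error is therefore upper-bounded by the expectation of any convex surrogate loss that dominates the indicator $\mathbf{1}[\,y f(x) \le 0\,]$.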
63 In fact, even in many other popular methods, such as logistic regression and support vector machines, some kind of convex formulation has to be employed. [sent-110, score-0.237]
64 In this case, the derivative denotes a subgradient rather than a gradient, since the loss is non-smooth: it is set-valued at the kink point and takes the two one-sided slopes on either side of it. [sent-112, score-0.097]
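The dump does not identify the specific non-smooth loss being differentiated here; taking the SVM hinge loss $L(p, y) = \max(0,\, 1 - p\,y)$ with $y \in \{\pm 1\}$ as a representative example, its subgradient with respect to the prediction $p$ is

$$ \partial_p\, L(p, y) \;=\; \begin{cases} \{-y\}, & p\,y < 1, \\ \{0\}, & p\,y > 1, \\ \{-t\,y \,:\, t \in [0, 1]\}, & p\,y = 1, \end{cases} $$

which is bounded by $1$ in magnitude, so the uniform boundedness assumptions used in the earlier bounds apply to it.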
65 As a result, the original bound in their paper was in a form equivalent to the one we cite here, up to a substitution of one quantity for another. [sent-114, score-0.252]
66 Assume the stated conditions; then the bounds of Theorem 3.3 each hold with probability of at least 1 minus the confidence level. [sent-118, score-0.384]
67 We may obtain from Theorem 3.3 the following result, which holds with probability of at least 1 minus the confidence level; it is interesting to compare this result with margin percentile style bounds from VC analysis. [sent-119, score-0.632]
68 Clearly, this implies that our analysis has some advantages over VC analysis due to the fact that we directly analyze the numerical formulation of support vector classification. [sent-123, score-0.178]
69 4 Conclusion In this paper, we have introduced a notion of the convergence of the estimated predictor to the best underlying predictor for some learning problems in Hilbert spaces. [sent-124, score-0.974]
70 We derived generalization bounds for some regression and classification problems. [sent-126, score-0.254]
71 We have shown that results from our analysis compare favorably with a number of earlier studies. [sent-127, score-0.189]
72 This indicates that the concept introduced in this paper can lead to valuable insights into certain numerical formulations of learning problems. [sent-128, score-0.138]
73 The importance of convexity in learning with squared loss. [sent-141, score-0.096]
74 A leave-one-out cross validation bound for kernel methods with applications in learning. [sent-156, score-0.214]
wordName wordTfidf (topN-words)
[('predictor', 0.367), ('inequality', 0.305), ('theorem', 0.25), ('bound', 0.214), ('hilbert', 0.214), ('rq', 0.178), ('loss', 0.17), ('vc', 0.158), ('corollary', 0.131), ('functional', 0.129), ('dc', 0.124), ('hoeffding', 0.124), ('style', 0.122), ('bounds', 0.122), ('tb', 0.117), ('uhs', 0.112), ('tong', 0.111), ('obtain', 0.11), ('shall', 0.107), ('convex', 0.104), ('convergence', 0.101), ('classi', 0.099), ('subgradient', 0.097), ('formulations', 0.089), ('uts', 0.089), ('generalization', 0.088), ('condition', 0.084), ('implies', 0.078), ('least', 0.075), ('covering', 0.074), ('favorably', 0.074), ('consequences', 0.065), ('compare', 0.065), ('bounded', 0.062), ('probability', 0.062), ('princeton', 0.061), ('clear', 0.061), ('rewrite', 0.059), ('cation', 0.058), ('regarded', 0.058), ('separable', 0.058), ('led', 0.058), ('unobserved', 0.056), ('algorithmic', 0.055), ('margin', 0.053), ('squared', 0.052), ('underlying', 0.051), ('analysis', 0.05), ('side', 0.05), ('predict', 0.049), ('exists', 0.049), ('tzhang', 0.049), ('wee', 0.049), ('yorktown', 0.049), ('andr', 0.049), ('mentioning', 0.049), ('aaa', 0.049), ('chebyshev', 0.049), ('igf', 0.049), ('insights', 0.049), ('ous', 0.049), ('estimated', 0.048), ('result', 0.047), ('assume', 0.047), ('compares', 0.047), ('stability', 0.046), ('uh', 0.044), ('pointwise', 0.044), ('bousquet', 0.044), ('percentile', 0.044), ('convexity', 0.044), ('differentiate', 0.044), ('elaborated', 0.044), ('regression', 0.044), ('assumption', 0.043), ('know', 0.043), ('leads', 0.042), ('following', 0.042), ('together', 0.042), ('heights', 0.041), ('skip', 0.041), ('watson', 0.041), ('huber', 0.041), ('wd', 0.041), ('comparable', 0.041), ('spaces', 0.041), ('term', 0.04), ('problems', 0.04), ('optimal', 0.039), ('averaged', 0.039), ('regularity', 0.039), ('sun', 0.039), ('uf', 0.039), ('olivier', 0.039), ('vg', 0.039), ('nello', 0.039), ('volker', 0.039), ('order', 0.039), ('equivalent', 0.038), ('jensen', 0.037)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 81 nips-2001-Generalization Performance of Some Learning Problems in Hilbert Functional Spaces
Author: T. Zhang
Abstract: We investigate the generalization performance of some learning problems in Hilbert functional Spaces. We introduce a notion of convergence of the estimated functional predictor to the best underlying predictor, and obtain an estimate on the rate of the convergence. This estimate allows us to derive generalization bounds on some learning formulations.
2 0.37801287 8 nips-2001-A General Greedy Approximation Algorithm with Applications
Author: T. Zhang
Abstract: Greedy approximation algorithms have been frequently used to obtain sparse solutions to learning problems. In this paper, we present a general greedy algorithm for solving a class of convex optimization problems. We derive a bound on the rate of approximation for this algorithm, and show that our algorithm includes a number of earlier studies as special cases.
3 0.22489342 139 nips-2001-Online Learning with Kernels
Author: Jyrki Kivinen, Alex J. Smola, Robert C. Williamson
Abstract: We consider online learning in a Reproducing Kernel Hilbert Space. Our method is computationally efficient and leads to simple algorithms. In particular we derive update equations for classification, regression, and novelty detection. The inclusion of the ν-trick allows us to give a robust parameterization. Moreover, unlike in batch learning where the ν-trick only applies to the ε-insensitive loss function we are able to derive general trimmed-mean types of estimators such as for Huber’s robust loss.
4 0.1865111 1 nips-2001-(Not) Bounding the True Error
Author: John Langford, Rich Caruana
Abstract: We present a new approach to bounding the true error rate of a continuous valued classifier based upon PAC-Bayes bounds. The method first constructs a distribution over classifiers by determining how sensitive each parameter in the model is to noise. The true error rate of the stochastic classifier found with the sensitivity analysis can then be tightly bounded using a PAC-Bayes bound. In this paper we demonstrate the method on artificial neural networks with results of an order of magnitude improvement vs. the best deterministic neural net bounds.
5 0.17876296 138 nips-2001-On the Generalization Ability of On-Line Learning Algorithms
Author: Nicolò Cesa-bianchi, Alex Conconi, Claudio Gentile
Abstract: In this paper we show that on-line algorithms for classification and regression can be naturally used to obtain hypotheses with good datadependent tail bounds on their risk. Our results are proven without requiring complicated concentration-of-measure arguments and they hold for arbitrary on-line learning algorithms. Furthermore, when applied to concrete on-line algorithms, our results yield tail bounds that in many cases are comparable or better than the best known bounds.
6 0.16327584 143 nips-2001-PAC Generalization Bounds for Co-training
7 0.14562941 137 nips-2001-On the Convergence of Leveraging
8 0.14292824 119 nips-2001-Means, Correlations and Bounds
9 0.14266935 31 nips-2001-Algorithmic Luckiness
10 0.13492323 22 nips-2001-A kernel method for multi-labelled classification
11 0.12065418 62 nips-2001-Duality, Geometry, and Support Vector Regression
12 0.11695819 105 nips-2001-Kernel Machines and Boolean Functions
13 0.11221624 129 nips-2001-Multiplicative Updates for Classification by Mixture Models
14 0.11149647 164 nips-2001-Sampling Techniques for Kernel Methods
15 0.10166986 114 nips-2001-Learning from Infinite Data in Finite Time
16 0.099584378 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family
17 0.097057685 38 nips-2001-Asymptotic Universality for Learning Curves of Support Vector Machines
18 0.09544763 21 nips-2001-A Variational Approach to Learning Curves
19 0.091614142 52 nips-2001-Computing Time Lower Bounds for Recurrent Sigmoidal Neural Networks
20 0.090352513 157 nips-2001-Rates of Convergence of Performance Gradient Estimates Using Function Approximation and Bias in Reinforcement Learning
topicId topicWeight
[(0, -0.283), (1, 0.155), (2, 0.057), (3, 0.277), (4, 0.182), (5, -0.149), (6, 0.066), (7, 0.08), (8, -0.028), (9, -0.099), (10, -0.064), (11, 0.189), (12, -0.012), (13, 0.051), (14, -0.13), (15, 0.137), (16, -0.007), (17, -0.034), (18, -0.049), (19, -0.02), (20, 0.027), (21, 0.062), (22, 0.056), (23, -0.063), (24, -0.065), (25, 0.051), (26, -0.057), (27, 0.074), (28, 0.016), (29, 0.0), (30, 0.015), (31, -0.054), (32, 0.041), (33, 0.083), (34, 0.031), (35, -0.077), (36, 0.048), (37, -0.045), (38, 0.01), (39, 0.036), (40, -0.088), (41, -0.021), (42, -0.032), (43, -0.09), (44, -0.004), (45, 0.05), (46, -0.12), (47, 0.111), (48, -0.197), (49, 0.04)]
simIndex simValue paperId paperTitle
same-paper 1 0.97736448 81 nips-2001-Generalization Performance of Some Learning Problems in Hilbert Functional Spaces
Author: T. Zhang
Abstract: We investigate the generalization performance of some learning problems in Hilbert functional Spaces. We introduce a notion of convergence of the estimated functional predictor to the best underlying predictor, and obtain an estimate on the rate of the convergence. This estimate allows us to derive generalization bounds on some learning formulations.
2 0.85203302 8 nips-2001-A General Greedy Approximation Algorithm with Applications
Author: T. Zhang
Abstract: Greedy approximation algorithms have been frequently used to obtain sparse solutions to learning problems. In this paper, we present a general greedy algorithm for solving a class of convex optimization problems. We derive a bound on the rate of approximation for this algorithm, and show that our algorithm includes a number of earlier studies as special cases.
3 0.67590445 138 nips-2001-On the Generalization Ability of On-Line Learning Algorithms
Author: Nicolò Cesa-bianchi, Alex Conconi, Claudio Gentile
Abstract: In this paper we show that on-line algorithms for classification and regression can be naturally used to obtain hypotheses with good datadependent tail bounds on their risk. Our results are proven without requiring complicated concentration-of-measure arguments and they hold for arbitrary on-line learning algorithms. Furthermore, when applied to concrete on-line algorithms, our results yield tail bounds that in many cases are comparable or better than the best known bounds.
4 0.66270143 143 nips-2001-PAC Generalization Bounds for Co-training
Author: Sanjoy Dasgupta, Michael L. Littman, David A. McAllester
Abstract: The rule-based bootstrapping introduced by Yarowsky, and its cotraining variant by Blum and Mitchell, have met with considerable empirical success. Earlier work on the theory of co-training has been only loosely related to empirically useful co-training algorithms. Here we give a new PAC-style bound on generalization error which justifies both the use of confidences — partial rules and partial labeling of the unlabeled data — and the use of an agreement-based objective function as suggested by Collins and Singer. Our bounds apply to the multiclass case, i.e., where instances are to be assigned one of several possible labels.
5 0.65459359 139 nips-2001-Online Learning with Kernels
Author: Jyrki Kivinen, Alex J. Smola, Robert C. Williamson
Abstract: We consider online learning in a Reproducing Kernel Hilbert Space. Our method is computationally efficient and leads to simple algorithms. In particular we derive update equations for classification, regression, and novelty detection. The inclusion of the ν-trick allows us to give a robust parameterization. Moreover, unlike in batch learning where the ν-trick only applies to the ε-insensitive loss function we are able to derive general trimmed-mean types of estimators such as for Huber’s robust loss.
6 0.6372503 1 nips-2001-(Not) Bounding the True Error
7 0.62764895 31 nips-2001-Algorithmic Luckiness
8 0.57476616 114 nips-2001-Learning from Infinite Data in Finite Time
9 0.55254024 62 nips-2001-Duality, Geometry, and Support Vector Regression
10 0.55195397 137 nips-2001-On the Convergence of Leveraging
11 0.52863079 119 nips-2001-Means, Correlations and Bounds
12 0.44133425 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family
13 0.42847934 105 nips-2001-Kernel Machines and Boolean Functions
15 0.41707793 22 nips-2001-A kernel method for multi-labelled classification
16 0.41400328 104 nips-2001-Kernel Logistic Regression and the Import Vector Machine
17 0.41145894 159 nips-2001-Reducing multiclass to binary by coupling probability estimates
18 0.40273562 129 nips-2001-Multiplicative Updates for Classification by Mixture Models
19 0.38881844 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior
20 0.3736344 164 nips-2001-Sampling Techniques for Kernel Methods
topicId topicWeight
[(14, 0.073), (17, 0.048), (19, 0.036), (27, 0.142), (30, 0.056), (36, 0.319), (59, 0.027), (72, 0.081), (79, 0.034), (83, 0.018), (91, 0.089)]
simIndex simValue paperId paperTitle
1 0.82882547 76 nips-2001-Fast Parameter Estimation Using Green's Functions
Author: K. Wong, F. Li
Abstract: We propose a method for the fast estimation of hyperparameters in large networks, based on the linear response relation in the cavity method, and an empirical measurement of the Green's function. Simulation results show that it is efficient and precise, when compared with cross-validation and other techniques which require matrix inversion.
same-paper 2 0.81491172 81 nips-2001-Generalization Performance of Some Learning Problems in Hilbert Functional Spaces
Author: T. Zhang
Abstract: We investigate the generalization performance of some learning problems in Hilbert functional Spaces. We introduce a notion of convergence of the estimated functional predictor to the best underlying predictor, and obtain an estimate on the rate of the convergence. This estimate allows us to derive generalization bounds on some learning formulations.
3 0.75101012 193 nips-2001-Unsupervised Learning of Human Motion Models
Author: Yang Song, Luis Goncalves, Pietro Perona
Abstract: This paper presents an unsupervised learning algorithm that can derive the probabilistic dependence structure of parts of an object (a moving human body in our examples) automatically from unlabeled data. The distinguished part of this work is that it is based on unlabeled data, i.e., the training features include both useful foreground parts and background clutter and the correspondence between the parts and detected features are unknown. We use decomposable triangulated graphs to depict the probabilistic independence of parts, but the unsupervised technique is not limited to this type of graph. In the new approach, labeling of the data (part assignments) is taken as hidden variables and the EM algorithm is applied. A greedy algorithm is developed to select parts and to search for the optimal structure based on the differential entropy of these variables. The success of our algorithm is demonstrated by applying it to generate models of human motion automatically from unlabeled real image sequences.
4 0.65431517 8 nips-2001-A General Greedy Approximation Algorithm with Applications
Author: T. Zhang
Abstract: Greedy approximation algorithms have been frequently used to obtain sparse solutions to learning problems. In this paper, we present a general greedy algorithm for solving a class of convex optimization problems. We derive a bound on the rate of approximation for this algorithm, and show that our algorithm includes a number of earlier studies as special cases.
5 0.57484955 143 nips-2001-PAC Generalization Bounds for Co-training
Author: Sanjoy Dasgupta, Michael L. Littman, David A. McAllester
Abstract: The rule-based bootstrapping introduced by Yarowsky, and its cotraining variant by Blum and Mitchell, have met with considerable empirical success. Earlier work on the theory of co-training has been only loosely related to empirically useful co-training algorithms. Here we give a new PAC-style bound on generalization error which justifies both the use of confidences — partial rules and partial labeling of the unlabeled data — and the use of an agreement-based objective function as suggested by Collins and Singer. Our bounds apply to the multiclass case, i.e., where instances are to be assigned one of several possible labels.
6 0.57294285 138 nips-2001-On the Generalization Ability of On-Line Learning Algorithms
7 0.57289135 13 nips-2001-A Natural Policy Gradient
8 0.56683648 31 nips-2001-Algorithmic Luckiness
9 0.55107963 137 nips-2001-On the Convergence of Leveraging
10 0.54495668 92 nips-2001-Incorporating Invariances in Non-Linear Support Vector Machines
11 0.54176331 77 nips-2001-Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade
12 0.54105568 88 nips-2001-Grouping and dimensionality reduction by locally linear embedding
13 0.53991508 134 nips-2001-On Kernel-Target Alignment
14 0.53804702 38 nips-2001-Asymptotic Universality for Learning Curves of Support Vector Machines
15 0.53732461 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family
16 0.5345158 197 nips-2001-Why Neuronal Dynamics Should Control Synaptic Learning Rules
17 0.53443468 58 nips-2001-Covariance Kernels from Bayesian Generative Models
18 0.53369993 69 nips-2001-Escaping the Convex Hull with Extrapolated Vector Machines
19 0.53335047 114 nips-2001-Learning from Infinite Data in Finite Time
20 0.53243202 62 nips-2001-Duality, Geometry, and Support Vector Regression