nips nips2005 nips2005-95 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Nicolò Cesa-Bianchi, Claudio Gentile
Abstract: We prove the strongest known bound for the risk of hypotheses selected from the ensemble generated by running a learning algorithm incrementally on the training data. Our result is based on proof techniques that are remarkably different from the standard risk analysis based on uniform convergence arguments.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract. We prove the strongest known bound for the risk of hypotheses selected from the ensemble generated by running a learning algorithm incrementally on the training data. [sent-5, score-1.194]
2 Our result is based on proof techniques that are remarkably different from the standard risk analysis based on uniform convergence arguments. [sent-6, score-0.455]
3 1 Introduction. In this paper, we analyze the risk of hypotheses selected from the ensemble obtained by running an arbitrary on-line learning algorithm on an i.i.d. training sequence. [sent-7, score-0.945]
4 We describe a procedure that selects from the ensemble a hypothesis whose risk is, with high probability, at most M_n + O(((ln n)² + √(n M_n) ln n)/n), where M_n is the average cumulative loss incurred by the on-line algorithm on a training sequence of length n. [sent-11, score-1.067]
5 Note that this bound exhibits the "fast" rate (ln n)²/n whenever the cumulative loss n M_n is O(1). [sent-12, score-0.328]
6 This result is proven through a refinement of techniques that we used in [2] to prove the substantially weaker bound M_n + O(√((ln n)/n)). [sent-13, score-0.261]
7 As in the proof of the older result, we analyze the empirical process associated with a run of the on-line learner using exponential inequalities for martingales. [sent-14, score-0.161]
8 However, this time we control the large deviations of the on-line process using Bernstein's maximal inequality rather than the Azuma-Hoeffding inequality. [sent-15, score-0.145]
9 This provides a much tighter bound on the average risk of the ensemble. [sent-16, score-0.597]
10 Finally, we relate the risk of a specific hypothesis within the ensemble to the average risk. [sent-17, score-0.955]
11 As in [2], we select this hypothesis using a deterministic sequential testing procedure, but the use of Bernstein's inequality makes the analysis of this procedure far more complicated. [sent-18, score-0.293]
12 Unlike uniform convergence, which is tailored to empirical risk minimization, our bounds hold for any learning algorithm. [sent-23, score-0.652]
13 Indeed, disregarding efficiency issues, any learner can be run incrementally on a data sequence to generate an ensemble of hypotheses. [sent-24, score-0.543]
14 Instances x are tuples of numerical and/or symbolic attributes. [sent-28, score-0.065]
15 Labels y belong to a finite set of symbols (the class elements) or to an interval of the real line, depending on whether the task is classification or regression. [sent-29, score-0.095]
16 We allow a learning algorithm to output hypotheses of the form h : X → D, where D is a decision space not necessarily equal to Y. [sent-30, score-0.175]
17 The goodness of hypothesis h on example (x, y) is measured by the quantity ℓ(h(x), y), where ℓ : D × Y → ℝ. [sent-31, score-0.19]
18 2 A bound on the average risk. An on-line algorithm A works in a sequence of trials. [sent-33, score-0.641]
19 In each trial t, the algorithm takes as input a hypothesis H_{t-1} and an example Z_t = (X_t, Y_t), and returns a new hypothesis H_t to be used in the next trial. [sent-37, score-0.362]
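To make the protocol concrete, here is a minimal sketch (illustrative code, not taken from the paper): it runs a generic on-line learner over the sample, collects the ensemble of hypotheses produced along the way, and accumulates the average cumulative loss M_n that appears in the bounds. The learner interface and the toy majority-vote learner are assumptions made only for this sketch.

```python
from collections import Counter

def zero_one_loss(prediction, label):
    # A bounded loss in [0, 1], as required by the analysis.
    return 0.0 if prediction == label else 1.0

class MajorityVoteLearner:
    """Toy on-line learner: after seeing z_1, ..., z_t it predicts the most
    frequent label observed so far (the instance x is ignored)."""
    def __init__(self):
        self.counts = Counter()

    def initial_hypothesis(self):
        return lambda x: 0                      # H_0: arbitrary default prediction

    def update(self, example):
        _, y_t = example
        self.counts[y_t] += 1
        majority = self.counts.most_common(1)[0][0]
        return lambda x, m=majority: m          # H_t as a frozen snapshot

def run_online(learner, sample, loss=zero_one_loss):
    """Return the ensemble [H_0, ..., H_{n-1}] and the average cumulative
    loss M_n = (1/n) * sum_t loss(H_{t-1}(x_t), y_t)."""
    ensemble, cumulative = [], 0.0
    hypothesis = learner.initial_hypothesis()
    for x_t, y_t in sample:
        ensemble.append(hypothesis)               # store H_{t-1}
        cumulative += loss(hypothesis(x_t), y_t)  # on-line loss in trial t
        hypothesis = learner.update((x_t, y_t))   # H_t for the next trial
    return ensemble, (cumulative / len(sample) if sample else 0.0)

# Example usage:
# ensemble, M_n = run_online(MajorityVoteLearner(), [(0, 1), (1, 1), (2, 0), (3, 1)])
```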
20 We follow the standard assumptions in statistical learning: the sequence of examples Z^n = ((X_1, Y_1), ..., (X_n, Y_n)) is drawn i.i.d. according to an unknown distribution over X × Y. [sent-38, score-0.109]
21 We also assume that the loss function ℓ satisfies 0 ≤ ℓ ≤ 1. [sent-45, score-0.149]
22 The success of a hypothesis h is measured by the risk of h, denoted by risk(h). [sent-46, score-0.588]
23 This is the expected loss of h on an example (X, Y) drawn from the underlying distribution, risk(h) = E[ℓ(h(X), Y)]. [sent-47, score-0.095]
24 Define also risk_emp(h) to be the empirical risk of h on a sample Z^n, risk_emp(h) = (1/n) Σ_{t=1}^n ℓ(h(X_t), Y_t). [sent-48, score-0.503]
25 Given a sample Z^n and an on-line algorithm A, we use H_0, H_1, ..., [sent-49, score-0.095]
26 H_{n-1} to denote the ensemble of hypotheses generated by A. [sent-52, score-0.51]
27 Note that the ensemble is a function of the random training sample Z^n. [sent-53, score-0.336]
28 Our bounds hinge on the sample statistic M_n = (1/n) Σ_{t=1}^n ℓ(H_{t-1}(X_t), Y_t), which can be easily computed as the on-line algorithm is run on Z^n. [sent-54, score-0.151]
29 The following bound, a consequence of Bernstein's maximal inequality for martingales due to Freedman [4], is of primary importance for proving our results. [sent-55, score-0.217]
30 Let L_1, L_2, ... be a sequence of random variables, 0 ≤ L_t ≤ 1. [sent-59, score-0.044]
31 Define the bounded martingale difference sequence V_t = E[L_t | L_1, ..., L_{t-1}] - L_t [sent-60, score-0.203]
32 and the associated martingale S_n = V_1 + ... + V_n. [sent-63, score-0.125]
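For reference, the inequality in question can be stated in its standard textbook form as follows; this restatement (including the 1/3 constant) is a generic Bernstein-Freedman bound and is not copied from the paper's Lemma 1.

```latex
% Textbook Bernstein--Freedman form (hedged restatement, not the paper's Lemma 1):
% if S_n = V_1 + \dots + V_n is a martingale whose increments satisfy V_t \le 1 and
% \Sigma_n = \sum_{t=1}^{n} \mathbb{E}\bigl[V_t^2 \mid L_1,\dots,L_{t-1}\bigr]
% is the sum of conditional variances, then for all s, v > 0,
\mathbb{P}\bigl(S_n \ge s,\ \Sigma_n \le v\bigr)
  \;\le\; \exp\!\left(-\frac{s^2}{2\,(v + s/3)}\right).
```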
33 The next proposition, derived from Lemma 1, establishes a bound on the average risk of the ensemble of hypotheses. [sent-71, score-0.903]
34 Let H_0, H_1, ..., H_{n-1} be the ensemble of hypotheses generated by an arbitrary on-line algorithm A. [sent-75, score-0.51]
35 P( (1/n) Σ_{t=1}^n risk(H_{t-1}) ≥ M_n + (c/n)( ln((n M_n + 3)/δ) + √( n M_n ln((n M_n + 3)/δ) ) ) ) ≤ δ, for a small absolute constant c. The bound shown in Proposition 2 has the same rate as a bound recently proven by Zhang [6, Theorem 5]. [sent-77, score-0.397]
36 However, rather than deriving the bound from Bernstein's inequality as we do, Zhang uses an ad hoc argument. [sent-78, score-0.299]
37 Let µ_n = (1/n) Σ_{t=1}^n risk(H_{t-1}) and V_{t-1} = risk(H_{t-1}) - ℓ(H_{t-1}(X_t), Y_t) for t ≥ 1. [sent-80, score-0.141]
38 Also, set for brevity K_n = Σ_{t=1}^n κ_t and ⌈K_n⌉ = ⌈Σ_{t=1}^n κ_t⌉, and introduce the function A(x) = 2 ln((x+1)(x+3)/δ) for x ≥ 0. [sent-85, score-0.061]
39 We find upper and lower bounds on the probability P( Σ_{t=1}^n V_{t-1} ≥ A(K_n) + √( A(K_n) K_n ) ). (1) [sent-86, score-0.32]
40 The upper bound is determined through a simple stratification argument over Lemma 1. [sent-87, score-0.255]
41 (1) ≤ Σ_{s=0}^∞ δ/((s+1)(s+3)) ≤ δ. (2) As far as the lower bound on (1) is concerned, we note that our assumption 0 ≤ ℓ ≤ 1 implies κ_t ≤ risk(H_{t-1}) for all t which, in turn, gives K_n ≤ nµ_n. [sent-90, score-0.061]
42 Thus (1) = P( nµ_n - nM_n ≥ A(K_n) + √( A(K_n) K_n ) ) ≥ P( nµ_n - nM_n ≥ A(nµ_n) + √( A(nµ_n) nµ_n ) ) = P( 2nµ_n ≥ 2nM_n + 3A(nµ_n) + √( 4nM_n A(nµ_n) + 5A(nµ_n)² ) ) = P( x ≥ B + (3/2)A(x) + √( B A(x) + (5/4)A²(x) ) ), where we set for brevity x = nµ_n and B = nM_n. [sent-91, score-0.061]
43 We would like to solve the inequality x ≥ B + (3/2)A(x) + √( B A(x) + (5/4)A²(x) ) (3) w.r.t. x. [sent-92, score-0.101]
44 More precisely, we would like to find a suitable upper bound on the (unique) x* such that the above is satisfied as an equality. [sent-96, score-0.263]
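Under the reconstruction of A(x) used above, the shape of the resulting bound can be seen with a back-of-the-envelope argument (a heuristic sketch, not the paper's explicit derivative computation): since A grows only logarithmically, the solution x* of (3) exceeds B only by logarithmic and square-root terms.

```latex
% Heuristic sketch only (assumes the reconstruction A(x) = 2\ln((x+1)(x+3)/\delta)):
% because A grows logarithmically, the solution x^* of (3) satisfies
x^* \;=\; B + O\!\bigl(A(B) + \sqrt{B\,A(B)}\bigr),
% so that, with x = n\mu_n and B = nM_n, with probability at least 1-\delta,
\frac{1}{n}\sum_{t=1}^{n}\operatorname{risk}(H_{t-1})
  \;\le\; M_n + O\!\left(\frac{\ln\frac{n M_n + 3}{\delta}
  \;+\; \sqrt{\,n M_n\,\ln\frac{n M_n + 3}{\delta}}}{n}\right).
```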
45 A (tedious) derivative argument, along with the upper bound A(x) ≤ 4 ln((x+3)/δ), yields an explicit value x′ satisfying (3). [sent-97, score-0.255]
46 Thus x′ is an upper bound on x*, and we conclude that P( nµ_n ≥ x′ ) ≤ (1), which, recalling the definitions of x and B, and combining with (2), proves the bound. [sent-99, score-0.248]
47 □ 3 Selecting a good hypothesis from the ensemble. If the decision space D of A is a convex set and the loss function ℓ is convex in its first argument, then via Jensen's inequality we can directly apply the bound of Proposition 2 to the risk of the average hypothesis H̄ = (1/n) Σ_{t=1}^n H_{t-1}. [sent-100, score-1.392]
48 (4) Observe that this is an O(1/n) bound whenever the cumulative loss n M_n is O(1). [sent-102, score-0.328]
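A minimal sketch of the averaged hypothesis used in the convex case (an illustration, not the paper's code); it assumes real-valued predictions and the `ensemble` of callables produced by the earlier run_online sketch.

```python
# Sketch: averaged hypothesis H_bar(x) = (1/n) * sum_t H_{t-1}(x), meaningful
# when the decision space is convex (e.g. real-valued predictions) and the
# loss is convex in its first argument, so that Jensen's inequality applies.
def average_hypothesis(ensemble):
    n = len(ensemble)
    def h_bar(x):
        return sum(h(x) for h in ensemble) / n
    return h_bar
```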
49 If the convexity hypotheses do not hold (as in the case of classification problems), then the bound in (4) applies to a hypothesis randomly drawn from the ensemble (this was investigated in [1] though with different goals). [sent-103, score-0.936]
50 In this section we show how to deterministically pick from the ensemble a hypothesis whose risk is close to the average ensemble risk. [sent-104, score-1.2]
51 Let risk_emp(H_t, t+1) + ε_δ(risk_emp(H_t, t+1), t) be the penalized empirical risk of hypothesis H_t, where risk_emp(H_t, t+1) = (1/(n-t)) Σ_{i=t+1}^n ℓ(H_t(X_i), Y_i) is the empirical risk of H_t on the remaining sample Z_{t+1}, ..., Z_n. [sent-107, score-1.186]
52 We now analyze the performance of the learning algorithm that returns the hypothesis Ĥ minimizing the penalized risk estimate over all hypotheses in the ensemble, i.e., Ĥ = argmin_{0 ≤ t < n} [ risk_emp(H_t, t+1) + ε_δ(risk_emp(H_t, t+1), t) ]. (5) [sent-110, score-0.889]
53 For all 0 < δ ≤ 1, the hypothesis Ĥ satisfies, with probability at least 1 - δ, risk(Ĥ) ≤ min_{0 ≤ t < n} risk(H_t) plus penalty terms of the order of ε_δ. [sent-113, score-0.244]
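The selection step can be sketched as follows (illustration only: the concrete penalty ε_δ used in the paper is not reproduced here, and the Bernstein-style penalty below is a stand-in assumption). It expects the `ensemble` and `sample` from the earlier run_online sketch.

```python
import math

def empirical_risk_on_suffix(h_t, sample, t, loss):
    # risk_emp(H_t, t+1): average loss of H_t on the examples it was not
    # trained on, i.e. z_{t+1}, ..., z_n (the 0-based slice sample[t:]).
    suffix = sample[t:]
    return sum(loss(h_t(x), y) for x, y in suffix) / len(suffix)

def bernstein_style_penalty(emp_risk, m, delta, n):
    # Stand-in for the paper's epsilon_delta (NOT its exact form):
    # sqrt(risk * L / m) + L / m with L = ln(n / delta) and m = n - t.
    log_term = math.log(max(n, 2) / delta)
    return math.sqrt(emp_risk * log_term / m) + log_term / m

def select_hypothesis(ensemble, sample, loss, delta=0.05):
    """Deterministically return the H_t minimizing the penalized empirical
    risk computed on the part of the sample H_t was not trained on."""
    n = len(sample)
    best, best_score = None, float("inf")
    for t, h_t in enumerate(ensemble):
        emp = empirical_risk_on_suffix(h_t, sample, t, loss)
        score = emp + bernstein_style_penalty(emp, n - t, delta, n)
        if score < best_score:
            best, best_score = h_t, score
    return best
```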
54 □ Conclusions and current research issues. We have shown tail risk bounds for specific hypotheses selected from the ensemble generated by the run of an arbitrary on-line algorithm. [sent-119, score-1.224]
55 Proposition 2, our simplest bound, is proven via an easy application of Bernstein's maximal inequality for martingales, a quite basic result in probability theory. [sent-120, score-0.202]
56 The analysis of Theorem 4 is also centered on the same martingale inequality. [sent-121, score-0.125]
57 Also, the bound shown in Theorem 4 contains ln n terms. [sent-123, score-0.17]
58 A further open problem is to prove lower bounds, even in the special case when n Mn is bounded by a constant. [sent-125, score-0.068]
wordName wordTfidf (topN-words)
[('risk', 0.427), ('kn', 0.306), ('ensemble', 0.306), ('ht', 0.259), ('riskemp', 0.208), ('ip', 0.195), ('mn', 0.193), ('hypotheses', 0.175), ('bound', 0.17), ('nmn', 0.166), ('hypothesis', 0.161), ('bernstein', 0.145), ('vt', 0.141), ('martingale', 0.125), ('proposition', 0.102), ('inequality', 0.101), ('bounds', 0.086), ('italy', 0.083), ('milano', 0.083), ('riskernp', 0.083), ('satisfies', 0.083), ('universita', 0.083), ('mt', 0.076), ('lp', 0.076), ('zt', 0.073), ('tail', 0.073), ('inn', 0.072), ('martingales', 0.072), ('freedman', 0.072), ('loss', 0.066), ('classification', 0.066), ('gentile', 0.066), ('vb', 0.066), ('zn', 0.065), ('yt', 0.064), ('cumulative', 0.063), ('colt', 0.063), ('brevity', 0.061), ('specific', 0.061), ('lemma', 0.058), ('define', 0.058), ('proven', 0.057), ('incrementally', 0.053), ('find', 0.051), ('nm', 0.049), ('penalized', 0.049), ('empirical', 0.046), ('sequence', 0.044), ('var', 0.044), ('maximal', 0.044), ('learner', 0.043), ('rt', 0.043), ('argument', 0.043), ('upper', 0.042), ('xt', 0.041), ('sn', 0.04), ('returns', 0.04), ('analyze', 0.037), ('ja', 0.036), ('beating', 0.036), ('recalling', 0.036), ('tailored', 0.036), ('talk', 0.036), ('tuples', 0.036), ('va', 0.036), ('zl', 0.036), ('vi', 0.036), ('run', 0.035), ('bounded', 0.034), ('prove', 0.034), ('zhang', 0.033), ('ill', 0.033), ('jb', 0.033), ('ha', 0.033), ('devroye', 0.033), ('conconi', 0.033), ('littlestone', 0.033), ('issues', 0.032), ('sequential', 0.031), ('initiated', 0.031), ('sharply', 0.031), ('kalai', 0.031), ('disregarding', 0.031), ('lt', 0.031), ('efficiency', 0.031), ('ho', 0.031), ('sample', 0.03), ('generated', 0.029), ('drawn', 0.029), ('whenever', 0.029), ('hold', 0.029), ('symbolic', 0.029), ('goodness', 0.029), ('tedious', 0.029), ('finite', 0.029), ('uniform', 0.028), ('verlag', 0.028), ('kt', 0.028), ('vx', 0.028), ('hoc', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 95 nips-2005-Improved risk tail bounds for on-line algorithms
Author: Nicolò Cesa-Bianchi, Claudio Gentile
Abstract: We prove the strongest known bound for the risk of hypotheses selected from the ensemble generated by running a learning algorithm incrementally on the training data. Our result is based on proof techniques that are remarkably different from the standard risk analysis based on uniform convergence arguments.
2 0.14814641 54 nips-2005-Data-Driven Online to Batch Conversions
Author: Ofer Dekel, Yoram Singer
Abstract: Online learning algorithms are typically fast, memory efficient, and simple to implement. However, many common learning problems fit more naturally in the batch learning setting. The power of online learning algorithms can be exploited in batch settings by using online-to-batch conversions techniques which build a new batch algorithm from an existing online algorithm. We first give a unified overview of three existing online-to-batch conversion techniques which do not use training data in the conversion process. We then build upon these data-independent conversions to derive and analyze data-driven conversions. Our conversions find hypotheses with a small risk by explicitly minimizing datadependent generalization bounds. We experimentally demonstrate the usefulness of our approach and in particular show that the data-driven conversions consistently outperform the data-independent conversions.
3 0.14058192 82 nips-2005-Generalization Error Bounds for Aggregation by Mirror Descent with Averaging
Author: Anatoli Juditsky, Alexander Nazin, Alexandre Tsybakov, Nicolas Vayatis
Abstract: We consider the problem of constructing an aggregated estimator from a finite class of base functions which approximately minimizes a convex risk functional under the ℓ1 constraint. For this purpose, we propose a stochastic procedure, the mirror descent, which performs gradient descent in the dual space. The generated estimates are additionally averaged in a recursive fashion with specific weights. Mirror descent algorithms have been developed in different contexts and they are known to be particularly efficient in high dimensional problems. Moreover their implementation is adapted to the online setting. The main result of the paper is the upper bound on the convergence rate for the generalization error. 1
4 0.13680424 41 nips-2005-Coarse sample complexity bounds for active learning
Author: Sanjoy Dasgupta
Abstract: We characterize the sample complexity of active learning problems in terms of a parameter which takes into account the distribution over the input space, the specific target hypothesis, and the desired accuracy.
5 0.13420558 47 nips-2005-Consistency of one-class SVM and related algorithms
Author: Régis Vert, Jean-Philippe Vert
Abstract: We determine the asymptotic limit of the function computed by support vector machines (SVM) and related algorithms that minimize a regularized empirical convex loss function in the reproducing kernel Hilbert space of the Gaussian RBF kernel, in the situation where the number of examples tends to infinity, the bandwidth of the Gaussian kernel tends to 0, and the regularization parameter is held fixed. Non-asymptotic convergence bounds to this limit in the L2 sense are provided, together with upper bounds on the classification error that is shown to converge to the Bayes risk, therefore proving the Bayes-consistency of a variety of methods although the regularization term does not vanish. These results are particularly relevant to the one-class SVM, for which the regularization can not vanish by construction, and which is shown for the first time to be a consistent density level set estimator. 1
6 0.1281738 58 nips-2005-Divergences, surrogate loss functions and experimental design
7 0.11043728 85 nips-2005-Generalization to Unseen Cases
8 0.10845879 70 nips-2005-Fast Information Value for Graphical Models
9 0.10708802 12 nips-2005-A PAC-Bayes approach to the Set Covering Machine
10 0.10580397 50 nips-2005-Convex Neural Networks
11 0.097612403 191 nips-2005-The Forgetron: A Kernel-Based Perceptron on a Fixed Budget
12 0.091390952 202 nips-2005-Variational EM Algorithms for Non-Gaussian Latent Variable Models
13 0.086588688 83 nips-2005-Generalization error bounds for classifiers trained with interdependent data
14 0.074194163 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models
15 0.074056193 72 nips-2005-Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation
16 0.067679338 45 nips-2005-Conditional Visual Tracking in Kernel Space
17 0.06761732 112 nips-2005-Learning Minimum Volume Sets
18 0.067271158 117 nips-2005-Learning from Data of Variable Quality
19 0.063796893 76 nips-2005-From Batch to Transductive Online Learning
20 0.062529102 201 nips-2005-Variational Bayesian Stochastic Complexity of Mixture Models
topicId topicWeight
[(0, 0.172), (1, 0.088), (2, 0.01), (3, -0.134), (4, 0.187), (5, 0.123), (6, -0.09), (7, 0.042), (8, -0.185), (9, -0.029), (10, -0.093), (11, 0.216), (12, -0.046), (13, -0.079), (14, 0.055), (15, -0.048), (16, -0.03), (17, -0.155), (18, -0.086), (19, -0.016), (20, -0.089), (21, 0.006), (22, 0.003), (23, -0.092), (24, 0.107), (25, 0.059), (26, -0.156), (27, 0.098), (28, -0.039), (29, 0.037), (30, -0.045), (31, 0.016), (32, -0.058), (33, -0.046), (34, -0.015), (35, 0.08), (36, -0.0), (37, -0.039), (38, 0.027), (39, 0.046), (40, 0.068), (41, 0.001), (42, 0.159), (43, 0.025), (44, -0.036), (45, -0.016), (46, -0.0), (47, 0.017), (48, 0.057), (49, -0.053)]
simIndex simValue paperId paperTitle
same-paper 1 0.97506082 95 nips-2005-Improved risk tail bounds for on-line algorithms
Author: Nicolò Cesa-Bianchi, Claudio Gentile
Abstract: We prove the strongest known bound for the risk of hypotheses selected from the ensemble generated by running a learning algorithm incrementally on the training data. Our result is based on proof techniques that are remarkably different from the standard risk analysis based on uniform convergence arguments.
2 0.66866606 54 nips-2005-Data-Driven Online to Batch Conversions
Author: Ofer Dekel, Yoram Singer
Abstract: Online learning algorithms are typically fast, memory efficient, and simple to implement. However, many common learning problems fit more naturally in the batch learning setting. The power of online learning algorithms can be exploited in batch settings by using online-to-batch conversions techniques which build a new batch algorithm from an existing online algorithm. We first give a unified overview of three existing online-to-batch conversion techniques which do not use training data in the conversion process. We then build upon these data-independent conversions to derive and analyze data-driven conversions. Our conversions find hypotheses with a small risk by explicitly minimizing datadependent generalization bounds. We experimentally demonstrate the usefulness of our approach and in particular show that the data-driven conversions consistently outperform the data-independent conversions.
3 0.65704143 82 nips-2005-Generalization Error Bounds for Aggregation by Mirror Descent with Averaging
Author: Anatoli Juditsky, Alexander Nazin, Alexandre Tsybakov, Nicolas Vayatis
Abstract: We consider the problem of constructing an aggregated estimator from a finite class of base functions which approximately minimizes a convex risk functional under the ℓ1 constraint. For this purpose, we propose a stochastic procedure, the mirror descent, which performs gradient descent in the dual space. The generated estimates are additionally averaged in a recursive fashion with specific weights. Mirror descent algorithms have been developed in different contexts and they are known to be particularly efficient in high dimensional problems. Moreover their implementation is adapted to the online setting. The main result of the paper is the upper bound on the convergence rate for the generalization error. 1
4 0.5591656 76 nips-2005-From Batch to Transductive Online Learning
Author: Sham Kakade, Adam Tauman Kalai
Abstract: It is well-known that everything that is learnable in the difficult online setting, where an arbitrary sequences of examples must be labeled one at a time, is also learnable in the batch setting, where examples are drawn independently from a distribution. We show a result in the opposite direction. We give an efficient conversion algorithm from batch to online that is transductive: it uses future unlabeled data. This demonstrates the equivalence between what is properly and efficiently learnable in a batch model and a transductive online model.
5 0.55616862 191 nips-2005-The Forgetron: A Kernel-Based Perceptron on a Fixed Budget
Author: Ofer Dekel, Shai Shalev-Shwartz, Yoram Singer
Abstract: The Perceptron algorithm, despite its simplicity, often performs well on online classification tasks. The Perceptron becomes especially effective when it is used in conjunction with kernels. However, a common difficulty encountered when implementing kernel-based online algorithms is the amount of memory required to store the online hypothesis, which may grow unboundedly. In this paper we present and analyze the Forgetron algorithm for kernel-based online learning on a fixed memory budget. To our knowledge, this is the first online learning algorithm which, on one hand, maintains a strict limit on the number of examples it stores while, on the other hand, entertains a relative mistake bound. In addition to the formal results, we also present experiments with real datasets which underscore the merits of our approach.
6 0.51338965 112 nips-2005-Learning Minimum Volume Sets
7 0.49500415 58 nips-2005-Divergences, surrogate loss functions and experimental design
8 0.45440522 85 nips-2005-Generalization to Unseen Cases
9 0.45224592 47 nips-2005-Consistency of one-class SVM and related algorithms
10 0.4260723 117 nips-2005-Learning from Data of Variable Quality
11 0.38890597 72 nips-2005-Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation
12 0.36769679 50 nips-2005-Convex Neural Networks
13 0.36739725 41 nips-2005-Coarse sample complexity bounds for active learning
14 0.36328653 83 nips-2005-Generalization error bounds for classifiers trained with interdependent data
15 0.36145884 70 nips-2005-Fast Information Value for Graphical Models
16 0.35585144 12 nips-2005-A PAC-Bayes approach to the Set Covering Machine
17 0.32695729 205 nips-2005-Worst-Case Bounds for Gaussian Process Models
18 0.30795947 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models
19 0.30788651 160 nips-2005-Query by Committee Made Real
20 0.3055585 186 nips-2005-TD(0) Leads to Better Policies than Approximate Value Iteration
topicId topicWeight
[(3, 0.088), (10, 0.035), (22, 0.358), (27, 0.014), (31, 0.041), (34, 0.094), (55, 0.036), (69, 0.051), (73, 0.026), (88, 0.073), (91, 0.095)]
simIndex simValue paperId paperTitle
1 0.81234825 81 nips-2005-Gaussian Processes for Multiuser Detection in CDMA receivers
Author: Juan J. Murillo-Fuentes, Sebastian Caro, Fernando Pérez-Cruz
Abstract: In this paper we propose a new receiver for digital communications. We focus on the application of Gaussian Processes (GPs) to the multiuser detection (MUD) in code division multiple access (CDMA) systems to solve the near-far problem. Hence, we aim to reduce the interference from other users sharing the same frequency band. While usual approaches minimize the mean square error (MMSE) to linearly retrieve the user of interest, we exploit the same criteria but in the design of a nonlinear MUD. Since the optimal solution is known to be nonlinear, the performance of this novel method clearly improves that of the MMSE detectors. Furthermore, the GP based MUD achieves excellent interference suppression even for short training sequences. We also include some experiments to illustrate that other nonlinear detectors such as those based on Support Vector Machines (SVMs) exhibit a worse performance. 1
same-paper 2 0.79978144 95 nips-2005-Improved risk tail bounds for on-line algorithms
Author: Nicolò Cesa-Bianchi, Claudio Gentile
Abstract: We prove the strongest known bound for the risk of hypotheses selected from the ensemble generated by running a learning algorithm incrementally on the training data. Our result is based on proof techniques that are remarkably different from the standard risk analysis based on uniform convergence arguments.
3 0.63648856 160 nips-2005-Query by Committee Made Real
Author: Ran Gilad-Bachrach, Amir Navot, Naftali Tishby
Abstract: Training a learning algorithm is a costly task. A major goal of active learning is to reduce this cost. In this paper we introduce a new algorithm, KQBC, which is capable of actively learning large scale problems by using selective sampling. The algorithm overcomes the costly sampling step of the well known Query By Committee (QBC) algorithm by projecting onto a low dimensional space. KQBC also enables the use of kernels, providing a simple way of extending QBC to the non-linear scenario. Sampling the low dimension space is done using the hit and run random walk. We demonstrate the success of this novel algorithm by applying it to both artificial and a real world problems.
4 0.47882727 94 nips-2005-Identifying Distributed Object Representations in Human Extrastriate Visual Cortex
Author: Rory Sayres, David Ress, Kalanit Grill-Spector
Abstract: The category of visual stimuli has been reliably decoded from patterns of neural activity in extrastriate visual cortex [1]. It has yet to be seen whether object identity can be inferred from this activity. We present fMRI data measuring responses in human extrastriate cortex to a set of 12 distinct object images. We use a simple winner-take-all classifier, using half the data from each recording session as a training set, to evaluate encoding of object identity across fMRI voxels. Since this approach is sensitive to the inclusion of noisy voxels, we describe two methods for identifying subsets of voxels in the data which optimally distinguish object identity. One method characterizes the reliability of each voxel within subsets of the data, while another estimates the mutual information of each voxel with the stimulus set. We find that both metrics can identify subsets of the data which reliably encode object identity, even when noisy measurements are artificially added to the data. The mutual information metric is less efficient at this task, likely due to constraints in fMRI data. 1
5 0.45601606 112 nips-2005-Learning Minimum Volume Sets
Author: Clayton Scott, Robert Nowak
Abstract: Given a probability measure P and a reference measure µ, one is often interested in the minimum µ-measure set with P -measure at least α. Minimum volume sets of this type summarize the regions of greatest probability mass of P , and are useful for detecting anomalies and constructing confidence regions. This paper addresses the problem of estimating minimum volume sets based on independent samples distributed according to P . Other than these samples, no other information is available regarding P , but the reference measure µ is assumed to be known. We introduce rules for estimating minimum volume sets that parallel the empirical risk minimization and structural risk minimization principles in classification. As in classification, we show that the performances of our estimators are controlled by the rate of uniform convergence of empirical to true probabilities over the class from which the estimator is drawn. Thus we obtain finite sample size performance bounds in terms of VC dimension and related quantities. We also demonstrate strong universal consistency and an oracle inequality. Estimators based on histograms and dyadic partitions illustrate the proposed rules. 1
6 0.45121416 201 nips-2005-Variational Bayesian Stochastic Complexity of Mixture Models
7 0.44941932 49 nips-2005-Convergence and Consistency of Regularized Boosting Algorithms with Stationary B-Mixing Observations
8 0.44929928 58 nips-2005-Divergences, surrogate loss functions and experimental design
9 0.44729266 43 nips-2005-Comparing the Effects of Different Weight Distributions on Finding Sparse Representations
10 0.44694021 96 nips-2005-Inference with Minimal Communication: a Decision-Theoretic Variational Approach
11 0.4451198 197 nips-2005-Unbiased Estimator of Shape Parameter for Spiking Irregularities under Changing Environments
12 0.4448851 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models
13 0.44422624 47 nips-2005-Consistency of one-class SVM and related algorithms
14 0.44386661 32 nips-2005-Augmented Rescorla-Wagner and Maximum Likelihood Estimation
15 0.44217977 50 nips-2005-Convex Neural Networks
16 0.44096333 41 nips-2005-Coarse sample complexity bounds for active learning
17 0.44088176 54 nips-2005-Data-Driven Online to Batch Conversions
18 0.43949685 74 nips-2005-Faster Rates in Regression via Active Learning
19 0.43706414 137 nips-2005-Non-Gaussian Component Analysis: a Semi-parametric Framework for Linear Dimension Reduction
20 0.43671018 92 nips-2005-Hyperparameter and Kernel Learning for Graph Based Semi-Supervised Classification