nips nips2005 nips2005-112 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Clayton Scott, Robert Nowak
Abstract: Given a probability measure P and a reference measure µ, one is often interested in the minimum µ-measure set with P -measure at least α. Minimum volume sets of this type summarize the regions of greatest probability mass of P , and are useful for detecting anomalies and constructing confidence regions. This paper addresses the problem of estimating minimum volume sets based on independent samples distributed according to P . Other than these samples, no other information is available regarding P , but the reference measure µ is assumed to be known. We introduce rules for estimating minimum volume sets that parallel the empirical risk minimization and structural risk minimization principles in classification. As in classification, we show that the performances of our estimators are controlled by the rate of uniform convergence of empirical to true probabilities over the class from which the estimator is drawn. Thus we obtain finite sample size performance bounds in terms of VC dimension and related quantities. We also demonstrate strong universal consistency and an oracle inequality. Estimators based on histograms and dyadic partitions illustrate the proposed rules. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract. Given a probability measure P and a reference measure µ, one is often interested in the minimum µ-measure set with P-measure at least α. [sent-4, score-0.165]
2 Minimum volume sets of this type summarize the regions of greatest probability mass of P , and are useful for detecting anomalies and constructing confidence regions. [sent-5, score-0.344]
3 This paper addresses the problem of estimating minimum volume sets based on independent samples distributed according to P . [sent-6, score-0.261]
4 Other than these samples, no other information is available regarding P , but the reference measure µ is assumed to be known. [sent-7, score-0.062]
5 We introduce rules for estimating minimum volume sets that parallel the empirical risk minimization and structural risk minimization principles in classification. [sent-8, score-0.656]
6 As in classification, we show that the performances of our estimators are controlled by the rate of uniform convergence of empirical to true probabilities over the class from which the estimator is drawn. [sent-9, score-0.273]
7 Thus we obtain finite sample size performance bounds in terms of VC dimension and related quantities. [sent-10, score-0.073]
8 We also demonstrate strong universal consistency and an oracle inequality. [sent-11, score-0.208]
9 Estimators based on histograms and dyadic partitions illustrate the proposed rules. [sent-12, score-0.223]
10 1 Introduction. Given a probability measure P and a reference measure µ, the minimum volume set (MV-set) with mass at least 0 < α < 1 is G∗α = arg min{µ(G) : P(G) ≥ α, G measurable}. [sent-13, score-0.47]
11 MV-sets summarize regions where the mass of P is most concentrated. [sent-14, score-0.158]
12 Applications of minimum volume sets include outlier/anomaly detection, determining highest posterior density or multivariate confidence regions, tests for multimodality, and clustering. [sent-16, score-0.287]
13 In comparison to the closely related problem of density level set estimation [1, 2], the minimum volume approach seems preferable in practice because the mass α is more easily specified than a level of a density. [sent-17, score-0.424]
14 This paper considers the problem of MV-set estimation using a training sample drawn from P, which in most practical settings is the only information one has. [sent-19, score-0.087]
15 [Figure 1: Gaussian mixture data, 500 samples, α = 0. (Left and Middle) Minimum volume set estimates based on recursive dyadic partitions, discussed in Section 6.] [sent-21, score-0.312]
16 The specifications of the estimation process are the significance level α, the reference measure µ, and a collection of candidate sets G. [sent-24, score-0.164]
17 To our knowledge, ours is the first work to establish finite sample bounds, an oracle inequality, and universal consistency for the MV-set estimation problem. [sent-26, score-0.276]
18 The methods proposed herein are primarily of theoretical interest, although they may be implemented efficiently for certain partition-based estimators as discussed later. [sent-27, score-0.086]
19 Let S = (X1, . . . , Xn) be an independent and identically distributed (IID) sample drawn according to P. [sent-35, score-0.047]
20 Let G denote a subset of X , and let G be a collection of such subsets. [sent-36, score-0.059]
21 Let P̂ denote the empirical measure based on S: P̂(G) = (1/n) Σ_{i=1}^n I(Xi ∈ G). [sent-37, score-0.081]
22 Set µ∗α = inf{µ(G) : P(G) ≥ α}, (1) where the inf is over all measurable sets. [sent-39, score-0.143]
23 A minimum volume set, G∗α, is a minimizer of (1), when it exists. [sent-40, score-0.237]
24 Given α ∈ (0, 1), denote Gα = {G ∈ G : P(G) ≥ α}, the collection of all sets in G with mass at least α. [sent-42, score-0.221]
25 2 Minimum Volume Sets and Empirical Risk Minimization. In this section we introduce a procedure inspired by the empirical risk minimization (ERM) principle for classification. [sent-46, score-0.234]
26 In classification, ERM selects a classifier from a fixed set of classifiers by minimizing the empirical error (risk) of a training sample. [sent-47, score-0.079]
27 Vapnik and Chervonenkis established the basic theoretical properties of ERM (see [9, 10]), and we find similar properties in the minimum volume setting. [sent-48, score-0.271]
28 The MV-ERM rule is GG,α = arg min_{G∈G} { µ(G) : P̂(G) ≥ α − φ(G, S, δ) }. (2) We refer to the rule in (2) as MV-ERM because of the analogy with empirical risk minimization in classification. [sent-52, score-0.218]
29 The quantity φ acts as a kind of “tolerance” by which the empirical mass estimate may deviate from the targeted value of α. [sent-53, score-0.214]
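To make the rule concrete, here is a minimal sketch of MV-ERM over a finite collection of candidate sets. The representation of a candidate as a (volume, indicator) pair, the function names, and the interface of the penalty `phi` are illustrative assumptions, not part of the paper.

```python
import numpy as np

def mv_erm(candidates, X, alpha, phi):
    """Minimal MV-ERM sketch over a finite collection of candidate sets.

    candidates : list of (volume, indicator) pairs; `volume` is mu(G) and
                 `indicator` maps an array of samples to 0/1 membership.
    X          : array of n i.i.d. samples from P.
    alpha      : target mass.
    phi        : callable (candidate_index, n) -> complexity penalty.
    Returns the index of the smallest-volume candidate that satisfies the
    relaxed empirical mass constraint, or None if none is feasible.
    """
    n = len(X)
    best, best_vol = None, np.inf
    for k, (vol, indicator) in enumerate(candidates):
        emp_mass = np.mean(indicator(X))          # empirical mass of candidate G
        if emp_mass >= alpha - phi(k, n):         # relaxed mass constraint
            if vol < best_vol:
                best, best_vol = k, vol
    return best
```

In practice the candidates would come from a structured family (for example, unions of histogram cells or pruned dyadic partitions), for which the search need not be exhaustive.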
30 We say φ is a (distribution free) complexity penalty for G if and only if, for all distributions P and all δ ∈ (0, 1), Pn{ S : sup_{G∈G} [ P(G) − P̂(G) − φ(G, S, δ) ] > 0 } ≤ δ. [sent-56, score-0.159]
31 Thus, φ controls the rate of uniform convergence of P̂(G) to P(G) for G ∈ G. [sent-57, score-0.076]
32 It is well known that the performance of ERM (for binary classification) relative to the performance of the best classifier in the given class is controlled by the uniform convergence of true to empirical probabilities. [sent-58, score-0.156]
33 If φ is a complexity penalty for G, then Pn{ P(GG,α) < α − 2φ(GG,α, S, δ) or µ(GG,α) > µG,α } ≤ δ. [sent-61, score-0.132]
34 Consider the sets ΘP = {S : P(GG,α) < α − 2φ(GG,α, S, δ)}, Θµ = {S : µ(GG,α) > µG,α}, ΩP = {S : sup_{G∈G} [ P(G) − P̂(G) − φ(G, S, δ) ] > 0}. [sent-63, score-0.074]
35 The proof of this lemma (see [6] ) follows closely the proof of Lemma 1 in [7]. [sent-67, score-0.098]
36 This result may be understood by analogy with the result from classification that says R(f̂) − inf_{f∈F} R(f) ≤ 2 sup_{f∈F} |R̂(f) − R(f)| (see [10]). [sent-68, score-0.112]
37 Here R and R̂ are the true and empirical risks, f̂ is the empirical risk minimizer, and F is a set of classifiers. [sent-70, score-0.184]
38 Just as this result relates uniform convergence bounds to empirical risk minimization in classification, so does Lemma 1 relate uniform convergence to the performance of MV-ERM. [sent-71, score-0.373]
39 The theorem above allows direct translation of uniform convergence results into performance guarantees for MV-ERM. [sent-72, score-0.121]
40 2.1 Example: VC Classes. Let G be a class of sets with VC dimension V, and define φ(G, S, δ) = √( 32 [ V log n + log(8/δ) ] / n ). (3) [sent-76, score-0.074]
41 By a version of the VC inequality [10], we know that φ is a complexity penalty for G, and therefore Theorem 1 applies. [sent-77, score-0.182]
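A direct transcription of the penalty (3) as reconstructed above; the function name is ours.

```python
import math

def vc_penalty(n, V, delta):
    """Penalty of (3): sqrt(32 * (V*log(n) + log(8/delta)) / n)."""
    return math.sqrt(32.0 * (V * math.log(n) + math.log(8.0 / delta)) / n)
```

For instance, with V = 2, δ = 0.05, and n = 10,000 the penalty as written here is about 0.27, which is consistent with the earlier remark that these rules are primarily of theoretical interest.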
42 Thus, for any fixed ε > 0, the probability of being within ε of the target mass α and being less than the target volume µG,α approaches one exponentially fast as the sample size increases. [sent-82, score-0.326]
43 This result may also be used to calculate a distribution free upper bound on the sample size needed to be within a given tolerance of α and with a given confidence 1 − δ. [sent-83, score-0.098]
44 In particular, the sample size will grow no faster than a polynomial in 1/ε and 1/δ, paralleling results for classification. [sent-84, score-0.069]
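As a sketch of this remark, one can numerically invert the penalty (3) to find the smallest n for which the deviation 2φ in Theorem 1 drops below a desired tolerance ε; the doubling-plus-bisection search below is our own illustrative choice.

```python
import math

def sample_size_for_tolerance(V, delta, eps):
    """Smallest n (up to the search's granularity) with 2*phi(n) <= eps,
    for the VC penalty phi of (3)."""
    def phi(n):
        return math.sqrt(32.0 * (V * math.log(n) + math.log(8.0 / delta)) / n)

    n = 2
    while 2.0 * phi(n) > eps:      # doubling search; phi(n) -> 0 as n grows
        n *= 2
    lo, hi = n // 2, n             # refine by bisection on [lo, hi]
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if 2.0 * phi(mid) <= eps:
            hi = mid
        else:
            lo = mid
    return hi
```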
45 2.2 Example: Countable Classes. Suppose G is a countable class of sets. [sent-86, score-0.059]
46 In light of the Kraft inequality for prefix codes, a codelength ⟦G⟧ may be assigned to each G, namely the length of a codeword for G in a prefix code for G. [sent-88, score-0.05]
47 Define φ(G, S, δ) = √( ( ⟦G⟧ log 2 + log(2/δ) ) / (2n) ). (4) By Chernoff's bound together with the union bound, φ is a penalty for G. [sent-90, score-0.169]
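A sketch of the countable-class penalty, assuming the Chernoff/union-bound form of (4) reconstructed above (the exact constants should be checked against [6]); `codelength_bits` stands for the prefix codelength ⟦G⟧ and the function name is ours.

```python
import math

def kraft_penalty(codelength_bits, n, delta):
    """Countable-class penalty of the reconstructed form (4):
    sqrt((codelength * log 2 + log(2/delta)) / (2n))."""
    return math.sqrt(
        (codelength_bits * math.log(2.0) + math.log(2.0 / delta)) / (2.0 * n)
    )
```

With this form, taking a finite class of |G| sets, uniform codelengths log2 |G|, and setting ε = 2φ recovers the 2|G| e^(−nε²/2) bound stated next.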
48 For the MV-ERM estimate GG,α from a finite class G, Pn{ P(GG,α) < α − ε or µ(GG,α) > µG,α } ≤ 2|G| e^(−nε²/2). [sent-94, score-0.048]
49 3 Consistency. A minimum volume set estimator is consistent if its volume and mass tend to the optimal values µ∗α and α as n → ∞. [sent-95, score-0.568]
50 Define the error of a set G by M(G) := (µ(G) − µ∗α)+ + (α − P(G))+, where (x)+ := max(x, 0). (Note that without the (·)+ operator this would not be a meaningful error, since one term could be negative and cause M to tend to zero even if the other error term does not go to zero.) [sent-97, score-0.095]
51 We are interested in MV-set estimators such that M(GG,α) tends to zero as n → ∞. [sent-98, score-0.065]
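For concreteness, the error M as a small helper; evaluating it requires the true mass P(G) and the optimal volume µ∗α, which in experiments would have to be computed analytically or approximated (for example, by Monte Carlo). The function name is ours.

```python
def mv_error(vol_G, mass_G, alpha, vol_star):
    """M(G) = (mu(G) - mu*_alpha)_+ + (alpha - P(G))_+ ."""
    return max(vol_G - vol_star, 0.0) + max(alpha - mass_G, 0.0)
```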
52 If GG,α is strongly consistent (i.e., M(GG,α) → 0 almost surely) for every possible distribution of X, then GG,α is strongly universally consistent. [sent-101, score-0.083]
53 Let G be fixed and let φ(G, S, δ) be a penalty for G. [sent-103, score-0.137]
54 By Theorem 1, with probability at least 1 − δ, µ(GG,α) − µ∗α ≤ µG,α − µ∗α (5) and α − P(GG,α) ≤ 2φ(GG,α, S, δ). (6) We refer to the left-hand side of (5) as the excess volume of the class G and the left-hand side of (6) as the missing mass of GG,α. [sent-105, score-0.367]
55 The upper bounds on the right-hand sides are an approximation error and a stochastic error, respectively. [sent-106, score-0.077]
56 The idea is to let G grow with n so that both errors tend to zero as n → ∞. [sent-107, score-0.092]
57 If G does not change with n, universal consistency is impossible. [sent-108, score-0.094]
58 To have both stochastic and approximation errors tend to zero, we apply MV-ERM to a class G^k from a sequence of classes G^1, G^2, . . . [sent-109, score-0.121]
59 Assume the sequence of classes G^k and penalties φ_k satisfy lim_{k→∞} inf{µ(G) : G ∈ G^k_α} = µ∗α (7) and sup_{G∈G^k_α} φ_k(G, S, δ(n)) = o(1) as n → ∞. (8) [sent-116, score-0.187]
60 Then GG^k,α is strongly universally consistent. [sent-117, score-0.056]
61 The proof combines the Borel-Cantelli lemma and the distribution-free result of Theorem 1 with the stated assumptions. [sent-118, score-0.087]
62 Examples satisfying the hypotheses of the theorem include families of VC classes with arbitrary approximating power (e.g., histograms and dyadic partition classes). [sent-119, score-0.069]
63 4 Structural Risk Minimization and an Oracle Inequality. In the previous section the rate of convergence of the two errors to zero is determined by the choice of k = k(n), which must be chosen a priori. [sent-123, score-0.061]
64 Hence it is possible that the excess volume decays much more quickly than the missing mass, or vice versa. [sent-124, score-0.2]
65 In this section we introduce a new rule called MV-SRM, inspired by the principle of structural risk minimization (SRM) from the theory of classification [11, 12], that automatically balances the two errors. [sent-125, score-0.21]
66 Conceptualize G as a collection of sets of varying capacities, such as a union of VC classes or a union of finite classes. [sent-131, score-0.161]
67 The MV-SRM principle selects the set GG,α = arg min_{G∈G} { µ(G) + φ(G, S, δ) : P̂(G) ≥ α − φ(G, S, δ) }. (9) [sent-133, score-0.066]
68 Note that MV-SRM is different from MV-ERM because it minimizes a complexity-penalized volume instead of simply the volume. [sent-134, score-0.183]
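A minimal sketch of MV-SRM paralleling the MV-ERM sketch earlier, with the same illustrative candidate representation; here the penalty enters both the objective and the constraint.

```python
import numpy as np

def mv_srm(candidates, X, alpha, phi):
    """Minimal MV-SRM sketch (rule (9)) over a finite union of candidate sets.

    candidates : list of (volume, indicator) pairs, as in the MV-ERM sketch.
    phi        : callable (candidate_index, n) -> penalty; larger for more
                 complex candidates.
    """
    n = len(X)
    best, best_score = None, np.inf
    for k, (vol, indicator) in enumerate(candidates):
        pen = phi(k, n)
        if np.mean(indicator(X)) >= alpha - pen:   # relaxed mass constraint
            score = vol + pen                      # complexity-penalized volume
            if score < best_score:
                best, best_score = k, score
    return best
```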
69 This follows from the bound 1 − α ≤ γα (1 − µ∗α) on the mass outside the minimum volume set. [sent-137, score-0.383]
70 With probability at least 1 − δ over the training sample S, M(GG,α) ≤ (1 + 1/γα) inf_{G∈Gα} [ µ(G) − µ∗α + 2φ(G, S, δ) ]. (10) [sent-140, score-0.107]
71 Sketch of proof: The proof is similar in some respects to oracle inequalities for classification. [sent-141, score-0.145]
72 In classification both approximation and stochastic errors are positive, whereas with MV-sets the excess volume µ(G) − µ∗α or missing mass α − P(G) could be negative. [sent-143, score-0.387]
73 The proof considers three cases separately: (1) µ(GG,α) ≥ µ∗α and P(GG,α) < α, (2) µ(GG,α) ≥ µ∗α and P(GG,α) ≥ α, and (3) µ(GG,α) < µ∗α and P(GG,α) < α. [sent-145, score-0.05]
74 In the first case, both volume and mass errors are positive and the argument follows standard lines. [sent-146, score-0.301]
75 The oracle inequality says that MV-SRM performs about as well as the set chosen by an oracle to optimize the tradeoff between the stochastic and approximation errors. [sent-149, score-0.332]
76 To illustrate the power of the oracle inequality, in [6] we demonstrate that MV-SRM applied to recursive dyadic partition-based estimators adapts optimally to the number of relevant features (unknown a priori). [sent-150, score-0.352]
77 5 Damping the Penalty. In Theorem 1, the reader may have noticed that MV-ERM does not equitably balance the volume error with the mass error. [sent-151, score-0.305]
78 A minor modification of MV-ERM and MV-SRM leads to a more equitable distribution of error between the volume and mass, instead of having all the error reside in the mass term. [sent-155, score-0.331]
79 The idea is simple: scale the penalty in the constraint by a damping factor ν < 1. [sent-156, score-0.142]
80 In the case of MV-SRM, the penalty in the objective function also needs to be scaled by 1 + ν. [sent-157, score-0.112]
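Taking the description above at face value (constraint penalty scaled by ν, objective penalty by 1 + ν; the precise statements are in [6]), the damped rule is a small modification of the previous sketch.

```python
import numpy as np

def mv_srm_damped(candidates, X, alpha, phi, nu=0.5):
    """Damped MV-SRM sketch: the constraint penalty is scaled by nu (< 1) and
    the objective penalty by (1 + nu), as described in this section."""
    n = len(X)
    best, best_score = None, np.inf
    for k, (vol, indicator) in enumerate(candidates):
        pen = phi(k, n)
        if np.mean(indicator(X)) >= alpha - nu * pen:   # damped constraint
            score = vol + (1.0 + nu) * pen              # rescaled objective penalty
            if score < best_score:
                best, best_score = k, score
    return best
```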
81 Moreover, the theoretical properties of these estimators stated above are retained (the statements, omitted here, are slightly more involved [6] ). [sent-158, score-0.124]
82 Also note that the above theorem encompasses the generalized quantile estimate of [3], which corresponds to ν = 0. [sent-160, score-0.102]
83 Thus we have finite sample size guarantees for that estimator to match Polonik’s asymptotic analysis. [sent-161, score-0.099]
84 First, compute the MV-ERM estimate within each constituent class; second, choose the final estimate by minimizing the penalized volume of these MV-ERM estimates. [sent-168, score-0.184]
85 [Figure 1, continued: (Right) The error of the MV-SRM estimate M(GG,α) as a function of sample size when ν = 0.] [sent-178, score-0.094]
86 To illustrate the potential improvement offered by spatially adaptive partitioning methods, we consider a minimum volume set estimator based on recursive dyadic (quadsplit) partitions. [sent-188, score-0.439]
87 We employ a penalty that is additive over the cells A of the partition. [sent-189, score-0.112]
88 The precise form of the penalty φ(A) for each cell is given in [6] , but loosely speaking it is proportional to the square-root of the ratio of the empirical mass of the cell to the sample size n. [sent-190, score-0.406]
89 In this case, MV-SRM with ν = 0 is min_{G∈G^L} Σ_A [ µ(A) ℓ(A) + φ(A) ] subject to Σ_A P̂(A) ℓ(A) ≥ α, (11) where G^L is the collection of all partitions with dyadic cell sidelengths no smaller than 2^−L, and ℓ(A) = 1 if A belongs to the candidate set and ℓ(A) = 0 otherwise (see [6] for further details). [sent-191, score-0.314]
90 Although direct optimization appears formidable, an efficient alternative is to consider the Lagrangian and conduct a bisection search over the Lagrange multiplier until the mass constraint is nearly achieved with equality (10 iterations suffice in practice). [sent-192, score-0.14]
91 For each iteration, minimization of the Lagrangian can be performed very rapidly using standard tree pruning techniques. [sent-193, score-0.084]
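A sketch of this bisection strategy for the simpler case of a single fixed partition (for example, a uniform histogram) rather than an optimization over pruned trees; the array names and the bracketing of the multiplier are illustrative. For a given multiplier λ, a cell A is included exactly when its Lagrangian contribution µ(A) + φ(A) − λ·P̂(A) is negative, and λ is adjusted until the selected empirical mass reaches α.

```python
import numpy as np

def mv_set_by_bisection(mass_hat, vol, pen, alpha, iters=10):
    """Bisection over the Lagrange multiplier for a single fixed partition.

    mass_hat : array of empirical cell masses P_hat(A) (summing to ~1).
    vol      : array of reference cell volumes mu(A).
    pen      : array of per-cell penalties phi(A).
    For a given lam, a cell is included iff mu(A) + phi(A) - lam*P_hat(A) < 0,
    which minimizes the Lagrangian cell by cell; lam is then tuned so that the
    included empirical mass is close to alpha.
    """
    mass_hat, vol, pen = map(np.asarray, (mass_hat, vol, pen))

    def select(lam):
        keep = vol + pen - lam * mass_hat < 0
        return keep, mass_hat[keep].sum()

    lo, hi = 0.0, 1.0
    while select(hi)[1] < alpha:       # grow the bracket until feasible
        hi *= 2.0                      # (assumes total empirical cell mass >= alpha)
    for _ in range(iters):             # ~10 iterations suffice in practice
        mid = 0.5 * (lo + hi)
        if select(mid)[1] >= alpha:
            hi = mid
        else:
            lo = mid
    keep, _ = select(hi)
    return keep                        # boolean mask of selected cells
```

For the tree-structured classes considered in this section, the same multiplier search is combined with the tree-pruning step described above rather than a flat scan over cells.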
92 An experimental demonstration of the dyadic partition estimator is depicted in Figure 1. [sent-194, score-0.198]
93 In the experiments we employed a dyadic quadtree structure with L = 8 (i.e., cell sidelengths no smaller than 2^−8) and pruned according to the theoretical penalty φ(A) formally defined in [6], weighted by a factor of 1/30 (in practice the optimal weight could be found via cross-validation or other techniques). [sent-195, score-0.146]
95 This strategy mitigates the “blocky” structure due to the underlying dyadic partitions, and can be computed almost as rapidly as a single tree estimate (within a factor of L) due to the large amount of redundancy among trees. [sent-200, score-0.187]
96 7 Conclusions. In this paper we propose two rules, MV-ERM and MV-SRM, for estimation of minimum volume sets. [sent-202, score-0.235]
97 Our theoretical analysis is made possible by relating the performance of these rules to the uniform convergence properties of the class of sets from which the estimate is taken. [sent-203, score-0.239]
98 Ours are the first known results to feature finite sample bounds, an oracle inequality, and universal consistency. [sent-204, score-0.209]
99 Acknowledgements The authors thank Ercan Yildiz and Rebecca Willett for their assistance with the experiments involving dyadic trees. [sent-205, score-0.146]
100 Polonik, “Minimum volume sets and generalized quantile processes,” Stochastic Processes and their Applications, vol. [sent-220, score-0.222]
wordName wordTfidf (topN-words)
[('gg', 0.86), ('dyadic', 0.146), ('mass', 0.14), ('volume', 0.139), ('oracle', 0.114), ('penalty', 0.112), ('risk', 0.078), ('minimum', 0.075), ('vc', 0.072), ('erm', 0.071), ('estimators', 0.065), ('occam', 0.065), ('nowak', 0.065), ('minimization', 0.064), ('inf', 0.06), ('classi', 0.055), ('penalties', 0.053), ('empirical', 0.053), ('estimator', 0.052), ('inequality', 0.05), ('universal', 0.048), ('sets', 0.047), ('partitions', 0.047), ('sample', 0.047), ('consistency', 0.046), ('theorem', 0.045), ('pn', 0.044), ('rademacher', 0.043), ('scott', 0.043), ('cscott', 0.041), ('hush', 0.041), ('polonik', 0.041), ('sidelengths', 0.041), ('erent', 0.041), ('convergence', 0.039), ('excess', 0.037), ('uniform', 0.037), ('vapnik', 0.036), ('lemma', 0.036), ('razor', 0.036), ('quantile', 0.036), ('scovel', 0.036), ('reference', 0.034), ('collection', 0.034), ('countable', 0.032), ('proof', 0.031), ('di', 0.031), ('dence', 0.031), ('damping', 0.03), ('mv', 0.03), ('lugosi', 0.03), ('histograms', 0.03), ('bound', 0.029), ('structural', 0.029), ('rules', 0.029), ('universally', 0.029), ('says', 0.029), ('measure', 0.028), ('union', 0.028), ('recursive', 0.027), ('sup', 0.027), ('class', 0.027), ('cell', 0.027), ('strongly', 0.027), ('bounds', 0.026), ('density', 0.026), ('error', 0.026), ('arg', 0.026), ('stochastic', 0.025), ('let', 0.025), ('missing', 0.024), ('penalized', 0.024), ('inverting', 0.024), ('classes', 0.024), ('cation', 0.024), ('analogy', 0.023), ('minimizer', 0.023), ('pre', 0.023), ('measurable', 0.023), ('tend', 0.023), ('practice', 0.023), ('lagrangian', 0.022), ('tolerance', 0.022), ('grow', 0.022), ('errors', 0.022), ('theoretical', 0.021), ('estimate', 0.021), ('nite', 0.021), ('estimation', 0.021), ('principle', 0.021), ('complexity', 0.02), ('meaningful', 0.02), ('stated', 0.02), ('rapidly', 0.02), ('min', 0.019), ('considers', 0.019), ('summarize', 0.018), ('inspired', 0.018), ('properties', 0.018), ('con', 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 112 nips-2005-Learning Minimum Volume Sets
Author: Clayton Scott, Robert Nowak
Abstract: Given a probability measure P and a reference measure µ, one is often interested in the minimum µ-measure set with P -measure at least α. Minimum volume sets of this type summarize the regions of greatest probability mass of P , and are useful for detecting anomalies and constructing confidence regions. This paper addresses the problem of estimating minimum volume sets based on independent samples distributed according to P . Other than these samples, no other information is available regarding P , but the reference measure µ is assumed to be known. We introduce rules for estimating minimum volume sets that parallel the empirical risk minimization and structural risk minimization principles in classification. As in classification, we show that the performances of our estimators are controlled by the rate of uniform convergence of empirical to true probabilities over the class from which the estimator is drawn. Thus we obtain finite sample size performance bounds in terms of VC dimension and related quantities. We also demonstrate strong universal consistency and an oracle inequality. Estimators based on histograms and dyadic partitions illustrate the proposed rules. 1
2 0.12414719 47 nips-2005-Consistency of one-class SVM and related algorithms
Author: Régis Vert, Jean-philippe Vert
Abstract: We determine the asymptotic limit of the function computed by support vector machines (SVM) and related algorithms that minimize a regularized empirical convex loss function in the reproducing kernel Hilbert space of the Gaussian RBF kernel, in the situation where the number of examples tends to infinity, the bandwidth of the Gaussian kernel tends to 0, and the regularization parameter is held fixed. Non-asymptotic convergence bounds to this limit in the L2 sense are provided, together with upper bounds on the classification error that is shown to converge to the Bayes risk, therefore proving the Bayes-consistency of a variety of methods although the regularization term does not vanish. These results are particularly relevant to the one-class SVM, for which the regularization can not vanish by construction, and which is shown for the first time to be a consistent density level set estimator. 1
3 0.078984857 85 nips-2005-Generalization to Unseen Cases
Author: Teemu Roos, Peter Grünwald, Petri Myllymäki, Henry Tirri
Abstract: We analyze classification error on unseen cases, i.e. cases that are different from those in the training set. Unlike standard generalization error, this off-training-set error may differ significantly from the empirical error with high probability even with large sample sizes. We derive a datadependent bound on the difference between off-training-set and standard generalization error. Our result is based on a new bound on the missing mass, which for small samples is stronger than existing bounds based on Good-Turing estimators. As we demonstrate on UCI data-sets, our bound gives nontrivial generalization guarantees in many practical cases. In light of these results, we show that certain claims made in the No Free Lunch literature are overly pessimistic. 1
4 0.073750295 58 nips-2005-Divergences, surrogate loss functions and experimental design
Author: Xuanlong Nguyen, Martin J. Wainwright, Michael I. Jordan
Abstract: In this paper, we provide a general theorem that establishes a correspondence between surrogate loss functions in classification and the family of f -divergences. Moreover, we provide constructive procedures for determining the f -divergence induced by a given surrogate loss, and conversely for finding all surrogate loss functions that realize a given f -divergence. Next we introduce the notion of universal equivalence among loss functions and corresponding f -divergences, and provide necessary and sufficient conditions for universal equivalence to hold. These ideas have applications to classification problems that also involve a component of experiment design; in particular, we leverage our results to prove consistency of a procedure for learning a classifier under decentralization requirements. 1
5 0.073745273 83 nips-2005-Generalization error bounds for classifiers trained with interdependent data
Author: Nicolas Usunier, Massih-reza Amini, Patrick Gallinari
Abstract: In this paper we propose a general framework to study the generalization properties of binary classifiers trained with data which may be dependent, but are deterministically generated upon a sample of independent examples. It provides generalization bounds for binary classification and some cases of ranking problems, and clarifies the relationship between these learning tasks. 1
6 0.071523651 82 nips-2005-Generalization Error Bounds for Aggregation by Mirror Descent with Averaging
7 0.069237672 74 nips-2005-Faster Rates in Regression via Active Learning
8 0.06761732 95 nips-2005-Improved risk tail bounds for on-line algorithms
9 0.057114366 41 nips-2005-Coarse sample complexity bounds for active learning
10 0.056194339 49 nips-2005-Convergence and Consistency of Regularized Boosting Algorithms with Stationary B-Mixing Observations
11 0.048587676 117 nips-2005-Learning from Data of Variable Quality
12 0.044980973 186 nips-2005-TD(0) Leads to Better Policies than Approximate Value Iteration
13 0.043577932 196 nips-2005-Two view learning: SVM-2K, Theory and Practice
14 0.04238024 50 nips-2005-Convex Neural Networks
15 0.041812744 12 nips-2005-A PAC-Bayes approach to the Set Covering Machine
16 0.041611362 113 nips-2005-Learning Multiple Related Tasks using Latent Independent Component Analysis
17 0.040092152 14 nips-2005-A Probabilistic Interpretation of SVMs with an Application to Unbalanced Classification
18 0.039762985 78 nips-2005-From Weighted Classification to Policy Search
19 0.039308965 96 nips-2005-Inference with Minimal Communication: a Decision-Theoretic Variational Approach
20 0.037671894 27 nips-2005-Analysis of Spectral Kernel Design based Semi-supervised Learning
topicId topicWeight
[(0, 0.136), (1, 0.058), (2, -0.015), (3, -0.079), (4, 0.1), (5, 0.089), (6, -0.065), (7, 0.024), (8, -0.081), (9, -0.045), (10, -0.036), (11, 0.06), (12, 0.076), (13, -0.028), (14, 0.054), (15, 0.072), (16, 0.017), (17, -0.078), (18, 0.048), (19, 0.014), (20, 0.048), (21, -0.034), (22, -0.005), (23, 0.004), (24, -0.041), (25, 0.054), (26, -0.032), (27, -0.038), (28, 0.002), (29, 0.03), (30, -0.067), (31, 0.042), (32, 0.081), (33, 0.027), (34, -0.02), (35, 0.098), (36, -0.04), (37, 0.131), (38, 0.038), (39, 0.066), (40, 0.071), (41, 0.033), (42, 0.02), (43, 0.003), (44, -0.022), (45, -0.032), (46, 0.048), (47, -0.008), (48, -0.044), (49, -0.03)]
simIndex simValue paperId paperTitle
same-paper 1 0.92271715 112 nips-2005-Learning Minimum Volume Sets
Author: Clayton Scott, Robert Nowak
Abstract: Given a probability measure P and a reference measure µ, one is often interested in the minimum µ-measure set with P -measure at least α. Minimum volume sets of this type summarize the regions of greatest probability mass of P , and are useful for detecting anomalies and constructing confidence regions. This paper addresses the problem of estimating minimum volume sets based on independent samples distributed according to P . Other than these samples, no other information is available regarding P , but the reference measure µ is assumed to be known. We introduce rules for estimating minimum volume sets that parallel the empirical risk minimization and structural risk minimization principles in classification. As in classification, we show that the performances of our estimators are controlled by the rate of uniform convergence of empirical to true probabilities over the class from which the estimator is drawn. Thus we obtain finite sample size performance bounds in terms of VC dimension and related quantities. We also demonstrate strong universal consistency and an oracle inequality. Estimators based on histograms and dyadic partitions illustrate the proposed rules. 1
2 0.70501 85 nips-2005-Generalization to Unseen Cases
Author: Teemu Roos, Peter Grünwald, Petri Myllymäki, Henry Tirri
Abstract: We analyze classification error on unseen cases, i.e. cases that are different from those in the training set. Unlike standard generalization error, this off-training-set error may differ significantly from the empirical error with high probability even with large sample sizes. We derive a datadependent bound on the difference between off-training-set and standard generalization error. Our result is based on a new bound on the missing mass, which for small samples is stronger than existing bounds based on Good-Turing estimators. As we demonstrate on UCI data-sets, our bound gives nontrivial generalization guarantees in many practical cases. In light of these results, we show that certain claims made in the No Free Lunch literature are overly pessimistic. 1
3 0.68241972 47 nips-2005-Consistency of one-class SVM and related algorithms
Author: Régis Vert, Jean-philippe Vert
Abstract: We determine the asymptotic limit of the function computed by support vector machines (SVM) and related algorithms that minimize a regularized empirical convex loss function in the reproducing kernel Hilbert space of the Gaussian RBF kernel, in the situation where the number of examples tends to infinity, the bandwidth of the Gaussian kernel tends to 0, and the regularization parameter is held fixed. Non-asymptotic convergence bounds to this limit in the L2 sense are provided, together with upper bounds on the classification error that is shown to converge to the Bayes risk, therefore proving the Bayes-consistency of a variety of methods although the regularization term does not vanish. These results are particularly relevant to the one-class SVM, for which the regularization can not vanish by construction, and which is shown for the first time to be a consistent density level set estimator. 1
4 0.66929245 83 nips-2005-Generalization error bounds for classifiers trained with interdependent data
Author: Nicolas Usunier, Massih-reza Amini, Patrick Gallinari
Abstract: In this paper we propose a general framework to study the generalization properties of binary classifiers trained with data which may be dependent, but are deterministically generated upon a sample of independent examples. It provides generalization bounds for binary classification and some cases of ranking problems, and clarifies the relationship between these learning tasks. 1
5 0.61963904 117 nips-2005-Learning from Data of Variable Quality
Author: Koby Crammer, Michael Kearns, Jennifer Wortman
Abstract: We initiate the study of learning from multiple sources of limited data, each of which may be corrupted at a different rate. We develop a complete theory of which data sources should be used for two fundamental problems: estimating the bias of a coin, and learning a classifier in the presence of label noise. In both cases, efficient algorithms are provided for computing the optimal subset of data. 1
6 0.61615992 49 nips-2005-Convergence and Consistency of Regularized Boosting Algorithms with Stationary B-Mixing Observations
7 0.61230576 95 nips-2005-Improved risk tail bounds for on-line algorithms
8 0.55548012 58 nips-2005-Divergences, surrogate loss functions and experimental design
9 0.51506263 82 nips-2005-Generalization Error Bounds for Aggregation by Mirror Descent with Averaging
10 0.48063189 74 nips-2005-Faster Rates in Regression via Active Learning
11 0.47093779 186 nips-2005-TD(0) Leads to Better Policies than Approximate Value Iteration
12 0.46919149 137 nips-2005-Non-Gaussian Component Analysis: a Semi-parametric Framework for Linear Dimension Reduction
13 0.45846787 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models
14 0.41654569 66 nips-2005-Estimation of Intrinsic Dimensionality Using High-Rate Vector Quantization
15 0.40907663 205 nips-2005-Worst-Case Bounds for Gaussian Process Models
16 0.40701538 51 nips-2005-Correcting sample selection bias in maximum entropy density estimation
17 0.37554809 196 nips-2005-Two view learning: SVM-2K, Theory and Practice
18 0.37466744 147 nips-2005-On the Convergence of Eigenspaces in Kernel Principal Component Analysis
19 0.37197018 41 nips-2005-Coarse sample complexity bounds for active learning
20 0.36810437 19 nips-2005-Active Learning for Misspecified Models
topicId topicWeight
[(3, 0.14), (5, 0.02), (10, 0.041), (27, 0.033), (31, 0.047), (34, 0.079), (37, 0.025), (41, 0.013), (50, 0.011), (55, 0.048), (57, 0.011), (58, 0.128), (69, 0.04), (73, 0.047), (77, 0.011), (80, 0.041), (88, 0.083), (91, 0.065)]
simIndex simValue paperId paperTitle
same-paper 1 0.89647794 112 nips-2005-Learning Minimum Volume Sets
Author: Clayton Scott, Robert Nowak
Abstract: Given a probability measure P and a reference measure µ, one is often interested in the minimum µ-measure set with P -measure at least α. Minimum volume sets of this type summarize the regions of greatest probability mass of P , and are useful for detecting anomalies and constructing confidence regions. This paper addresses the problem of estimating minimum volume sets based on independent samples distributed according to P . Other than these samples, no other information is available regarding P , but the reference measure µ is assumed to be known. We introduce rules for estimating minimum volume sets that parallel the empirical risk minimization and structural risk minimization principles in classification. As in classification, we show that the performances of our estimators are controlled by the rate of uniform convergence of empirical to true probabilities over the class from which the estimator is drawn. Thus we obtain finite sample size performance bounds in terms of VC dimension and related quantities. We also demonstrate strong universal consistency and an oracle inequality. Estimators based on histograms and dyadic partitions illustrate the proposed rules. 1
2 0.79855067 49 nips-2005-Convergence and Consistency of Regularized Boosting Algorithms with Stationary B-Mixing Observations
Author: Aurelie C. Lozano, Sanjeev R. Kulkarni, Robert E. Schapire
Abstract: We study the statistical convergence and consistency of regularized Boosting methods, where the samples are not independent and identically distributed (i.i.d.) but come from empirical processes of stationary β-mixing sequences. Utilizing a technique that constructs a sequence of independent blocks close in distribution to the original samples, we prove the consistency of the composite classifiers resulting from a regularization achieved by restricting the 1-norm of the base classifiers’ weights. When compared to the i.i.d. case, the nature of sampling manifests in the consistency result only through generalization of the original condition on the growth of the regularization parameter.
3 0.79182112 74 nips-2005-Faster Rates in Regression via Active Learning
Author: Rebecca Willett, Robert Nowak, Rui M. Castro
Abstract: This paper presents a rigorous statistical analysis characterizing regimes in which active learning significantly outperforms classical passive learning. Active learning algorithms are able to make queries or select sample locations in an online fashion, depending on the results of the previous queries. In some regimes, this extra flexibility leads to significantly faster rates of error decay than those possible in classical passive learning settings. The nature of these regimes is explored by studying fundamental performance limits of active and passive learning in two illustrative nonparametric function classes. In addition to examining the theoretical potential of active learning, this paper describes a practical algorithm capable of exploiting the extra flexibility of the active setting and provably improving upon the classical passive techniques. Our active learning theory and methods show promise in a number of applications, including field estimation using wireless sensor networks and fault line detection. 1
4 0.78570986 41 nips-2005-Coarse sample complexity bounds for active learning
Author: Sanjoy Dasgupta
Abstract: We characterize the sample complexity of active learning problems in terms of a parameter which takes into account the distribution over the input space, the specific target hypothesis, and the desired accuracy.
5 0.77854323 85 nips-2005-Generalization to Unseen Cases
Author: Teemu Roos, Peter Grünwald, Petri Myllymäki, Henry Tirri
Abstract: We analyze classification error on unseen cases, i.e. cases that are different from those in the training set. Unlike standard generalization error, this off-training-set error may differ significantly from the empirical error with high probability even with large sample sizes. We derive a datadependent bound on the difference between off-training-set and standard generalization error. Our result is based on a new bound on the missing mass, which for small samples is stronger than existing bounds based on Good-Turing estimators. As we demonstrate on UCI data-sets, our bound gives nontrivial generalization guarantees in many practical cases. In light of these results, we show that certain claims made in the No Free Lunch literature are overly pessimistic. 1
6 0.77711827 47 nips-2005-Consistency of one-class SVM and related algorithms
7 0.77461928 177 nips-2005-Size Regularized Cut for Data Clustering
8 0.77127659 159 nips-2005-Q-Clustering
9 0.76787591 117 nips-2005-Learning from Data of Variable Quality
10 0.76449382 137 nips-2005-Non-Gaussian Component Analysis: a Semi-parametric Framework for Linear Dimension Reduction
11 0.76423335 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models
12 0.76089275 201 nips-2005-Variational Bayesian Stochastic Complexity of Mixture Models
13 0.76007116 58 nips-2005-Divergences, surrogate loss functions and experimental design
14 0.75904089 131 nips-2005-Multiple Instance Boosting for Object Detection
15 0.75850511 84 nips-2005-Generalization in Clustering with Unobserved Features
16 0.75742871 197 nips-2005-Unbiased Estimator of Shape Parameter for Spiking Irregularities under Changing Environments
17 0.75498933 9 nips-2005-A Domain Decomposition Method for Fast Manifold Learning
18 0.75210565 196 nips-2005-Two view learning: SVM-2K, Theory and Practice
19 0.74984992 43 nips-2005-Comparing the Effects of Different Weight Distributions on Finding Sparse Representations
20 0.74945003 191 nips-2005-The Forgetron: A Kernel-Based Perceptron on a Fixed Budget