nips nips2005 nips2005-112 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Clayton Scott, Robert Nowak
Abstract: Given a probability measure P and a reference measure µ, one is often interested in the minimum µ-measure set with P -measure at least α. Minimum volume sets of this type summarize the regions of greatest probability mass of P , and are useful for detecting anomalies and constructing confidence regions. This paper addresses the problem of estimating minimum volume sets based on independent samples distributed according to P . Other than these samples, no other information is available regarding P , but the reference measure µ is assumed to be known. We introduce rules for estimating minimum volume sets that parallel the empirical risk minimization and structural risk minimization principles in classification. As in classification, we show that the performances of our estimators are controlled by the rate of uniform convergence of empirical to true probabilities over the class from which the estimator is drawn. Thus we obtain finite sample size performance bounds in terms of VC dimension and related quantities. We also demonstrate strong universal consistency and an oracle inequality. Estimators based on histograms and dyadic partitions illustrate the proposed rules. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract. Given a probability measure P and a reference measure µ, one is often interested in the minimum µ-measure set with P-measure at least α. [sent-4, score-0.165]
2 Minimum volume sets of this type summarize the regions of greatest probability mass of P , and are useful for detecting anomalies and constructing confidence regions. [sent-5, score-0.344]
3 This paper addresses the problem of estimating minimum volume sets based on independent samples distributed according to P . [sent-6, score-0.261]
4 Other than these samples, no other information is available regarding P , but the reference measure µ is assumed to be known. [sent-7, score-0.062]
5 We introduce rules for estimating minimum volume sets that parallel the empirical risk minimization and structural risk minimization principles in classification. [sent-8, score-0.656]
6 As in classification, we show that the performances of our estimators are controlled by the rate of uniform convergence of empirical to true probabilities over the class from which the estimator is drawn. [sent-9, score-0.273]
7 Thus we obtain finite sample size performance bounds in terms of VC dimension and related quantities. [sent-10, score-0.073]
8 We also demonstrate strong universal consistency and an oracle inequality. [sent-11, score-0.208]
9 Estimators based on histograms and dyadic partitions illustrate the proposed rules. [sent-12, score-0.223]
10 1 Introduction. Given a probability measure P and a reference measure µ, the minimum volume set (MV-set) with mass at least 0 < α < 1 is G∗α = arg min{µ(G) : P(G) ≥ α, G measurable}. [sent-13, score-0.47]
11 MV-sets summarize regions where the mass of P is most concentrated. [sent-14, score-0.158]
12 Applications of minimum volume sets include outlier/anomaly detection, determining highest posterior density or multivariate confidence regions, tests for multimodality, and clustering. [sent-16, score-0.287]
13 In comparison to the closely related problem of density level set estimation [1, 2], the minimum volume approach seems preferable in practice because the mass α is more easily specified than a level of a density. [sent-17, score-0.424]
14 This paper considers the problem of MV-set estimation using a training sample drawn from P, which in most practical settings is the only information one has. [sent-19, score-0.087]
15 [Figure 1: Gaussian mixture data, 500 samples, α = 0. (Left and Middle) Minimum volume set estimates based on recursive dyadic partitions, discussed in Section 6.] [sent-21, score-0.312]
16 The specifications of the estimation process are the significance level α, the reference measure µ, and a collection of candidate sets G. [sent-24, score-0.164]
17 To our knowledge, ours is the first work to establish finite sample bounds, an oracle inequality, and universal consistency for the MV-set estimation problem. [sent-26, score-0.276]
18 The methods proposed herein are primarily of theoretical interest, although they may be implemented efficiently for certain partition-based estimators as discussed later. [sent-27, score-0.086]
19 Let S = (X1, . . . , Xn) be an independent and identically distributed (IID) sample drawn according to P. [sent-35, score-0.047]
20 Let G denote a subset of X , and let G be a collection of such subsets. [sent-36, score-0.059]
21 Let P̂ denote the empirical measure based on S: P̂(G) = (1/n) Σ_{i=1}^n I(Xi ∈ G). [sent-37, score-0.081]
22 Set µ∗α = inf{µ(G) : P(G) ≥ α}, (1) where the inf is over all measurable sets. [sent-39, score-0.143]
23 A minimum volume set, G∗α, is a minimizer of (1), when it exists. [sent-40, score-0.237]
24 Given α ∈ (0, 1), denote Gα = {G ∈ G : P(G) ≥ α}, the collection of all sets in G with mass at least α. [sent-42, score-0.221]
25 2 Minimum Volume Sets and Empirical Risk Minimization. In this section we introduce a procedure inspired by the empirical risk minimization (ERM) principle for classification. [sent-46, score-0.234]
26 In classification, ERM selects a classifier from a fixed set of classifiers by minimizing the empirical error (risk) of a training sample. [sent-47, score-0.079]
27 Vapnik and Chervonenkis established the basic theoretical properties of ERM (see [9, 10]), and we find similar properties in the minimum volume setting. [sent-48, score-0.271]
28 The MV-ERM rule is GG,α = arg min_{G∈G} { µ(G) : P̂(G) ≥ α − φ(G, S, δ) }. (2) We refer to the rule in (2) as MV-ERM because of the analogy with empirical risk minimization in classification. [sent-52, score-0.218]
29 The quantity φ acts as a kind of “tolerance” by which the empirical mass estimate may deviate from the targeted value of α. [sent-53, score-0.214]
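To make the rule concrete, here is a minimal sketch of MV-ERM over a finite collection of candidate sets. The representation of a candidate as a (volume, indicator) pair, the function names, and the interface of the penalty `phi` are illustrative assumptions, not part of the paper.

```python
import numpy as np

def mv_erm(candidates, X, alpha, phi):
    """Minimal MV-ERM sketch over a finite collection of candidate sets.

    candidates : list of (volume, indicator) pairs; `volume` is mu(G) and
                 `indicator` maps an array of samples to 0/1 membership.
    X          : array of n i.i.d. samples from P.
    alpha      : target mass.
    phi        : callable (candidate_index, n) -> complexity penalty.
    Returns the index of the smallest-volume candidate that satisfies the
    relaxed empirical mass constraint, or None if none is feasible.
    """
    n = len(X)
    best, best_vol = None, np.inf
    for k, (vol, indicator) in enumerate(candidates):
        emp_mass = np.mean(indicator(X))          # empirical mass of candidate G
        if emp_mass >= alpha - phi(k, n):         # relaxed mass constraint
            if vol < best_vol:
                best, best_vol = k, vol
    return best
```

In practice the candidates would come from a structured family (for example, unions of histogram cells or pruned dyadic partitions), for which the search need not be exhaustive.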
30 We say φ is a (distribution free) complexity penalty for G if and only if, for all distributions P and all δ ∈ (0, 1), Pn{ S : sup_{G∈G} [ P(G) − P̂(G) − φ(G, S, δ) ] > 0 } ≤ δ. [sent-56, score-0.159]
31 Thus, φ controls the rate of uniform convergence of P̂(G) to P(G) for G ∈ G. [sent-57, score-0.076]
32 It is well known that the performance of ERM (for binary classification) relative to the performance of the best classifier in the given class is controlled by the uniform convergence of true to empirical probabilities. [sent-58, score-0.156]
33 If φ is a complexity penalty for G, then Pn{ P(GG,α) < α − 2φ(GG,α, S, δ) or µ(GG,α) > µG,α } ≤ δ. [sent-61, score-0.132]
34 Consider the sets ΘP = {S : P(GG,α) < α − 2φ(GG,α, S, δ)}, Θµ = {S : µ(GG,α) > µG,α}, ΩP = {S : sup_{G∈G} [ P(G) − P̂(G) − φ(G, S, δ) ] > 0}. [sent-63, score-0.074]
35 The proof of this lemma (see [6] ) follows closely the proof of Lemma 1 in [7]. [sent-67, score-0.098]
36 This result may be understood by analogy with the result from classification that says R(f̂) − inf_{f∈F} R(f) ≤ 2 sup_{f∈F} |R̂(f) − R(f)| (see [10]). [sent-68, score-0.112]
37 Here R and R̂ are the true and empirical risks, f̂ is the empirical risk minimizer, and F is a set of classifiers. [sent-70, score-0.184]
38 Just as this result relates uniform convergence bounds to empirical risk minimization in classification, so does Lemma 1 relate uniform convergence to the performance of MV-ERM. [sent-71, score-0.373]
39 The theorem above allows direct translation of uniform convergence results into performance guarantees for MV-ERM. [sent-72, score-0.121]
40 2.1 Example: VC Classes. Let G be a class of sets with VC dimension V, and define φ(G, S, δ) = √( 32 [ V log n + log(8/δ) ] / n ). (3) [sent-76, score-0.074]
41 By a version of the VC inequality [10], we know that φ is a complexity penalty for G, and therefore Theorem 1 applies. [sent-77, score-0.182]
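A direct transcription of the penalty (3) as reconstructed above; the function name is ours.

```python
import math

def vc_penalty(n, V, delta):
    """Penalty of (3): sqrt(32 * (V*log(n) + log(8/delta)) / n)."""
    return math.sqrt(32.0 * (V * math.log(n) + math.log(8.0 / delta)) / n)
```

For instance, with V = 2, δ = 0.05, and n = 10,000 the penalty as written here is about 0.27, which is consistent with the earlier remark that these rules are primarily of theoretical interest.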
42 Thus, for any fixed ε > 0, the probability of being within ε of the target mass α and being less than the target volume µG,α approaches one exponentially fast as the sample size increases. [sent-82, score-0.326]
43 This result may also be used to calculate a distribution free upper bound on the sample size needed to be within a given tolerance of α and with a given confidence 1 − δ. [sent-83, score-0.098]
44 In particular, the sample size will grow no faster than a polynomial in 1/ε and 1/δ, paralleling results for classification. [sent-84, score-0.069]
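As a sketch of this remark, one can numerically invert the penalty (3) to find the smallest n for which the deviation 2φ in Theorem 1 drops below a desired tolerance ε; the doubling-plus-bisection search below is our own illustrative choice.

```python
import math

def sample_size_for_tolerance(V, delta, eps):
    """Smallest n (up to the search's granularity) with 2*phi(n) <= eps,
    for the VC penalty phi of (3)."""
    def phi(n):
        return math.sqrt(32.0 * (V * math.log(n) + math.log(8.0 / delta)) / n)

    n = 2
    while 2.0 * phi(n) > eps:      # doubling search; phi(n) -> 0 as n grows
        n *= 2
    lo, hi = n // 2, n             # refine by bisection on [lo, hi]
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if 2.0 * phi(mid) <= eps:
            hi = mid
        else:
            lo = mid
    return hi
```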
45 2.2 Example: Countable Classes. Suppose G is a countable class of sets. [sent-86, score-0.059]
46 In light of the Kraft inequality for prefix codes, a codelength ⟦G⟧ may be assigned to each G, namely the length of a codeword for G in a prefix code for G. [sent-88, score-0.05]
47 Define φ(G, S, δ) = √( ( ⟦G⟧ log 2 + log(2/δ) ) / (2n) ). (4) By Chernoff's bound together with the union bound, φ is a penalty for G. [sent-90, score-0.169]
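A sketch of the countable-class penalty, assuming the Chernoff/union-bound form of (4) reconstructed above (the exact constants should be checked against [6]); `codelength_bits` stands for the prefix codelength ⟦G⟧ and the function name is ours.

```python
import math

def kraft_penalty(codelength_bits, n, delta):
    """Countable-class penalty of the reconstructed form (4):
    sqrt((codelength * log 2 + log(2/delta)) / (2n))."""
    return math.sqrt(
        (codelength_bits * math.log(2.0) + math.log(2.0 / delta)) / (2.0 * n)
    )
```

With this form, taking a finite class of |G| sets, uniform codelengths log2 |G|, and setting ε = 2φ recovers the 2|G| e^(−nε²/2) bound stated next.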
48 For the MV-ERM estimate GG,α from a finite class G, Pn{ P(GG,α) < α − ε or µ(GG,α) > µG,α } ≤ 2|G| e^(−nε²/2). [sent-94, score-0.048]
49 3 Consistency. A minimum volume set estimator is consistent if its volume and mass tend to the optimal values µ∗α and α as n → ∞. [sent-95, score-0.568]
50 Define the error of a set G by M(G) := (µ(G) − µ∗α)+ + (α − P(G))+, where (x)+ := max(x, 0). (Note that without the (·)+ operator this would not be a meaningful error, since one term could be negative and cause M to tend to zero even if the other error term does not go to zero.) [sent-97, score-0.095]
51 We are interested in MV-set estimators such that M(GG,α) tends to zero as n → ∞. [sent-98, score-0.065]
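For concreteness, the error M as a small helper; evaluating it requires the true mass P(G) and the optimal volume µ∗α, which in experiments would have to be computed analytically or approximated (for example, by Monte Carlo). The function name is ours.

```python
def mv_error(vol_G, mass_G, alpha, vol_star):
    """M(G) = (mu(G) - mu*_alpha)_+ + (alpha - P(G))_+ ."""
    return max(vol_G - vol_star, 0.0) + max(alpha - mass_G, 0.0)
```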
52 If GG,α is strongly consistent (i.e., M(GG,α) → 0 almost surely) for every possible distribution of X, then GG,α is strongly universally consistent. [sent-101, score-0.083]
53 Let G be fixed and let φ(G, S, δ) be a penalty for G. [sent-103, score-0.137]
54 By Theorem 1, with probability at least 1 − δ, µ(GG,α) − µ∗α ≤ µG,α − µ∗α (5) and α − P(GG,α) ≤ 2φ(GG,α, S, δ). (6) We refer to the left-hand side of (5) as the excess volume of the class G and the left-hand side of (6) as the missing mass of GG,α. [sent-105, score-0.367]
55 The upper bounds on the right-hand sides are an approximation error and a stochastic error, respectively. [sent-106, score-0.077]
56 The idea is to let G grow with n so that both errors tend to zero as n → ∞. [sent-107, score-0.092]
57 If G does not change with n, universal consistency is impossible. [sent-108, score-0.094]
58 To have both stochastic and approximation errors tend to zero, we apply MV-ERM to a class G^k from a sequence of classes G^1, G^2, . . . [sent-109, score-0.121]
59 Assume the sequence of classes G^k and penalties φ_k satisfy lim_{k→∞} inf{µ(G) : G ∈ G^k_α} = µ∗α (7) and sup_{G∈G^k_α} φ_k(G, S, δ(n)) = o(1) as n → ∞. (8) [sent-116, score-0.187]
60 Then GG^k,α is strongly universally consistent. [sent-117, score-0.056]
61 The proof combines the Borel-Cantelli lemma and the distribution-free result of Theorem 1 with the stated assumptions. [sent-118, score-0.087]
62 Examples satisfying the hypotheses of the theorem include families of VC classes with arbitrary approximating power (e.g., histograms and dyadic partition classes). [sent-119, score-0.069]
63 4 Structural Risk Minimization and an Oracle Inequality. In the previous section the rate of convergence of the two errors to zero is determined by the choice of k = k(n), which must be chosen a priori. [sent-123, score-0.061]
64 Hence it is possible that the excess volume decays much more quickly than the missing mass, or vice versa. [sent-124, score-0.2]
65 In this section we introduce a new rule called MV-SRM, inspired by the principle of structural risk minimization (SRM) from the theory of classification [11, 12], that automatically balances the two errors. [sent-125, score-0.21]
66 Conceptualize G as a collection of sets of varying capacities, such as a union of VC classes or a union of finite classes. [sent-131, score-0.161]
67 The MV-SRM principle selects the set GG,α = arg min_{G∈G} { µ(G) + φ(G, S, δ) : P̂(G) ≥ α − φ(G, S, δ) }. (9) [sent-133, score-0.066]
68 Note that MV-SRM is different from MV-ERM because it minimizes a complexity-penalized volume instead of simply the volume. [sent-134, score-0.183]
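A minimal sketch of MV-SRM paralleling the MV-ERM sketch earlier, with the same illustrative candidate representation; here the penalty enters both the objective and the constraint.

```python
import numpy as np

def mv_srm(candidates, X, alpha, phi):
    """Minimal MV-SRM sketch (rule (9)) over a finite union of candidate sets.

    candidates : list of (volume, indicator) pairs, as in the MV-ERM sketch.
    phi        : callable (candidate_index, n) -> penalty; larger for more
                 complex candidates.
    """
    n = len(X)
    best, best_score = None, np.inf
    for k, (vol, indicator) in enumerate(candidates):
        pen = phi(k, n)
        if np.mean(indicator(X)) >= alpha - pen:   # relaxed mass constraint
            score = vol + pen                      # complexity-penalized volume
            if score < best_score:
                best, best_score = k, score
    return best
```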
69 This follows from the bound 1 − α ≤ γα (1 − µ∗α) on the mass outside the minimum volume set. [sent-137, score-0.383]
70 With probability at least 1 − δ over the training sample S, M(GG,α) ≤ (1 + 1/γα) inf_{G∈Gα} [ µ(G) − µ∗α + 2φ(G, S, δ) ]. (10) [sent-140, score-0.107]
71 Sketch of proof: The proof is similar in some respects to oracle inequalities for classification. [sent-141, score-0.145]
72 In classification both approximation and stochastic errors are positive, whereas with MV-sets the excess volume µ(G) − µ∗α or missing mass α − P(G) could be negative. [sent-143, score-0.387]
73 The proof considers three cases separately: (1) µ(GG,α) ≥ µ∗α and P(GG,α) < α, (2) µ(GG,α) ≥ µ∗α and P(GG,α) ≥ α, and (3) µ(GG,α) < µ∗α and P(GG,α) < α. [sent-145, score-0.05]
74 In the first case, both volume and mass errors are positive and the argument follows standard lines. [sent-146, score-0.301]
75 The oracle inequality says that MV-SRM performs about as well as the set chosen by an oracle to optimize the tradeoff between the stochastic and approximation errors. [sent-149, score-0.332]
76 To illustrate the power of the oracle inequality, in [6] we demonstrate that MV-SRM applied to recursive dyadic partition-based estimators adapts optimally to the number of relevant features (unknown a priori). [sent-150, score-0.352]
77 5 Damping the Penalty. In Theorem 1, the reader may have noticed that MV-ERM does not equitably balance the volume error with the mass error. [sent-151, score-0.305]
78 A minor modification of MV-ERM and MV-SRM leads to a more equitable distribution of error between the volume and mass, instead of having all the error reside in the mass term. [sent-155, score-0.331]
79 The idea is simple: scale the penalty in the constraint by a damping factor ν < 1. [sent-156, score-0.142]
80 In the case of MV-SRM, the penalty in the objective function also needs to be scaled by 1 + ν. [sent-157, score-0.112]
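Taking the description above at face value (constraint penalty scaled by ν, objective penalty by 1 + ν; the precise statements are in [6]), the damped rule is a small modification of the previous sketch.

```python
import numpy as np

def mv_srm_damped(candidates, X, alpha, phi, nu=0.5):
    """Damped MV-SRM sketch: the constraint penalty is scaled by nu (< 1) and
    the objective penalty by (1 + nu), as described in this section."""
    n = len(X)
    best, best_score = None, np.inf
    for k, (vol, indicator) in enumerate(candidates):
        pen = phi(k, n)
        if np.mean(indicator(X)) >= alpha - nu * pen:   # damped constraint
            score = vol + (1.0 + nu) * pen              # rescaled objective penalty
            if score < best_score:
                best, best_score = k, score
    return best
```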
81 Moreover, the theoretical properties of these estimators stated above are retained (the statements, omitted here, are slightly more involved [6] ). [sent-158, score-0.124]
82 Also note that the above theorem encompasses the generalized quantile estimate of [3], which corresponds to ν = 0. [sent-160, score-0.102]
83 Thus we have finite sample size guarantees for that estimator to match Polonik’s asymptotic analysis. [sent-161, score-0.099]
84 First, compute the MV-ERM estimate within each constituent class; second, choose the final estimate by minimizing the penalized volume of these MV-ERM estimates. [sent-168, score-0.184]
85 [Figure 1, continued: (Right) The error of the MV-SRM estimate M(GG,α) as a function of sample size when ν = 0.] [sent-178, score-0.094]
86 To illustrate the potential improvement offered by spatially adaptive partitioning methods, we consider a minimum volume set estimator based on recursive dyadic (quadsplit) partitions. [sent-188, score-0.439]
87 We employ a penalty that is additive over the cells A of the partition. [sent-189, score-0.112]
88 The precise form of the penalty φ(A) for each cell is given in [6] , but loosely speaking it is proportional to the square-root of the ratio of the empirical mass of the cell to the sample size n. [sent-190, score-0.406]
89 In this case, MV-SRM with ν = 0 is min_{G∈G^L} Σ_A [ µ(A) ℓ(A) + φ(A) ] subject to Σ_A P̂(A) ℓ(A) ≥ α, (11) where G^L is the collection of all partitions with dyadic cell sidelengths no smaller than 2^−L, and ℓ(A) = 1 if A belongs to the candidate set and ℓ(A) = 0 otherwise (see [6] for further details). [sent-191, score-0.314]
90 Although direct optimization appears formidable, an efficient alternative is to consider the Lagrangian and conduct a bisection search over the Lagrange multiplier until the mass constraint is nearly achieved with equality (10 iterations suffice in practice). [sent-192, score-0.14]
91 For each iteration, minimization of the Lagrangian can be performed very rapidly using standard tree pruning techniques. [sent-193, score-0.084]
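A sketch of this bisection strategy for the simpler case of a single fixed partition (for example, a uniform histogram) rather than an optimization over pruned trees; the array names and the bracketing of the multiplier are illustrative. For a given multiplier λ, a cell A is included exactly when its Lagrangian contribution µ(A) + φ(A) − λ·P̂(A) is negative, and λ is adjusted until the selected empirical mass reaches α.

```python
import numpy as np

def mv_set_by_bisection(mass_hat, vol, pen, alpha, iters=10):
    """Bisection over the Lagrange multiplier for a single fixed partition.

    mass_hat : array of empirical cell masses P_hat(A) (summing to ~1).
    vol      : array of reference cell volumes mu(A).
    pen      : array of per-cell penalties phi(A).
    For a given lam, a cell is included iff mu(A) + phi(A) - lam*P_hat(A) < 0,
    which minimizes the Lagrangian cell by cell; lam is then tuned so that the
    included empirical mass is close to alpha.
    """
    mass_hat, vol, pen = map(np.asarray, (mass_hat, vol, pen))

    def select(lam):
        keep = vol + pen - lam * mass_hat < 0
        return keep, mass_hat[keep].sum()

    lo, hi = 0.0, 1.0
    while select(hi)[1] < alpha:       # grow the bracket until feasible
        hi *= 2.0                      # (assumes total empirical cell mass >= alpha)
    for _ in range(iters):             # ~10 iterations suffice in practice
        mid = 0.5 * (lo + hi)
        if select(mid)[1] >= alpha:
            hi = mid
        else:
            lo = mid
    keep, _ = select(hi)
    return keep                        # boolean mask of selected cells
```

For the tree-structured classes considered in this section, the same multiplier search is combined with the tree-pruning step described above rather than a flat scan over cells.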
92 An experimental demonstration of the dyadic partition estimator is depicted in Figure 1. [sent-194, score-0.198]
93 In the experiments we employed a dyadic quadtree structure with L = 8 (i.e., cell sidelengths no smaller than 2^−8) and pruned according to the theoretical penalty φ(A) formally defined in [6], weighted by a factor of 1/30 (in practice the optimal weight could be found via cross-validation or other techniques). [sent-195, score-0.146]
95 This strategy mitigates the “blocky” structure due to the underlying dyadic partitions, and can be computed almost as rapidly as a single tree estimate (within a factor of L) due to the large amount of redundancy among trees. [sent-200, score-0.187]
96 7 Conclusions. In this paper we propose two rules, MV-ERM and MV-SRM, for estimation of minimum volume sets. [sent-202, score-0.235]
97 Our theoretical analysis is made possible by relating the performance of these rules to the uniform convergence properties of the class of sets from which the estimate is taken. [sent-203, score-0.239]
98 Ours are the first known results to feature finite sample bounds, an oracle inequality, and universal consistency. [sent-204, score-0.209]
99 Acknowledgements The authors thank Ercan Yildiz and Rebecca Willett for their assistance with the experiments involving dyadic trees. [sent-205, score-0.146]
100 Polonik, “Minimum volume sets and generalized quantile processes,” Stochastic Processes and their Applications, vol. [sent-220, score-0.222]
wordName wordTfidf (topN-words)
[('gg', 0.86), ('dyadic', 0.146), ('mass', 0.14), ('volume', 0.139), ('oracle', 0.114), ('penalty', 0.112), ('risk', 0.078), ('minimum', 0.075), ('vc', 0.072), ('erm', 0.071), ('estimators', 0.065), ('occam', 0.065), ('nowak', 0.065), ('minimization', 0.064), ('inf', 0.06), ('classi', 0.055), ('penalties', 0.053), ('empirical', 0.053), ('estimator', 0.052), ('inequality', 0.05), ('universal', 0.048), ('sets', 0.047), ('partitions', 0.047), ('sample', 0.047), ('consistency', 0.046), ('theorem', 0.045), ('pn', 0.044), ('rademacher', 0.043), ('scott', 0.043), ('cscott', 0.041), ('hush', 0.041), ('polonik', 0.041), ('sidelengths', 0.041), ('erent', 0.041), ('convergence', 0.039), ('excess', 0.037), ('uniform', 0.037), ('vapnik', 0.036), ('lemma', 0.036), ('razor', 0.036), ('quantile', 0.036), ('scovel', 0.036), ('reference', 0.034), ('collection', 0.034), ('countable', 0.032), ('proof', 0.031), ('di', 0.031), ('dence', 0.031), ('damping', 0.03), ('mv', 0.03), ('lugosi', 0.03), ('histograms', 0.03), ('bound', 0.029), ('structural', 0.029), ('rules', 0.029), ('universally', 0.029), ('says', 0.029), ('measure', 0.028), ('union', 0.028), ('recursive', 0.027), ('sup', 0.027), ('class', 0.027), ('cell', 0.027), ('strongly', 0.027), ('bounds', 0.026), ('density', 0.026), ('error', 0.026), ('arg', 0.026), ('stochastic', 0.025), ('let', 0.025), ('missing', 0.024), ('penalized', 0.024), ('inverting', 0.024), ('classes', 0.024), ('cation', 0.024), ('analogy', 0.023), ('minimizer', 0.023), ('pre', 0.023), ('measurable', 0.023), ('tend', 0.023), ('practice', 0.023), ('lagrangian', 0.022), ('tolerance', 0.022), ('grow', 0.022), ('errors', 0.022), ('theoretical', 0.021), ('estimate', 0.021), ('nite', 0.021), ('estimation', 0.021), ('principle', 0.021), ('complexity', 0.02), ('meaningful', 0.02), ('stated', 0.02), ('rapidly', 0.02), ('min', 0.019), ('considers', 0.019), ('summarize', 0.018), ('inspired', 0.018), ('properties', 0.018), ('con', 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 112 nips-2005-Learning Minimum Volume Sets
Author: Clayton Scott, Robert Nowak
Abstract: Given a probability measure P and a reference measure µ, one is often interested in the minimum µ-measure set with P -measure at least α. Minimum volume sets of this type summarize the regions of greatest probability mass of P , and are useful for detecting anomalies and constructing confidence regions. This paper addresses the problem of estimating minimum volume sets based on independent samples distributed according to P . Other than these samples, no other information is available regarding P , but the reference measure µ is assumed to be known. We introduce rules for estimating minimum volume sets that parallel the empirical risk minimization and structural risk minimization principles in classification. As in classification, we show that the performances of our estimators are controlled by the rate of uniform convergence of empirical to true probabilities over the class from which the estimator is drawn. Thus we obtain finite sample size performance bounds in terms of VC dimension and related quantities. We also demonstrate strong universal consistency and an oracle inequality. Estimators based on histograms and dyadic partitions illustrate the proposed rules. 1
2 0.12414719 47 nips-2005-Consistency of one-class SVM and related algorithms
Author: Régis Vert, Jean-philippe Vert
Abstract: We determine the asymptotic limit of the function computed by support vector machines (SVM) and related algorithms that minimize a regularized empirical convex loss function in the reproducing kernel Hilbert space of the Gaussian RBF kernel, in the situation where the number of examples tends to infinity, the bandwidth of the Gaussian kernel tends to 0, and the regularization parameter is held fixed. Non-asymptotic convergence bounds to this limit in the L2 sense are provided, together with upper bounds on the classification error that is shown to converge to the Bayes risk, therefore proving the Bayes-consistency of a variety of methods although the regularization term does not vanish. These results are particularly relevant to the one-class SVM, for which the regularization can not vanish by construction, and which is shown for the first time to be a consistent density level set estimator. 1
3 0.078984857 85 nips-2005-Generalization to Unseen Cases
Author: Teemu Roos, Peter Grünwald, Petri Myllymäki, Henry Tirri
Abstract: We analyze classification error on unseen cases, i.e. cases that are different from those in the training set. Unlike standard generalization error, this off-training-set error may differ significantly from the empirical error with high probability even with large sample sizes. We derive a datadependent bound on the difference between off-training-set and standard generalization error. Our result is based on a new bound on the missing mass, which for small samples is stronger than existing bounds based on Good-Turing estimators. As we demonstrate on UCI data-sets, our bound gives nontrivial generalization guarantees in many practical cases. In light of these results, we show that certain claims made in the No Free Lunch literature are overly pessimistic. 1
4 0.073750295 58 nips-2005-Divergences, surrogate loss functions and experimental design
Author: Xuanlong Nguyen, Martin J. Wainwright, Michael I. Jordan
Abstract: In this paper, we provide a general theorem that establishes a correspondence between surrogate loss functions in classification and the family of f -divergences. Moreover, we provide constructive procedures for determining the f -divergence induced by a given surrogate loss, and conversely for finding all surrogate loss functions that realize a given f -divergence. Next we introduce the notion of universal equivalence among loss functions and corresponding f -divergences, and provide necessary and sufficient conditions for universal equivalence to hold. These ideas have applications to classification problems that also involve a component of experiment design; in particular, we leverage our results to prove consistency of a procedure for learning a classifier under decentralization requirements. 1
5 0.073745273 83 nips-2005-Generalization error bounds for classifiers trained with interdependent data
Author: Nicolas Usunier, Massih-reza Amini, Patrick Gallinari
Abstract: In this paper we propose a general framework to study the generalization properties of binary classifiers trained with data which may be dependent, but are deterministically generated upon a sample of independent examples. It provides generalization bounds for binary classification and some cases of ranking problems, and clarifies the relationship between these learning tasks. 1
6 0.071523651 82 nips-2005-Generalization Error Bounds for Aggregation by Mirror Descent with Averaging
7 0.069237672 74 nips-2005-Faster Rates in Regression via Active Learning
8 0.06761732 95 nips-2005-Improved risk tail bounds for on-line algorithms
9 0.057114366 41 nips-2005-Coarse sample complexity bounds for active learning
10 0.056194339 49 nips-2005-Convergence and Consistency of Regularized Boosting Algorithms with Stationary B-Mixing Observations
11 0.048587676 117 nips-2005-Learning from Data of Variable Quality
12 0.044980973 186 nips-2005-TD(0) Leads to Better Policies than Approximate Value Iteration
13 0.043577932 196 nips-2005-Two view learning: SVM-2K, Theory and Practice
14 0.04238024 50 nips-2005-Convex Neural Networks
15 0.041812744 12 nips-2005-A PAC-Bayes approach to the Set Covering Machine
16 0.041611362 113 nips-2005-Learning Multiple Related Tasks using Latent Independent Component Analysis
17 0.040092152 14 nips-2005-A Probabilistic Interpretation of SVMs with an Application to Unbalanced Classification
18 0.039762985 78 nips-2005-From Weighted Classification to Policy Search
19 0.039308965 96 nips-2005-Inference with Minimal Communication: a Decision-Theoretic Variational Approach
20 0.037671894 27 nips-2005-Analysis of Spectral Kernel Design based Semi-supervised Learning
topicId topicWeight
[(0, 0.136), (1, 0.058), (2, -0.015), (3, -0.079), (4, 0.1), (5, 0.089), (6, -0.065), (7, 0.024), (8, -0.081), (9, -0.045), (10, -0.036), (11, 0.06), (12, 0.076), (13, -0.028), (14, 0.054), (15, 0.072), (16, 0.017), (17, -0.078), (18, 0.048), (19, 0.014), (20, 0.048), (21, -0.034), (22, -0.005), (23, 0.004), (24, -0.041), (25, 0.054), (26, -0.032), (27, -0.038), (28, 0.002), (29, 0.03), (30, -0.067), (31, 0.042), (32, 0.081), (33, 0.027), (34, -0.02), (35, 0.098), (36, -0.04), (37, 0.131), (38, 0.038), (39, 0.066), (40, 0.071), (41, 0.033), (42, 0.02), (43, 0.003), (44, -0.022), (45, -0.032), (46, 0.048), (47, -0.008), (48, -0.044), (49, -0.03)]
simIndex simValue paperId paperTitle
same-paper 1 0.92271715 112 nips-2005-Learning Minimum Volume Sets
Author: Clayton Scott, Robert Nowak
Abstract: Given a probability measure P and a reference measure µ, one is often interested in the minimum µ-measure set with P -measure at least α. Minimum volume sets of this type summarize the regions of greatest probability mass of P , and are useful for detecting anomalies and constructing confidence regions. This paper addresses the problem of estimating minimum volume sets based on independent samples distributed according to P . Other than these samples, no other information is available regarding P , but the reference measure µ is assumed to be known. We introduce rules for estimating minimum volume sets that parallel the empirical risk minimization and structural risk minimization principles in classification. As in classification, we show that the performances of our estimators are controlled by the rate of uniform convergence of empirical to true probabilities over the class from which the estimator is drawn. Thus we obtain finite sample size performance bounds in terms of VC dimension and related quantities. We also demonstrate strong universal consistency and an oracle inequality. Estimators based on histograms and dyadic partitions illustrate the proposed rules. 1
2 0.70501 85 nips-2005-Generalization to Unseen Cases
Author: Teemu Roos, Peter Grünwald, Petri Myllymäki, Henry Tirri
Abstract: We analyze classification error on unseen cases, i.e. cases that are different from those in the training set. Unlike standard generalization error, this off-training-set error may differ significantly from the empirical error with high probability even with large sample sizes. We derive a datadependent bound on the difference between off-training-set and standard generalization error. Our result is based on a new bound on the missing mass, which for small samples is stronger than existing bounds based on Good-Turing estimators. As we demonstrate on UCI data-sets, our bound gives nontrivial generalization guarantees in many practical cases. In light of these results, we show that certain claims made in the No Free Lunch literature are overly pessimistic. 1
3 0.68241972 47 nips-2005-Consistency of one-class SVM and related algorithms
Author: Régis Vert, Jean-philippe Vert
Abstract: We determine the asymptotic limit of the function computed by support vector machines (SVM) and related algorithms that minimize a regularized empirical convex loss function in the reproducing kernel Hilbert space of the Gaussian RBF kernel, in the situation where the number of examples tends to infinity, the bandwidth of the Gaussian kernel tends to 0, and the regularization parameter is held fixed. Non-asymptotic convergence bounds to this limit in the L2 sense are provided, together with upper bounds on the classification error that is shown to converge to the Bayes risk, therefore proving the Bayes-consistency of a variety of methods although the regularization term does not vanish. These results are particularly relevant to the one-class SVM, for which the regularization can not vanish by construction, and which is shown for the first time to be a consistent density level set estimator. 1
4 0.66929245 83 nips-2005-Generalization error bounds for classifiers trained with interdependent data
Author: Nicolas Usunier, Massih-reza Amini, Patrick Gallinari
Abstract: In this paper we propose a general framework to study the generalization properties of binary classifiers trained with data which may be dependent, but are deterministically generated upon a sample of independent examples. It provides generalization bounds for binary classification and some cases of ranking problems, and clarifies the relationship between these learning tasks. 1
5 0.61963904 117 nips-2005-Learning from Data of Variable Quality
Author: Koby Crammer, Michael Kearns, Jennifer Wortman
Abstract: We initiate the study of learning from multiple sources of limited data, each of which may be corrupted at a different rate. We develop a complete theory of which data sources should be used for two fundamental problems: estimating the bias of a coin, and learning a classifier in the presence of label noise. In both cases, efficient algorithms are provided for computing the optimal subset of data. 1
6 0.61615992 49 nips-2005-Convergence and Consistency of Regularized Boosting Algorithms with Stationary B-Mixing Observations
7 0.61230576 95 nips-2005-Improved risk tail bounds for on-line algorithms
8 0.55548012 58 nips-2005-Divergences, surrogate loss functions and experimental design
9 0.51506263 82 nips-2005-Generalization Error Bounds for Aggregation by Mirror Descent with Averaging
10 0.48063189 74 nips-2005-Faster Rates in Regression via Active Learning
11 0.47093779 186 nips-2005-TD(0) Leads to Better Policies than Approximate Value Iteration
12 0.46919149 137 nips-2005-Non-Gaussian Component Analysis: a Semi-parametric Framework for Linear Dimension Reduction
13 0.45846787 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models
14 0.41654569 66 nips-2005-Estimation of Intrinsic Dimensionality Using High-Rate Vector Quantization
15 0.40907663 205 nips-2005-Worst-Case Bounds for Gaussian Process Models
16 0.40701538 51 nips-2005-Correcting sample selection bias in maximum entropy density estimation
17 0.37554809 196 nips-2005-Two view learning: SVM-2K, Theory and Practice
18 0.37466744 147 nips-2005-On the Convergence of Eigenspaces in Kernel Principal Component Analysis
19 0.37197018 41 nips-2005-Coarse sample complexity bounds for active learning
20 0.36810437 19 nips-2005-Active Learning for Misspecified Models
topicId topicWeight
[(3, 0.14), (5, 0.02), (10, 0.041), (27, 0.033), (31, 0.047), (34, 0.079), (37, 0.025), (41, 0.013), (50, 0.011), (55, 0.048), (57, 0.011), (58, 0.128), (69, 0.04), (73, 0.047), (77, 0.011), (80, 0.041), (88, 0.083), (91, 0.065)]
simIndex simValue paperId paperTitle
same-paper 1 0.89647794 112 nips-2005-Learning Minimum Volume Sets
Author: Clayton Scott, Robert Nowak
Abstract: Given a probability measure P and a reference measure µ, one is often interested in the minimum µ-measure set with P -measure at least α. Minimum volume sets of this type summarize the regions of greatest probability mass of P , and are useful for detecting anomalies and constructing confidence regions. This paper addresses the problem of estimating minimum volume sets based on independent samples distributed according to P . Other than these samples, no other information is available regarding P , but the reference measure µ is assumed to be known. We introduce rules for estimating minimum volume sets that parallel the empirical risk minimization and structural risk minimization principles in classification. As in classification, we show that the performances of our estimators are controlled by the rate of uniform convergence of empirical to true probabilities over the class from which the estimator is drawn. Thus we obtain finite sample size performance bounds in terms of VC dimension and related quantities. We also demonstrate strong universal consistency and an oracle inequality. Estimators based on histograms and dyadic partitions illustrate the proposed rules. 1
2 0.79855067 49 nips-2005-Convergence and Consistency of Regularized Boosting Algorithms with Stationary B-Mixing Observations
Author: Aurelie C. Lozano, Sanjeev R. Kulkarni, Robert E. Schapire
Abstract: We study the statistical convergence and consistency of regularized Boosting methods, where the samples are not independent and identically distributed (i.i.d.) but come from empirical processes of stationary β-mixing sequences. Utilizing a technique that constructs a sequence of independent blocks close in distribution to the original samples, we prove the consistency of the composite classifiers resulting from a regularization achieved by restricting the 1-norm of the base classifiers’ weights. When compared to the i.i.d. case, the nature of sampling manifests in the consistency result only through generalization of the original condition on the growth of the regularization parameter.
3 0.79182112 74 nips-2005-Faster Rates in Regression via Active Learning
Author: Rebecca Willett, Robert Nowak, Rui M. Castro
Abstract: This paper presents a rigorous statistical analysis characterizing regimes in which active learning significantly outperforms classical passive learning. Active learning algorithms are able to make queries or select sample locations in an online fashion, depending on the results of the previous queries. In some regimes, this extra flexibility leads to significantly faster rates of error decay than those possible in classical passive learning settings. The nature of these regimes is explored by studying fundamental performance limits of active and passive learning in two illustrative nonparametric function classes. In addition to examining the theoretical potential of active learning, this paper describes a practical algorithm capable of exploiting the extra flexibility of the active setting and provably improving upon the classical passive techniques. Our active learning theory and methods show promise in a number of applications, including field estimation using wireless sensor networks and fault line detection. 1
4 0.78570986 41 nips-2005-Coarse sample complexity bounds for active learning
Author: Sanjoy Dasgupta
Abstract: We characterize the sample complexity of active learning problems in terms of a parameter which takes into account the distribution over the input space, the specific target hypothesis, and the desired accuracy.
5 0.77854323 85 nips-2005-Generalization to Unseen Cases
Author: Teemu Roos, Peter Grünwald, Petri Myllymäki, Henry Tirri
Abstract: We analyze classification error on unseen cases, i.e. cases that are different from those in the training set. Unlike standard generalization error, this off-training-set error may differ significantly from the empirical error with high probability even with large sample sizes. We derive a datadependent bound on the difference between off-training-set and standard generalization error. Our result is based on a new bound on the missing mass, which for small samples is stronger than existing bounds based on Good-Turing estimators. As we demonstrate on UCI data-sets, our bound gives nontrivial generalization guarantees in many practical cases. In light of these results, we show that certain claims made in the No Free Lunch literature are overly pessimistic. 1
6 0.77711827 47 nips-2005-Consistency of one-class SVM and related algorithms
7 0.77461928 177 nips-2005-Size Regularized Cut for Data Clustering
8 0.77127659 159 nips-2005-Q-Clustering
9 0.76787591 117 nips-2005-Learning from Data of Variable Quality
10 0.76449382 137 nips-2005-Non-Gaussian Component Analysis: a Semi-parametric Framework for Linear Dimension Reduction
11 0.76423335 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models
12 0.76089275 201 nips-2005-Variational Bayesian Stochastic Complexity of Mixture Models
13 0.76007116 58 nips-2005-Divergences, surrogate loss functions and experimental design
14 0.75904089 131 nips-2005-Multiple Instance Boosting for Object Detection
15 0.75850511 84 nips-2005-Generalization in Clustering with Unobserved Features
16 0.75742871 197 nips-2005-Unbiased Estimator of Shape Parameter for Spiking Irregularities under Changing Environments
17 0.75498933 9 nips-2005-A Domain Decomposition Method for Fast Manifold Learning
18 0.75210565 196 nips-2005-Two view learning: SVM-2K, Theory and Practice
19 0.74984992 43 nips-2005-Comparing the Effects of Different Weight Distributions on Finding Sparse Representations
20 0.74945003 191 nips-2005-The Forgetron: A Kernel-Based Perceptron on a Fixed Budget