nips nips2005 nips2005-95 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Nicolò Cesa-Bianchi, Claudio Gentile
Abstract: We prove the strongest known bound for the risk of hypotheses selected from the ensemble generated by running a learning algorithm incrementally on the training data. Our result is based on proof techniques that are remarkably different from the standard risk analysis based on uniform convergence arguments.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract. We prove the strongest known bound for the risk of hypotheses selected from the ensemble generated by running a learning algorithm incrementally on the training data. [sent-5, score-1.194]
2 Our result is based on proof techniques that are remarkably different from the standard risk analysis based on uniform convergence arguments. [sent-6, score-0.455]
3 1 Introduction. In this paper, we analyze the risk of hypotheses selected from the ensemble obtained by running an arbitrary on-line learning algorithm on an i.i.d. training sequence. [sent-7, score-0.945]
4 We describe a procedure that selects from the ensemble a hypothesis whose risk is, with high probability, at most M_n + O(((ln n)² + √(n M_n) ln n)/n), where M_n is the average cumulative loss incurred by the on-line algorithm on a training sequence of length n. [sent-11, score-1.067]
5 Note that this bound exhibits the "fast" rate (ln n)²/n whenever the cumulative loss n M_n is O(1). [sent-12, score-0.328]
6 This result is proven through a refinement of techniques that we used in [2] to prove the substantially weaker bound M_n + O(√((ln n)/n)). [sent-13, score-0.261]
7 As in the proof of the older result, we analyze the empirical process associated with a run of the on-line learner using exponential inequalities for martingales. [sent-14, score-0.161]
8 However, this time we control the large deviations of the on-line process using Bernstein's maximal inequality rather than the Azuma-Hoeffding inequality. [sent-15, score-0.145]
9 This provides a much tighter bound on the average risk of the ensemble. [sent-16, score-0.597]
10 Finally, we relate the risk of a specific hypothesis within the ensemble to the average risk. [sent-17, score-0.955]
11 As in [2], we select this hypothesis using a deterministic sequential testing procedure, but the use of Bernstein's inequality makes the analysis of this procedure far more complicated. [sent-18, score-0.293]
12 Unlike uniform convergence, which is tailored to empirical risk minimization, our bounds hold for any learning algorithm. [sent-23, score-0.652]
13 Indeed, disregarding efficiency issues, any learner can be run incrementally on a data sequence to generate an ensemble of hypotheses. [sent-24, score-0.543]
14 Instances x are tuples of numerical and/or symbolic attributes. [sent-28, score-0.065]
15 Labels y belong to a finite set of symbols (the class elements) or to an interval of the real line, depending on whether the task is classification or regression. [sent-29, score-0.095]
16 We allow a learning algorithm to output hypotheses of the form h : X → D, where D is a decision space not necessarily equal to Y. [sent-30, score-0.175]
17 The goodness of hypothesis h on example (x, y) is measured by the quantity ℓ(h(x), y), where ℓ : D × Y → ℝ. [sent-31, score-0.19]
18 2 A bound on the average risk. An on-line algorithm A works in a sequence of trials. [sent-33, score-0.641]
19 In each trial t, the algorithm takes as input a hypothesis H_{t-1} and an example Z_t = (X_t, Y_t), and returns a new hypothesis H_t to be used in the next trial. [sent-37, score-0.362]
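To make the protocol concrete, here is a minimal sketch (illustrative code, not taken from the paper): it runs a generic on-line learner over the sample, collects the ensemble of hypotheses produced along the way, and accumulates the average cumulative loss M_n that appears in the bounds. The learner interface and the toy majority-vote learner are assumptions made only for this sketch.

```python
from collections import Counter

def zero_one_loss(prediction, label):
    # A bounded loss in [0, 1], as required by the analysis.
    return 0.0 if prediction == label else 1.0

class MajorityVoteLearner:
    """Toy on-line learner: after seeing z_1, ..., z_t it predicts the most
    frequent label observed so far (the instance x is ignored)."""
    def __init__(self):
        self.counts = Counter()

    def initial_hypothesis(self):
        return lambda x: 0                      # H_0: arbitrary default prediction

    def update(self, example):
        _, y_t = example
        self.counts[y_t] += 1
        majority = self.counts.most_common(1)[0][0]
        return lambda x, m=majority: m          # H_t as a frozen snapshot

def run_online(learner, sample, loss=zero_one_loss):
    """Return the ensemble [H_0, ..., H_{n-1}] and the average cumulative
    loss M_n = (1/n) * sum_t loss(H_{t-1}(x_t), y_t)."""
    ensemble, cumulative = [], 0.0
    hypothesis = learner.initial_hypothesis()
    for x_t, y_t in sample:
        ensemble.append(hypothesis)               # store H_{t-1}
        cumulative += loss(hypothesis(x_t), y_t)  # on-line loss in trial t
        hypothesis = learner.update((x_t, y_t))   # H_t for the next trial
    return ensemble, (cumulative / len(sample) if sample else 0.0)

# Example usage:
# ensemble, M_n = run_online(MajorityVoteLearner(), [(0, 1), (1, 1), (2, 0), (3, 1)])
```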
20 We follow the standard assumptions in statistical learning: the sequence of examples Z^n = ((X_1, Y_1), ..., (X_n, Y_n)) is drawn i.i.d. according to an unknown distribution over X × Y. [sent-38, score-0.109]
21 We also assume that the loss function ℓ satisfies 0 ≤ ℓ ≤ 1. [sent-45, score-0.149]
22 The success of a hypothesis h is measured by the risk of h, denoted by risk(h). [sent-46, score-0.588]
23 This is the expected loss of h on an example (X, Y) drawn from the underlying distribution, risk(h) = E[ℓ(h(X), Y)]. [sent-47, score-0.095]
24 Define also risk_emp(h) to be the empirical risk of h on a sample Z^n, risk_emp(h) = (1/n) Σ_{t=1}^n ℓ(h(X_t), Y_t). [sent-48, score-0.503]
25 Given a sample Z^n and an on-line algorithm A, we use H_0, H_1, ..., [sent-49, score-0.095]
26 H_{n-1} to denote the ensemble of hypotheses generated by A. [sent-52, score-0.51]
27 Note that the ensemble is a function of the random training sample Z^n. [sent-53, score-0.336]
28 Our bounds hinge on the sample statistic M_n = (1/n) Σ_{t=1}^n ℓ(H_{t-1}(X_t), Y_t), which can be easily computed as the on-line algorithm is run on Z^n. [sent-54, score-0.151]
29 The following bound, a consequence of Bernstein's maximal inequality for martingales due to Freedman [4], is of primary importance for proving our results. [sent-55, score-0.217]
30 Let L_1, L_2, ... be a sequence of random variables, 0 ≤ L_t ≤ 1. [sent-59, score-0.044]
31 Define the bounded martingale difference sequence V_t = E[L_t | L_1, ..., L_{t-1}] - L_t [sent-60, score-0.203]
32 and the associated martingale S_n = V_1 + ... + V_n. [sent-63, score-0.125]
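For reference, the inequality in question can be stated in its standard textbook form as follows; this restatement (including the 1/3 constant) is a generic Bernstein-Freedman bound and is not copied from the paper's Lemma 1.

```latex
% Textbook Bernstein--Freedman form (hedged restatement, not the paper's Lemma 1):
% if S_n = V_1 + \dots + V_n is a martingale whose increments satisfy V_t \le 1 and
% \Sigma_n = \sum_{t=1}^{n} \mathbb{E}\bigl[V_t^2 \mid L_1,\dots,L_{t-1}\bigr]
% is the sum of conditional variances, then for all s, v > 0,
\mathbb{P}\bigl(S_n \ge s,\ \Sigma_n \le v\bigr)
  \;\le\; \exp\!\left(-\frac{s^2}{2\,(v + s/3)}\right).
```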
33 The next proposition, derived from Lemma 1, establishes a bound on the average risk of the ensemble of hypotheses. [sent-71, score-0.903]
34 Let H_0, H_1, ..., H_{n-1} be the ensemble of hypotheses generated by an arbitrary on-line algorithm A. [sent-75, score-0.51]
35 P( (1/n) Σ_{t=1}^n risk(H_{t-1}) ≥ M_n + (c/n)( ln((n M_n + 3)/δ) + √( n M_n ln((n M_n + 3)/δ) ) ) ) ≤ δ, for a small absolute constant c. The bound shown in Proposition 2 has the same rate as a bound recently proven by Zhang [6, Theorem 5]. [sent-77, score-0.397]
36 However, rather than deriving the bound from Bernstein's inequality as we do, Zhang uses an ad hoc argument. [sent-78, score-0.299]
37 Let µ_n = (1/n) Σ_{t=1}^n risk(H_{t-1}) and V_{t-1} = risk(H_{t-1}) - ℓ(H_{t-1}(X_t), Y_t) for t ≥ 1. [sent-80, score-0.141]
38 Also, set for brevity K_n = Σ_{t=1}^n κ_t and ⌈K_n⌉ = ⌈Σ_{t=1}^n κ_t⌉, and introduce the function A(x) = 2 ln((x+1)(x+3)/δ) for x ≥ 0. [sent-85, score-0.061]
39 We find upper and lower bounds on the probability P( Σ_{t=1}^n V_{t-1} ≥ A(K_n) + √( A(K_n) K_n ) ). (1) [sent-86, score-0.32]
40 The upper bound is determined through a simple stratification argument over Lemma 1. [sent-87, score-0.255]
41 (1) ≤ Σ_{s=0}^∞ δ/((s+1)(s+3)) ≤ δ. (2) As far as the lower bound on (1) is concerned, we note that our assumption 0 ≤ ℓ ≤ 1 implies κ_t ≤ risk(H_{t-1}) for all t which, in turn, gives K_n ≤ nµ_n. [sent-90, score-0.061]
42 Thus (1) = P( nµ_n - nM_n ≥ A(K_n) + √( A(K_n) K_n ) ) ≥ P( nµ_n - nM_n ≥ A(nµ_n) + √( A(nµ_n) nµ_n ) ) = P( 2nµ_n ≥ 2nM_n + 3A(nµ_n) + √( 4nM_n A(nµ_n) + 5A(nµ_n)² ) ) = P( x ≥ B + (3/2)A(x) + √( B A(x) + (5/4)A²(x) ) ), where we set for brevity x = nµ_n and B = nM_n. [sent-91, score-0.061]
43 We would like to solve the inequality x ≥ B + (3/2)A(x) + √( B A(x) + (5/4)A²(x) ) (3) w.r.t. x. [sent-92, score-0.101]
44 More precisely, we would like to find a suitable upper bound on the (unique) x* such that the above is satisfied as an equality. [sent-96, score-0.263]
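Under the reconstruction of A(x) used above, the shape of the resulting bound can be seen with a back-of-the-envelope argument (a heuristic sketch, not the paper's explicit derivative computation): since A grows only logarithmically, the solution x* of (3) exceeds B only by logarithmic and square-root terms.

```latex
% Heuristic sketch only (assumes the reconstruction A(x) = 2\ln((x+1)(x+3)/\delta)):
% because A grows logarithmically, the solution x^* of (3) satisfies
x^* \;=\; B + O\!\bigl(A(B) + \sqrt{B\,A(B)}\bigr),
% so that, with x = n\mu_n and B = nM_n, with probability at least 1-\delta,
\frac{1}{n}\sum_{t=1}^{n}\operatorname{risk}(H_{t-1})
  \;\le\; M_n + O\!\left(\frac{\ln\frac{n M_n + 3}{\delta}
  \;+\; \sqrt{\,n M_n\,\ln\frac{n M_n + 3}{\delta}}}{n}\right).
```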
45 A (tedious) derivative argument, along with the upper bound A(x) ≤ 4 ln((x+3)/δ), yields an explicit value x′ satisfying (3). [sent-97, score-0.255]
46 Thus x′ is an upper bound on x*, and we conclude that P( nµ_n ≥ x′ ) ≤ (1), which, recalling the definitions of x and B, and combining with (2), proves the bound. [sent-99, score-0.248]
47 □ 3 Selecting a good hypothesis from the ensemble. If the decision space D of A is a convex set and the loss function ℓ is convex in its first argument, then via Jensen's inequality we can directly apply the bound of Proposition 2 to the risk of the average hypothesis H̄ = (1/n) Σ_{t=1}^n H_{t-1}. [sent-100, score-1.392]
48 (4) Observe that this is an O(1/n) bound whenever the cumulative loss n M_n is O(1). [sent-102, score-0.328]
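A minimal sketch of the averaged hypothesis used in the convex case (an illustration, not the paper's code); it assumes real-valued predictions and the `ensemble` of callables produced by the earlier run_online sketch.

```python
# Sketch: averaged hypothesis H_bar(x) = (1/n) * sum_t H_{t-1}(x), meaningful
# when the decision space is convex (e.g. real-valued predictions) and the
# loss is convex in its first argument, so that Jensen's inequality applies.
def average_hypothesis(ensemble):
    n = len(ensemble)
    def h_bar(x):
        return sum(h(x) for h in ensemble) / n
    return h_bar
```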
49 If the convexity hypotheses do not hold (as in the case of classification problems), then the bound in (4) applies to a hypothesis randomly drawn from the ensemble (this was investigated in [1] though with different goals). [sent-103, score-0.936]
50 In this section we show how to deterministically pick from the ensemble a hypothesis whose risk is close to the average ensemble risk. [sent-104, score-1.2]
51 Let risk_emp(H_t, t+1) + ε_δ(risk_emp(H_t, t+1), t) be the penalized empirical risk of hypothesis H_t, where risk_emp(H_t, t+1) = (1/(n-t)) Σ_{i=t+1}^n ℓ(H_t(X_i), Y_i) is the empirical risk of H_t on the remaining sample Z_{t+1}, ..., Z_n. [sent-107, score-1.186]
52 We now analyze the performance of the learning algorithm that returns the hypothesis Ĥ minimizing the penalized risk estimate over all hypotheses in the ensemble, i.e., Ĥ = argmin_{0 ≤ t < n} [ risk_emp(H_t, t+1) + ε_δ(risk_emp(H_t, t+1), t) ]. (5) [sent-110, score-0.889]
53 For all 0 < δ ≤ 1, the hypothesis Ĥ satisfies, with probability at least 1 - δ, risk(Ĥ) ≤ min_{0 ≤ t < n} risk(H_t) plus penalty terms of the order of ε_δ. [sent-113, score-0.244]
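The selection step can be sketched as follows (illustration only: the concrete penalty ε_δ used in the paper is not reproduced here, and the Bernstein-style penalty below is a stand-in assumption). It expects the `ensemble` and `sample` from the earlier run_online sketch.

```python
import math

def empirical_risk_on_suffix(h_t, sample, t, loss):
    # risk_emp(H_t, t+1): average loss of H_t on the examples it was not
    # trained on, i.e. z_{t+1}, ..., z_n (the 0-based slice sample[t:]).
    suffix = sample[t:]
    return sum(loss(h_t(x), y) for x, y in suffix) / len(suffix)

def bernstein_style_penalty(emp_risk, m, delta, n):
    # Stand-in for the paper's epsilon_delta (NOT its exact form):
    # sqrt(risk * L / m) + L / m with L = ln(n / delta) and m = n - t.
    log_term = math.log(max(n, 2) / delta)
    return math.sqrt(emp_risk * log_term / m) + log_term / m

def select_hypothesis(ensemble, sample, loss, delta=0.05):
    """Deterministically return the H_t minimizing the penalized empirical
    risk computed on the part of the sample H_t was not trained on."""
    n = len(sample)
    best, best_score = None, float("inf")
    for t, h_t in enumerate(ensemble):
        emp = empirical_risk_on_suffix(h_t, sample, t, loss)
        score = emp + bernstein_style_penalty(emp, n - t, delta, n)
        if score < best_score:
            best, best_score = h_t, score
    return best
```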
54 □ Conclusions and current research issues. We have shown tail risk bounds for specific hypotheses selected from the ensemble generated by the run of an arbitrary on-line algorithm. [sent-119, score-1.224]
55 Proposition 2, our simplest bound, is proven via an easy application of Bernstein's maximal inequality for martingales, a quite basic result in probability theory. [sent-120, score-0.202]
56 The analysis of Theorem 4 is also centered on the same martingale inequality. [sent-121, score-0.125]
57 Also, the bound shown in Theorem 4 contains ln n terms. [sent-123, score-0.17]
58 A further open problem is to prove lower bounds, even in the special case when n Mn is bounded by a constant. [sent-125, score-0.068]
wordName wordTfidf (topN-words)
[('risk', 0.427), ('kn', 0.306), ('ensemble', 0.306), ('ht', 0.259), ('riskemp', 0.208), ('ip', 0.195), ('mn', 0.193), ('hypotheses', 0.175), ('bound', 0.17), ('nmn', 0.166), ('hypothesis', 0.161), ('bernstein', 0.145), ('vt', 0.141), ('martingale', 0.125), ('proposition', 0.102), ('inequality', 0.101), ('bounds', 0.086), ('italy', 0.083), ('milano', 0.083), ('riskernp', 0.083), ('satisfies', 0.083), ('universita', 0.083), ('mt', 0.076), ('lp', 0.076), ('zt', 0.073), ('tail', 0.073), ('inn', 0.072), ('martingales', 0.072), ('freedman', 0.072), ('loss', 0.066), ('classification', 0.066), ('gentile', 0.066), ('vb', 0.066), ('zn', 0.065), ('yt', 0.064), ('cumulative', 0.063), ('colt', 0.063), ('brevity', 0.061), ('specific', 0.061), ('lemma', 0.058), ('define', 0.058), ('proven', 0.057), ('incrementally', 0.053), ('find', 0.051), ('nm', 0.049), ('penalized', 0.049), ('empirical', 0.046), ('sequence', 0.044), ('var', 0.044), ('maximal', 0.044), ('learner', 0.043), ('rt', 0.043), ('argument', 0.043), ('upper', 0.042), ('xt', 0.041), ('sn', 0.04), ('returns', 0.04), ('analyze', 0.037), ('ja', 0.036), ('beating', 0.036), ('recalling', 0.036), ('tailored', 0.036), ('talk', 0.036), ('tuples', 0.036), ('va', 0.036), ('zl', 0.036), ('vi', 0.036), ('run', 0.035), ('bounded', 0.034), ('prove', 0.034), ('zhang', 0.033), ('ill', 0.033), ('jb', 0.033), ('ha', 0.033), ('devroye', 0.033), ('conconi', 0.033), ('littlestone', 0.033), ('issues', 0.032), ('sequential', 0.031), ('initiated', 0.031), ('sharply', 0.031), ('kalai', 0.031), ('disregarding', 0.031), ('lt', 0.031), ('efficiency', 0.031), ('ho', 0.031), ('sample', 0.03), ('generated', 0.029), ('drawn', 0.029), ('whenever', 0.029), ('hold', 0.029), ('symbolic', 0.029), ('goodness', 0.029), ('tedious', 0.029), ('finite', 0.029), ('uniform', 0.028), ('verlag', 0.028), ('kt', 0.028), ('vx', 0.028), ('hoc', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 95 nips-2005-Improved risk tail bounds for on-line algorithms
Author: Nicolò Cesa-Bianchi, Claudio Gentile
Abstract: We prove the strongest known bound for the risk of hypotheses selected from the ensemble generated by running a learning algorithm incrementally on the training data. Our result is based on proof techniques that are remarkably different from the standard risk analysis based on uniform convergence arguments.
2 0.14814641 54 nips-2005-Data-Driven Online to Batch Conversions
Author: Ofer Dekel, Yoram Singer
Abstract: Online learning algorithms are typically fast, memory efficient, and simple to implement. However, many common learning problems fit more naturally in the batch learning setting. The power of online learning algorithms can be exploited in batch settings by using online-to-batch conversions techniques which build a new batch algorithm from an existing online algorithm. We first give a unified overview of three existing online-to-batch conversion techniques which do not use training data in the conversion process. We then build upon these data-independent conversions to derive and analyze data-driven conversions. Our conversions find hypotheses with a small risk by explicitly minimizing datadependent generalization bounds. We experimentally demonstrate the usefulness of our approach and in particular show that the data-driven conversions consistently outperform the data-independent conversions.
3 0.14058192 82 nips-2005-Generalization Error Bounds for Aggregation by Mirror Descent with Averaging
Author: Anatoli Juditsky, Alexander Nazin, Alexandre Tsybakov, Nicolas Vayatis
Abstract: We consider the problem of constructing an aggregated estimator from a finite class of base functions which approximately minimizes a convex risk functional under the ℓ1 constraint. For this purpose, we propose a stochastic procedure, the mirror descent, which performs gradient descent in the dual space. The generated estimates are additionally averaged in a recursive fashion with specific weights. Mirror descent algorithms have been developed in different contexts and they are known to be particularly efficient in high dimensional problems. Moreover their implementation is adapted to the online setting. The main result of the paper is the upper bound on the convergence rate for the generalization error. 1
4 0.13680424 41 nips-2005-Coarse sample complexity bounds for active learning
Author: Sanjoy Dasgupta
Abstract: We characterize the sample complexity of active learning problems in terms of a parameter which takes into account the distribution over the input space, the specific target hypothesis, and the desired accuracy.
5 0.13420558 47 nips-2005-Consistency of one-class SVM and related algorithms
Author: Régis Vert, Jean-Philippe Vert
Abstract: We determine the asymptotic limit of the function computed by support vector machines (SVM) and related algorithms that minimize a regularized empirical convex loss function in the reproducing kernel Hilbert space of the Gaussian RBF kernel, in the situation where the number of examples tends to infinity, the bandwidth of the Gaussian kernel tends to 0, and the regularization parameter is held fixed. Non-asymptotic convergence bounds to this limit in the L2 sense are provided, together with upper bounds on the classification error that is shown to converge to the Bayes risk, therefore proving the Bayes-consistency of a variety of methods although the regularization term does not vanish. These results are particularly relevant to the one-class SVM, for which the regularization can not vanish by construction, and which is shown for the first time to be a consistent density level set estimator. 1
6 0.1281738 58 nips-2005-Divergences, surrogate loss functions and experimental design
7 0.11043728 85 nips-2005-Generalization to Unseen Cases
8 0.10845879 70 nips-2005-Fast Information Value for Graphical Models
9 0.10708802 12 nips-2005-A PAC-Bayes approach to the Set Covering Machine
10 0.10580397 50 nips-2005-Convex Neural Networks
11 0.097612403 191 nips-2005-The Forgetron: A Kernel-Based Perceptron on a Fixed Budget
12 0.091390952 202 nips-2005-Variational EM Algorithms for Non-Gaussian Latent Variable Models
13 0.086588688 83 nips-2005-Generalization error bounds for classifiers trained with interdependent data
14 0.074194163 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models
15 0.074056193 72 nips-2005-Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation
16 0.067679338 45 nips-2005-Conditional Visual Tracking in Kernel Space
17 0.06761732 112 nips-2005-Learning Minimum Volume Sets
18 0.067271158 117 nips-2005-Learning from Data of Variable Quality
19 0.063796893 76 nips-2005-From Batch to Transductive Online Learning
20 0.062529102 201 nips-2005-Variational Bayesian Stochastic Complexity of Mixture Models
topicId topicWeight
[(0, 0.172), (1, 0.088), (2, 0.01), (3, -0.134), (4, 0.187), (5, 0.123), (6, -0.09), (7, 0.042), (8, -0.185), (9, -0.029), (10, -0.093), (11, 0.216), (12, -0.046), (13, -0.079), (14, 0.055), (15, -0.048), (16, -0.03), (17, -0.155), (18, -0.086), (19, -0.016), (20, -0.089), (21, 0.006), (22, 0.003), (23, -0.092), (24, 0.107), (25, 0.059), (26, -0.156), (27, 0.098), (28, -0.039), (29, 0.037), (30, -0.045), (31, 0.016), (32, -0.058), (33, -0.046), (34, -0.015), (35, 0.08), (36, -0.0), (37, -0.039), (38, 0.027), (39, 0.046), (40, 0.068), (41, 0.001), (42, 0.159), (43, 0.025), (44, -0.036), (45, -0.016), (46, -0.0), (47, 0.017), (48, 0.057), (49, -0.053)]
simIndex simValue paperId paperTitle
same-paper 1 0.97506082 95 nips-2005-Improved risk tail bounds for on-line algorithms
Author: Nicolò Cesa-Bianchi, Claudio Gentile
Abstract: We prove the strongest known bound for the risk of hypotheses selected from the ensemble generated by running a learning algorithm incrementally on the training data. Our result is based on proof techniques that are remarkably different from the standard risk analysis based on uniform convergence arguments.
2 0.66866606 54 nips-2005-Data-Driven Online to Batch Conversions
Author: Ofer Dekel, Yoram Singer
Abstract: Online learning algorithms are typically fast, memory efficient, and simple to implement. However, many common learning problems fit more naturally in the batch learning setting. The power of online learning algorithms can be exploited in batch settings by using online-to-batch conversions techniques which build a new batch algorithm from an existing online algorithm. We first give a unified overview of three existing online-to-batch conversion techniques which do not use training data in the conversion process. We then build upon these data-independent conversions to derive and analyze data-driven conversions. Our conversions find hypotheses with a small risk by explicitly minimizing datadependent generalization bounds. We experimentally demonstrate the usefulness of our approach and in particular show that the data-driven conversions consistently outperform the data-independent conversions.
3 0.65704143 82 nips-2005-Generalization Error Bounds for Aggregation by Mirror Descent with Averaging
Author: Anatoli Juditsky, Alexander Nazin, Alexandre Tsybakov, Nicolas Vayatis
Abstract: We consider the problem of constructing an aggregated estimator from a finite class of base functions which approximately minimizes a convex risk functional under the ℓ1 constraint. For this purpose, we propose a stochastic procedure, the mirror descent, which performs gradient descent in the dual space. The generated estimates are additionally averaged in a recursive fashion with specific weights. Mirror descent algorithms have been developed in different contexts and they are known to be particularly efficient in high dimensional problems. Moreover their implementation is adapted to the online setting. The main result of the paper is the upper bound on the convergence rate for the generalization error. 1
4 0.5591656 76 nips-2005-From Batch to Transductive Online Learning
Author: Sham Kakade, Adam Tauman Kalai
Abstract: It is well-known that everything that is learnable in the difficult online setting, where an arbitrary sequences of examples must be labeled one at a time, is also learnable in the batch setting, where examples are drawn independently from a distribution. We show a result in the opposite direction. We give an efficient conversion algorithm from batch to online that is transductive: it uses future unlabeled data. This demonstrates the equivalence between what is properly and efficiently learnable in a batch model and a transductive online model.
5 0.55616862 191 nips-2005-The Forgetron: A Kernel-Based Perceptron on a Fixed Budget
Author: Ofer Dekel, Shai Shalev-Shwartz, Yoram Singer
Abstract: The Perceptron algorithm, despite its simplicity, often performs well on online classification tasks. The Perceptron becomes especially effective when it is used in conjunction with kernels. However, a common difficulty encountered when implementing kernel-based online algorithms is the amount of memory required to store the online hypothesis, which may grow unboundedly. In this paper we present and analyze the Forgetron algorithm for kernel-based online learning on a fixed memory budget. To our knowledge, this is the first online learning algorithm which, on one hand, maintains a strict limit on the number of examples it stores while, on the other hand, entertains a relative mistake bound. In addition to the formal results, we also present experiments with real datasets which underscore the merits of our approach.
6 0.51338965 112 nips-2005-Learning Minimum Volume Sets
7 0.49500415 58 nips-2005-Divergences, surrogate loss functions and experimental design
8 0.45440522 85 nips-2005-Generalization to Unseen Cases
9 0.45224592 47 nips-2005-Consistency of one-class SVM and related algorithms
10 0.4260723 117 nips-2005-Learning from Data of Variable Quality
11 0.38890597 72 nips-2005-Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation
12 0.36769679 50 nips-2005-Convex Neural Networks
13 0.36739725 41 nips-2005-Coarse sample complexity bounds for active learning
14 0.36328653 83 nips-2005-Generalization error bounds for classifiers trained with interdependent data
15 0.36145884 70 nips-2005-Fast Information Value for Graphical Models
16 0.35585144 12 nips-2005-A PAC-Bayes approach to the Set Covering Machine
17 0.32695729 205 nips-2005-Worst-Case Bounds for Gaussian Process Models
18 0.30795947 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models
19 0.30788651 160 nips-2005-Query by Committee Made Real
20 0.3055585 186 nips-2005-TD(0) Leads to Better Policies than Approximate Value Iteration
topicId topicWeight
[(3, 0.088), (10, 0.035), (22, 0.358), (27, 0.014), (31, 0.041), (34, 0.094), (55, 0.036), (69, 0.051), (73, 0.026), (88, 0.073), (91, 0.095)]
simIndex simValue paperId paperTitle
1 0.81234825 81 nips-2005-Gaussian Processes for Multiuser Detection in CDMA receivers
Author: Juan J. Murillo-Fuentes, Sebastian Caro, Fernando Pérez-Cruz
Abstract: In this paper we propose a new receiver for digital communications. We focus on the application of Gaussian Processes (GPs) to the multiuser detection (MUD) in code division multiple access (CDMA) systems to solve the near-far problem. Hence, we aim to reduce the interference from other users sharing the same frequency band. While usual approaches minimize the mean square error (MMSE) to linearly retrieve the user of interest, we exploit the same criteria but in the design of a nonlinear MUD. Since the optimal solution is known to be nonlinear, the performance of this novel method clearly improves that of the MMSE detectors. Furthermore, the GP based MUD achieves excellent interference suppression even for short training sequences. We also include some experiments to illustrate that other nonlinear detectors such as those based on Support Vector Machines (SVMs) exhibit a worse performance. 1
same-paper 2 0.79978144 95 nips-2005-Improved risk tail bounds for on-line algorithms
Author: Nicolò Cesa-Bianchi, Claudio Gentile
Abstract: We prove the strongest known bound for the risk of hypotheses selected from the ensemble generated by running a learning algorithm incrementally on the training data. Our result is based on proof techniques that are remarkably different from the standard risk analysis based on uniform convergence arguments.
3 0.63648856 160 nips-2005-Query by Committee Made Real
Author: Ran Gilad-Bachrach, Amir Navot, Naftali Tishby
Abstract: Training a learning algorithm is a costly task. A major goal of active learning is to reduce this cost. In this paper we introduce a new algorithm, KQBC, which is capable of actively learning large scale problems by using selective sampling. The algorithm overcomes the costly sampling step of the well known Query By Committee (QBC) algorithm by projecting onto a low dimensional space. KQBC also enables the use of kernels, providing a simple way of extending QBC to the non-linear scenario. Sampling the low dimension space is done using the hit and run random walk. We demonstrate the success of this novel algorithm by applying it to both artificial and a real world problems.
4 0.47882727 94 nips-2005-Identifying Distributed Object Representations in Human Extrastriate Visual Cortex
Author: Rory Sayres, David Ress, Kalanit Grill-Spector
Abstract: The category of visual stimuli has been reliably decoded from patterns of neural activity in extrastriate visual cortex [1]. It has yet to be seen whether object identity can be inferred from this activity. We present fMRI data measuring responses in human extrastriate cortex to a set of 12 distinct object images. We use a simple winner-take-all classifier, using half the data from each recording session as a training set, to evaluate encoding of object identity across fMRI voxels. Since this approach is sensitive to the inclusion of noisy voxels, we describe two methods for identifying subsets of voxels in the data which optimally distinguish object identity. One method characterizes the reliability of each voxel within subsets of the data, while another estimates the mutual information of each voxel with the stimulus set. We find that both metrics can identify subsets of the data which reliably encode object identity, even when noisy measurements are artificially added to the data. The mutual information metric is less efficient at this task, likely due to constraints in fMRI data. 1
5 0.45601606 112 nips-2005-Learning Minimum Volume Sets
Author: Clayton Scott, Robert Nowak
Abstract: Given a probability measure P and a reference measure µ, one is often interested in the minimum µ-measure set with P -measure at least α. Minimum volume sets of this type summarize the regions of greatest probability mass of P , and are useful for detecting anomalies and constructing confidence regions. This paper addresses the problem of estimating minimum volume sets based on independent samples distributed according to P . Other than these samples, no other information is available regarding P , but the reference measure µ is assumed to be known. We introduce rules for estimating minimum volume sets that parallel the empirical risk minimization and structural risk minimization principles in classification. As in classification, we show that the performances of our estimators are controlled by the rate of uniform convergence of empirical to true probabilities over the class from which the estimator is drawn. Thus we obtain finite sample size performance bounds in terms of VC dimension and related quantities. We also demonstrate strong universal consistency and an oracle inequality. Estimators based on histograms and dyadic partitions illustrate the proposed rules. 1
6 0.45121416 201 nips-2005-Variational Bayesian Stochastic Complexity of Mixture Models
7 0.44941932 49 nips-2005-Convergence and Consistency of Regularized Boosting Algorithms with Stationary B-Mixing Observations
8 0.44929928 58 nips-2005-Divergences, surrogate loss functions and experimental design
9 0.44729266 43 nips-2005-Comparing the Effects of Different Weight Distributions on Finding Sparse Representations
10 0.44694021 96 nips-2005-Inference with Minimal Communication: a Decision-Theoretic Variational Approach
11 0.4451198 197 nips-2005-Unbiased Estimator of Shape Parameter for Spiking Irregularities under Changing Environments
12 0.4448851 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models
13 0.44422624 47 nips-2005-Consistency of one-class SVM and related algorithms
14 0.44386661 32 nips-2005-Augmented Rescorla-Wagner and Maximum Likelihood Estimation
15 0.44217977 50 nips-2005-Convex Neural Networks
16 0.44096333 41 nips-2005-Coarse sample complexity bounds for active learning
17 0.44088176 54 nips-2005-Data-Driven Online to Batch Conversions
18 0.43949685 74 nips-2005-Faster Rates in Regression via Active Learning
19 0.43706414 137 nips-2005-Non-Gaussian Component Analysis: a Semi-parametric Framework for Linear Dimension Reduction
20 0.43671018 92 nips-2005-Hyperparameter and Kernel Learning for Graph Based Semi-Supervised Classification