jmlr2009-5: Adaptive False Discovery Rate Control under Independence and Dependence (knowledge-graph by maker-knowledge-mining)
Source: pdf
Author: Gilles Blanchard, Étienne Roquain
Abstract: In the context of multiple hypothesis testing, the proportion π0 of true null hypotheses in the pool of hypotheses to test often plays a crucial role, although it is generally unknown a priori. A testing procedure using an implicit or explicit estimate of this quantity in order to improve its efficiency is called adaptive. In this paper, we focus on the issue of false discovery rate (FDR) control and we present new adaptive multiple testing procedures with control of the FDR. In a first part, assuming independence of the p-values, we present two new procedures and give a unified review of other existing adaptive procedures that have provably controlled FDR. We report extensive simulation results comparing these procedures and testing their robustness when the independence assumption is violated. The new proposed procedures appear competitive with existing ones. The overall best, though, is reported to be Storey’s estimator, albeit for a specific parameter setting that does not appear to have been considered before. In a second part, we propose adaptive versions of step-up procedures that have provably controlled FDR under positive dependence and unspecified dependence of the p-values, respectively. In the latter case, while simulations only show an improvement over non-adaptive procedures in limited situations, these are to our knowledge among the first theoretically founded adaptive multiple testing procedures that control the FDR when the p-values are not independent. Keywords: multiple testing, false discovery rate, adaptive procedure, positive regression dependence, p-values
Reference: text
sentIndex sentText sentNum sentScore
1 In this paper, we focus on the issue of false discovery rate (FDR) control and we present new adaptive multiple testing procedures with control of the FDR. [sent-7, score-0.532]
2 In a first part, assuming independence of the p-values, we present two new procedures and give a unified review of other existing adaptive procedures that have provably controlled FDR. [sent-8, score-0.54]
3 In a second part, we propose adaptive versions of step-up procedures that have provably controlled FDR under positive dependence and unspecified dependence of the p-values, respectively. [sent-12, score-0.453]
4 In the latter case, while simulations only show an improvement over non-adaptive procedures in limited situations, these are to our knowledge among the first theoretically founded adaptive multiple testing procedures that control the FDR when the p-values are not independent. [sent-13, score-0.612]
5 1 Adaptive Multiple Testing Procedures In this work, we focus on building multiple testing procedures with a control of the false discovery rate (FDR). [sent-23, score-0.389]
6 This quantity is defined as the expected proportion of type I errors, that is, the proportion of true null hypotheses among all the hypotheses that have been rejected (i.e., declared false). [sent-24, score-0.597]
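As a toy illustration (not from the paper; the function name, the alternative distribution of the p-values and all parameter values are our own choices), the false discovery proportion of a rejection set, whose expectation is the FDR, can be sketched and estimated by Monte Carlo for the naive per-test rule "reject p ≤ α":

```python
import numpy as np

rng = np.random.default_rng(0)

def fdp(rejected, is_null):
    """False discovery proportion: true nulls among rejections (0 if none)."""
    return (rejected & is_null).sum() / max(rejected.sum(), 1)

m, m0, alpha = 100, 80, 0.05
is_null = np.arange(m) < m0
fdps = []
for _ in range(4000):
    p = np.concatenate([rng.uniform(size=m0),           # uniform under the null
                        rng.uniform(size=m - m0) ** 4])  # stochastically small under the alternative
    fdps.append(fdp(p <= alpha, is_null))
print(np.mean(fdps))  # Monte Carlo estimate of the FDR of the uncorrected rule
```

The uncorrected rule's FDR is far above α here, which is what the multiple testing procedures below are designed to prevent.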
7 Under completely unspecified dependence, the same authors have shown that the FDR control still holds if the threshold collection of the LSU procedure is divided by a factor 1 + 1/2 + · · · + 1/m, where m is the total number of null hypotheses to test. [sent-29, score-0.449]
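The LSU procedure and its correction for unspecified dependence (thresholds αi/m divided by 1 + 1/2 + · · · + 1/m) can be sketched as follows; this is a hedged illustration, and the function name and example p-values are ours, not from the paper:

```python
import numpy as np

def step_up_linear(p, alpha, dependence_correction=False):
    """Boolean rejection mask of the LSU, optionally with the 1+1/2+...+1/m correction."""
    m = len(p)
    c = np.sum(1.0 / np.arange(1, m + 1)) if dependence_correction else 1.0
    p_sorted = np.sort(p)
    thresh = alpha * np.arange(1, m + 1) / (m * c)
    below = np.nonzero(p_sorted <= thresh)[0]
    k = below[-1] + 1 if below.size else 0            # largest i with p_(i) <= threshold(i)
    return p <= (p_sorted[k - 1] if k else -1.0)      # reject the k smallest p-values

p = np.array([0.001, 0.008, 0.039, 0.041, 0.30, 0.74])
print(step_up_linear(p, 0.05).sum())        # → 2 (LSU)
print(step_up_linear(p, 0.05, True).sum())  # → 1 (corrected version is more conservative)
```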
8 However, all of these procedures, which are built in order to control the FDR at a level α, can be shown to actually have their FDR upper bounded by π0 α, where π0 is the proportion of true null hypotheses in the initial pool. [sent-31, score-0.347]
9 , Benjamini and Hochberg, 2000; Black, 2004) is to integrate an estimation of the unknown proportion π0 in the threshold of the previous procedures and to prove that the corresponding FDR is still rigorously controlled by α. [sent-37, score-0.33]
10 Of course, adaptive procedures are of practical interest if it is expected that π0 is, or can be, significantly smaller than 1. [sent-38, score-0.272]
11 , Benjamini and Heller, 2007) which first selects some clusters of hypotheses that are likely to contain false nulls, and then applies a multiple testing procedure on the selected hypotheses. [sent-41, score-0.361]
12 Since a large part of the selected hypotheses is expected to be false in this second step, an adaptive procedure is needed in order to keep the FDR close to the target level. [sent-42, score-0.475]
13 A number of adaptive procedures have been proposed in the recent literature and can loosely be divided into the following categories: • plug-in procedures, where some initial estimator of π0 is directly plugged in as a multiplicative level correction to the usual procedures. [sent-43, score-0.345]
14 , Meinshausen and Rice, 2006; Jin and Cai, 2007; Jin, 2008); while their asymptotic consistency (as the number of hypotheses tends to infinity) is generally established, their use in plug-in adaptive procedures has not always been studied. [sent-51, score-0.425]
15 • one-stage procedures, which perform a single round of multiple testing (generally step-up or step-down), based on a particular (deterministic) threshold collection that renders it adaptive (Finner et al. [sent-56, score-0.279]
16 In this framework, the more specific random effects model is—most generally, though not always— considered, in which p-values are assumed independent, each hypothesis has a probability π0 of being true, and all false null hypotheses share the same alternate distribution. [sent-62, score-0.319]
17 The behavior of different procedures is then studied under the limit where the number of tested hypotheses grows to infinity. [sent-63, score-0.316]
18 Under the assumption of positive or arbitrary dependence of the p-values, we introduce new two-stage adaptive versions of known step-up procedures (namely, of the LSU under positive dependence, and of the family of procedures introduced by Blanchard and Fleuret, 2007, under unspecified dependence). [sent-85, score-0.52]
19 These adaptive versions provably control the FDR and result in an improvement of power over the non-adaptive versions in some situations (namely, when the number of hypotheses rejected in the first stage is large, typically more than 60%). [sent-86, score-0.422]
20 A second goal of this work is to present a review of several existing adaptive step-up procedures with provable FDR control under independence. [sent-87, score-0.336]
21 A third goal is to compare both the existing and our new adaptive procedures in an extensive simulation study under both independence and dependence, following the simulation model and methodology used by Benjamini et al. [sent-94, score-0.294]
22 However, we also report that the best procedure overall (in terms of power, among procedures that are robust enough to the dependent case) appears to be the plug-in procedure based on the well-known Storey estimator (Storey, 2002) used with the somewhat nonstandard parameter setting λ = α. [sent-96, score-0.418]
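A hedged sketch of this plug-in: we assume the standard form of Storey's estimator with the "+1" correction commonly used for finite-sample FDR control, plugged into the LSU as a multiplicative level correction; the helper name, default λ = α, and example p-values are ours:

```python
import numpy as np

def storey_plugin_lsu(p, alpha, lam=None):
    """LSU at level alpha / pi0_hat, with pi0_hat from Storey's estimator."""
    m = len(p)
    lam = alpha if lam is None else lam              # nonstandard choice lambda = alpha
    # Storey-type estimator: pi0_hat = (1 + #{p_i > lam}) / ((1 - lam) * m)
    pi0_hat = (1 + np.sum(p > lam)) / ((1 - lam) * m)
    thresh = alpha * np.arange(1, m + 1) / (m * pi0_hat)
    p_sorted = np.sort(p)
    below = np.nonzero(p_sorted <= thresh)[0]
    k = below[-1] + 1 if below.size else 0
    return p <= (p_sorted[k - 1] if k else -1.0)

p = np.array([0.001, 0.002, 0.003, 0.004, 0.005, 0.03, 0.5, 0.6])
print(storey_plugin_lsu(p, 0.05).sum())  # → 6
```

When many p-values are small, pi0_hat is well below 1 and the working level α/pi0_hat exceeds α, which is where the power gain of the adaptive version comes from.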
23 • Concerning the new two-stage procedures with theoretical FDR control under dependence, simulations show an (admittedly limited) improvement over their non-adaptive counterparts in favorable situations which correspond to what was expected from the theoretical study (i.e., when a large proportion of hypotheses is rejected at the first stage). [sent-98, score-0.291]
24 The case of positively dependent and arbitrarily dependent p-values is examined in Section 4, where we introduce our new adaptive procedures in this context. [sent-105, score-0.302]
25 Let H be a finite set of null hypotheses for P, that is, each null hypothesis h ∈ H corresponds to some subset of distributions on (X , X) and “P satisfies h” means that P belongs to this subset of distributions. [sent-113, score-0.303]
26 The underlying probability P being fixed, we denote H0 = {h ∈ H | P satisfies h} the set of the true null hypotheses and m0 = |H0 | the number of true null hypotheses. [sent-116, score-0.303]
27 We suppose further that there exists a set of p-value functions p = (ph , h ∈ H ), meaning that each ph : (X , X) → [0, 1] is a measurable function and that for each h ∈ H0 , ph is bounded stochastically by a uniform distribution, that is, ∀h ∈ H0 , ∀t ∈ [0, 1], P [ph ≤ t] ≤ t. [sent-120, score-0.557]
28 As a general rule, provided that FDR(R) ≤ α, we want to build procedures that reject as many false hypotheses as possible. [sent-138, score-0.411]
29 The absolute power of a multiple testing procedure is defined as the average proportion of false hypotheses correctly rejected, E[ |R ∩ H0c | / |H0c | ]. [sent-139, score-0.437]
30 3 Self-Consistency, Step-Up Procedures, FDR Control and Adaptivity We first define a general class of multiple testing procedures called self-consistent procedures (Blanchard and Roquain, 2008). [sent-147, score-0.414]
31 Let ∆ : {0, . . . , m} → R+ , ∆(0) = 0 , be a nondecreasing function called a threshold collection; a multiple testing procedure R is said to satisfy the self-consistency condition with respect to ∆ if the inclusion R ⊂ {h ∈ H | ph ≤ ∆(|R|)} holds almost surely. [sent-151, score-0.463]
32 Furthermore, we say that R is nonincreasing if for all h ∈ H the function ph → |R(p)| is nonincreasing, that is if |R| is nonincreasing in each p-value. [sent-152, score-0.458]
33 Definition 4 (Step-up procedure) The step-up procedure with threshold collection ∆ is defined as R = {h ∈ H | ph ≤ p(k) }, where k = max{0 ≤ i ≤ m | p(i) ≤ ∆(i)}. [sent-158, score-0.429]
34 Lemma 5 The step-up procedure with threshold collection ∆ is nonincreasing and self-consistent with respect to ∆ . [sent-160, score-0.286]
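Definition 4 transcribes directly into code: reject the k smallest p-values, where k is the largest index at which the ordered p-value falls below the threshold collection. This sketch (function name and example ours) recovers the LSU as the special case ∆(i) = αi/m:

```python
import numpy as np

def step_up(p, delta):
    """Step-up procedure: p array of p-values, delta callable i -> threshold, delta(0) = 0."""
    m = len(p)
    p_sorted = np.sort(p)
    k = 0
    for i in range(1, m + 1):
        if p_sorted[i - 1] <= delta(i):
            k = i                                 # keep the LARGEST such i (step-up)
    return p <= (p_sorted[k - 1] if k else -1.0)  # reject the k smallest p-values

m, alpha = 6, 0.05
lsu = step_up(np.array([0.004, 0.009, 0.02, 0.2, 0.5, 0.9]),
              lambda i: alpha * i / m)            # LSU as the special case
print(lsu.sum())  # → 3
```

In accordance with Lemma 5, the rejection count is coordinate-wise nonincreasing in each p-value, and the output satisfies self-consistency with respect to ∆ by construction.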
35 However, to recover procedures of a more general form (including step-down, for instance), the statements of this paper will preferably be expressed in terms of self-consistent procedures when possible. [sent-167, score-0.35]
36 The LSU plays a prominent role in multiple testing for FDR control; it was the first procedure for which FDR control was proved and it is probably the most widely used procedure in this context. [sent-171, score-0.287]
37 Then any nonincreasing self-consistent procedure with respect to threshold collection ∆(i) = αi/m has FDR upper bounded by π0 α , where π0 = m0 /m is the proportion of true null hypotheses. [sent-174, score-0.428]
38 Moreover, if the p-values associated with the true null hypotheses are exactly uniformly distributed, the linear step-up procedure has FDR exactly equal to π0 α. [sent-176, score-0.314]
39 Simply put, the role of adaptive step-up procedures is to mimic the latter oracle in order to obtain more powerful procedures. [sent-193, score-0.325]
40 In the particular case where the shape function β is the identity function on R+ , the procedure is called an adaptive linear step-up procedure using estimator G (and at level α). [sent-196, score-0.381]
41 A subclass of plug-in adaptive procedures is formed by so-called two-stage procedures, when the estimator G is actually based on a first, non-adaptive, multiple testing procedure. [sent-201, score-0.391]
42 The distinction between generic plug-in procedures and two-stage procedures is somewhat informal and generally meant only to provide some kind of nomenclature between different possible approaches. [sent-203, score-0.362]
43 Adaptive Procedures with Provable FDR Control under Independence In this section, we introduce two new adaptive procedures that provably control the FDR under independence. [sent-207, score-0.34]
44 For i ≤ (m + 1)/2, the threshold (5) is α(1 − α)i/(m − i + 1), and thus our approach differs from the threshold collection αi/m of the standard LSU procedure by the factor (1 − α)m/(m − i + 1). [sent-228, score-0.295]
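The factor relation between the one-stage adaptive threshold α(1 − α)i/(m − i + 1) and the LSU threshold αi/m (this is our reading of the garbled formula above) can be checked numerically; the variable names are ours:

```python
import numpy as np

m, alpha = 1000, 0.05
i = np.arange(1, (m + 1) // 2 + 1)                 # range i <= (m + 1)/2
br1s = alpha * (1 - alpha) * i / (m - i + 1)       # assumed BR one-stage threshold
lsu = alpha * i / m                                # standard LSU threshold
factor = (1 - alpha) * m / (m - i + 1)
assert np.allclose(br1s, lsu * factor)             # the stated factor relation holds
print(br1s[0] < lsu[0])                            # adaptive threshold is below LSU for small i
```

For small i the adaptive threshold is slightly below the LSU one, while for i approaching m/2 the factor exceeds 1, which is exactly the trade-off discussed in the surrounding text.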
45 The latter is a well-known improvement of Bonferroni's procedure (which corresponds to the fixed threshold α/m), taking into account the proportion of true nulls, and defined as the step-down procedure with threshold collection α/(m − i + 1). [sent-230, score-0.287]
46 The step-down procedure with threshold collection ∆ rejects the hypotheses corresponding to the k smallest p-values, where k = max{0 ≤ i ≤ m | ∀ j ≤ i , p( j) ≤ ∆( j)}. [sent-232, score-0.354]
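The step-down rule differs from the step-up rule only in where the scan stops: it takes the largest k such that all ordered p-values up to k pass their thresholds. With ∆(i) = α/(m − i + 1) this sketch (names and example ours) gives Holm's procedure:

```python
import numpy as np

def step_down(p, delta):
    """Step-down: k = max{i | for all j <= i, p_(j) <= delta(j)}."""
    m = len(p)
    p_sorted = np.sort(p)
    k = 0
    for i in range(1, m + 1):
        if p_sorted[i - 1] <= delta(i):
            k = i
        else:
            break                                 # step-down stops at the first failure
    return p <= (p_sorted[k - 1] if k else -1.0)

m, alpha = 5, 0.05
holm = step_down(np.array([0.001, 0.011, 0.02, 0.04, 0.9]),
                 lambda i: alpha / (m - i + 1))   # Holm's threshold collection
print(holm.sum())  # → 2
```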
47 This happens only when the proportion of null hypotheses rejected by the LSU procedure is positive but less than α + 1/m (and even in this region the ratio of the two threshold collections is never less than (1 − α)). [sent-239, score-0.451]
48 Roughly speaking, this situation with only a few rejections can only happen if there are few false hypotheses to begin with (π0 close to 1) or if the false hypotheses are very difficult to detect (the distribution of false p-values is close to being uniform). [sent-240, score-0.539]
49 In the random effects model, each hypothesis is assumed to be randomly true with probability π0 (and false otherwise), and the false null hypotheses share a common distribution P1 . [sent-242, score-0.515]
50 Figure 1: For m = 1000 null hypotheses and α = 5%: comparison of the new threshold collection BR-1S-λ given by (4) to that of the LSU, the AORC and FDR09-η . [sent-251, score-0.323]
51 Assume that the test statistics of the true null hypotheses are standard Gaussian with zero mean and that those of the alternative hypotheses are Gaussian with mean µ > 0 . [sent-256, score-0.363]
52 Then if µ > µ∗ , the probability that the LSU rejects a proportion of null hypotheses less than 1/m + α tends to 0 as m tends to infinity. [sent-259, score-0.303]
53 However, the step-up procedure using this threshold collection as is does not have controlled FDR (since t(m) = 1 , the corresponding step-up procedure would always reject all the hypotheses), and several suitable modifications were proposed by Finner et al. [sent-273, score-0.329]
54 2 Adaptive Plug-In Methods In this section, we consider different adaptive step-up procedures of the plug-in type, that is, based on an explicit estimator of π0^{-1} . [sent-289, score-0.327]
55 We also denote p0,h = (p−h , 0) the collection p where ph has been replaced by 0. [sent-299, score-0.292]
56 Consider a nonincreasing multiple testing procedure R which is self-consistent with respect to the adaptive linear threshold collection ∆(p, i) = αG(p)i/m . [sent-302, score-0.447]
57 Finally, the estimator G4 is new and follows exactly the same philosophy as G3 , that is, it uses a step-up procedure as a first stage in order to estimate π0^{-1} , but this time based on our adaptive one-stage step-up procedure introduced in the previous section, rather than the standard LSU. [sent-339, score-0.342]
58 To sum up, Corollary 13 states that under independence, for any λ and k0 , the plug-in adaptive procedures Storey-λ, Quant- k0 , BKY06-λ and BR-2S-λ all control the FDR at level α. [sent-348, score-0.336]
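As a hedged sketch of the two-stage idea, here is the BKY06 recipe as published by Benjamini, Krieger and Yekutieli (2006): a first LSU stage at level α/(1 + α) estimates π0 from the unrejected fraction, and a second LSU stage plugs the estimate in. This follows the published recipe and is not necessarily the exact parametrization used in this paper; the function names and example are ours:

```python
import numpy as np

def lsu(p, alpha):
    m = len(p)
    p_sorted = np.sort(p)
    below = np.nonzero(p_sorted <= alpha * np.arange(1, m + 1) / m)[0]
    k = below[-1] + 1 if below.size else 0
    return p <= (p_sorted[k - 1] if k else -1.0)

def bky06(p, alpha):
    m = len(p)
    a1 = alpha / (1 + alpha)
    r1 = lsu(p, a1).sum()                  # first, non-adaptive stage
    if r1 == 0:
        return np.zeros(m, dtype=bool)
    if r1 == m:
        return np.ones(m, dtype=bool)
    pi0_hat = (m - r1) / m                 # unrejected fraction estimates pi0
    return lsu(p, a1 / pi0_hat)            # second stage at the corrected level

p = np.array([0.001, 0.002, 0.003, 0.004, 0.2, 0.3, 0.5, 0.6, 0.7, 0.9])
print(bky06(p, 0.05).sum())  # → 4
```

Since a1 / pi0_hat ≥ a1, the second stage can only reject at least as much as the first, which is the sense in which the plug-in is adaptive.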
59 Namely, assume that we want to reject at least i null hypotheses whenever p(i) is lower than the standard LSU threshold times the estimator G2 wherein the parameter k0 = i is used. [sent-358, score-0.353]
60 In this short section, we present theoretical computations of the FDR for the previously introduced adaptive step-up procedures, under the maximally dependent model where all the p-values are in fact equal, that is ph ≡ p1 for all h ∈ H (and m0 = m). [sent-386, score-0.376]
61 Conversely, the FDR of some adaptive procedures can be higher under moderate dependence than under maximal dependence. [sent-411, score-0.327]
62 Remember the absolute power is defined as the mean proportion of false null hypotheses that are correctly rejected; for each procedure the relative power is the ratio of its absolute power to that of [LSU-Oracle]. [sent-451, score-0.508]
63 Their relation to the remaining procedures depends on the parameters; both [median-LSU] and [FDR09-1/2] appear to be more powerful than the remaining procedures when π0 > 1/2 , and less efficient otherwise. [sent-460, score-0.368]
64 Namely, higher p-values are less likely to be “contaminated” by false null hypotheses; conversely, if we take a lower threshold λ, there will be more false null hypotheses included in the set of p-values larger than λ , leading to a pessimistic bias in the estimation of π0^{-1} . [sent-473, score-0.512]
65 However, we do not know from a theoretical point of view if the adaptive procedures have their FDR upper bounded by α. [sent-484, score-0.284]
66 Figure 2: FDR and power relative to oracle as a function of the true proportion π0 of null hypotheses. [sent-582, score-0.333]
67 These procedures all exhibit good robustness to dependence for FDR control as well as comparatively good power. [sent-705, score-0.294]
68 New Adaptive Procedures with Provable FDR Control under Arbitrary Dependence In this section, we consider from a theoretical point of view the problem of constructing multiple testing procedures that are adaptive to π0 under arbitrary dependence conditions of the p-values. [sent-729, score-0.391]
69 The derivation of adaptive procedures that have provably controlled FDR under dependence appears to have been only scarcely studied so far (see Sarkar, 2008a, and Farcomeni, 2007). [sent-730, score-0.411]
70 However, we will show that they still provide an improvement, in a certain regime, with respect to the (non-adaptive) LSU procedure in the PRDS case and with respect to the family of (non-adaptive) procedures proposed in Theorem 7 in the arbitrary dependence case. [sent-735, score-0.328]
71 Then, the p-value family p = (ph , h ∈ H ) is said to be positively regression dependent on each one from H0 (PRDS on H0 in short) if for any nondecreasing measurable set D ⊂ [0, 1]H and for all h ∈ H0 , the function u ∈ [0, 1] → P [p ∈ D | ph = u] is nondecreasing. [sent-737, score-0.315]
72 Our first result concerns a two-stage procedure where the first stage R0 is any multiple testing procedure with controlled FWER, and where we (over-) estimate m0 via the straightforward estimator (m − |R0 |) . [sent-742, score-0.358]
73 Theorem 22 Let R0 be a nonincreasing multiple testing procedure and assume that its FWER is controlled at level α0 , that is, P [R0 ∩ H0 ≠ ∅] ≤ α0 . [sent-744, score-0.316]
74 Then the adaptive step-up procedure R with data-dependent threshold collection ∆(i) = α1 β(i)/(m − |R0 |) has FDR controlled at level α0 + α1 in either of the following dependence situations: • the p-value family (ph , h ∈ H ) is PRDS on H0 and the shape function is the identity function. [sent-745, score-0.469]
75 If we choose α0 = α1 = α/2 , then this procedure will outperform its non-adaptive counterpart (using the same shape function) only if there are more than 50% rejected hypotheses in the first stage. [sent-748, score-0.315]
76 Then the adaptive step-up procedure R with data-dependent threshold collection ∆1 (i) = α1 β(i)Fκ (|R0 |/m)/m has FDR upper bounded by α1 + κα0 in either of the following dependence situations: • the p-value family (ph , h ∈ H ) is PRDS on H0 and the shape function is the identity function. [sent-755, score-0.414]
77 Therefore, here again this adaptive procedure is only useful in the cases where it is expected that a “large” proportion of null hypotheses can easily be rejected. [sent-760, score-0.448]
78 For the FWER-controlled first stage of Theorem 22, we chose a standard Holm procedure (see Holm, 1979), which is a step-down procedure with threshold collection t(i) = α/(m − i + 1) . [sent-774, score-0.291]
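A hedged sketch of this two-stage construction in the PRDS case (identity shape function): a Holm step-down at level α0 = α/2 gives a FWER-controlled R0, then m0 is (over-)estimated by m − |R0| and a step-up is run with ∆(i) = α1·i/(m − |R0|), α1 = α/2, so the overall budget is α0 + α1 = α. Function names and the example are ours, not the paper's code:

```python
import numpy as np

def holm(p, alpha):
    """Holm step-down: FWER-controlled first stage."""
    m = len(p)
    p_sorted = np.sort(p)
    k = 0
    for i in range(1, m + 1):
        if p_sorted[i - 1] <= alpha / (m - i + 1):
            k = i
        else:
            break
    return p <= (p_sorted[k - 1] if k else -1.0)

def two_stage_adaptive(p, alpha):
    m = len(p)
    r0 = holm(p, alpha / 2).sum()          # |R0|, first stage at alpha0 = alpha/2
    denom = max(m - r0, 1)                 # (over-)estimate of m0
    p_sorted = np.sort(p)
    thresh = (alpha / 2) * np.arange(1, m + 1) / denom   # alpha1 = alpha/2
    below = np.nonzero(p_sorted <= thresh)[0]
    k = below[-1] + 1 if below.size else 0
    return p <= (p_sorted[k - 1] if k else -1.0)

p = np.array([1e-6] * 7 + [0.05, 0.5, 0.9])
print(two_stage_adaptive(p, 0.05).sum())   # → 8 (plain LSU at level 0.05 rejects only 7 here)
```

In accordance with the remark above, the adaptive threshold beats the LSU one only once the first stage has rejected more than half of the hypotheses, as in this example.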
79 In Figure 5, we report the relative power to the oracle LSU, and the False Non-discovery Rate (FNR), which is the converse of the FDR for type II errors, that is, the average of the ratio of non-rejected false hypotheses over the total number of non-rejected hypotheses. [sent-775, score-0.279]
80 Nevertheless, this demonstrates theoretically the possibility of provably adaptive procedures under dependence. [sent-781, score-0.31]
81 Remark 24 Some theoretical results for two-stage procedures under possible dependence using a first stage with controlled FWER or controlled FDR appeared earlier (Farcomeni, 2007). [sent-783, score-0.358]
82 Remark 25 The theoretical problem of adaptive procedures under arbitrary dependence was also considered by Sarkar (2008a) using two-stage procedures. [sent-788, score-0.327]
83 Conclusion and Discussion We proposed several adaptive multiple testing procedures that provably control the FDR under different dependence assumptions on the p-values. [sent-791, score-0.6]
84 The procedure BR-2S is less conservative in general (except in marginal situations) than the adaptive procedure proposed by Benjamini et al. [sent-793, score-0.325]
85 Extensive simulations showed that these new procedures appear to be robustly controlling the FDR even in a positive dependence situation, which is a very desirable property in practice. [sent-795, score-0.287]
86 Secondly, we presented what we think is among the first examples of adaptive multiple testing procedures with provable FDR control in the PRDS case and under unspecified dependence. [sent-824, score-0.4]
87 An important difference with respect to earlier works on this topic is that the procedures we introduced here are both theoretically founded and can be shown to improve over non-adaptive procedures in certain (admittedly limited) circumstances. [sent-825, score-0.366]
88 We first want to underline once more that the theory for adaptive procedures under dependence is still underdeveloped. [sent-834, score-0.327]
89 It might actually be too restrictive to look for procedures having theoretically controlled FDR uniformly over arbitrary dependence situations such as what we studied in Section 4. [sent-835, score-0.311]
90 An interesting future theoretical direction could be to prove that some of the adaptive procedures showing good robustness in our simulations actually have controlled FDR under some types of dependence, at least when the p-values are in some sense not too far from being independent. [sent-836, score-0.378]
91 3 Proof of Theorem 11 By definition of self-consistency, the procedure R satisfies R ⊂ {h ∈ H | ph ≤ α|R|G(p)/m}. [sent-866, score-0.328]
92 Now, for any h ∈ H0 \ {h0 } , denote p′0,h0 the family p0,h0 where the variable ph0 has been replaced by uh , an independent uniform variable on [0, 1] . [sent-875, score-0.292]
93 Since both ph0 and uh are independent of the other p-values, ph0 is stochastically lower bounded by uh and G is nonincreasing, we have E[ G(p0,h0 ) | p−h0 ] ≤ E[ G(p′0,h0 ) | p−h0 ] , hence also in unconditional expectation. [sent-876, score-0.573]
94 For the BKY06 procedure, we simply remark that since the linear step-up procedure of level λ rejects all the hypotheses when p1 ≤ λ and rejects no hypothesis otherwise, the estimators G1 and G3 are equal in this case. [sent-904, score-0.374]
95 Lemma 26 Assume R is a multiple testing procedure satisfying the self-consistency condition: R ⊂ {h ∈ H | ph ≤ αG(p)β(|R|)/m} , where G(p) is a data-dependent factor. [sent-909, score-0.392]
96 For the PRDS case, we note that since |R(p)| is coordinate-wise nonincreasing in each p-value, for any v > 0, D = {z ∈ [0, 1]H | |R(z)| < v} is a measurable nondecreasing set, so that the PRDS property implies that u → P [|R| < v | ph = u] is nondecreasing. [sent-916, score-0.367]
97 On the adaptive control of the false discovery rate in multiple testing with independent statistics. [sent-978, score-0.311]
98 Adaptive linear step-up procedures that control the false discovery rate. [sent-990, score-0.325]
99 Estimating the proportion of false null hypotheses among a large number of independently tested hypotheses. [sent-1093, score-0.347]
100 Asymptotic properties of false discovery rate controlling procedures under independence. [sent-1097, score-0.297]
wordName wordTfidf (topN-words)
[('fdr', 0.744), ('lsu', 0.253), ('ph', 0.248), ('benjamini', 0.192), ('roquain', 0.177), ('procedures', 0.175), ('hypotheses', 0.141), ('nonincreasing', 0.105), ('storey', 0.098), ('adaptive', 0.097), ('daptive', 0.097), ('ependence', 0.097), ('finner', 0.097), ('prds', 0.097), ('lanchard', 0.091), ('ontrol', 0.091), ('blanchard', 0.091), ('ndependence', 0.082), ('null', 0.081), ('procedure', 0.08), ('false', 0.076), ('conservative', 0.068), ('unspeci', 0.059), ('sarkar', 0.059), ('threshold', 0.057), ('estimator', 0.055), ('dependence', 0.055), ('shape', 0.051), ('proportion', 0.049), ('controlled', 0.049), ('control', 0.046), ('collection', 0.044), ('rejected', 0.043), ('testing', 0.042), ('fwer', 0.041), ('yekutieli', 0.041), ('simulations', 0.039), ('aorc', 0.038), ('genovese', 0.038), ('nulls', 0.038), ('stochastically', 0.037), ('oracle', 0.035), ('roof', 0.033), ('adaptivity', 0.032), ('rejects', 0.032), ('stage', 0.03), ('discovery', 0.028), ('power', 0.027), ('farcomeni', 0.027), ('fnr', 0.027), ('hochberg', 0.023), ('holm', 0.023), ('wasserman', 0.023), ('provably', 0.022), ('independence', 0.022), ('multiple', 0.022), ('alternate', 0.021), ('nonasymptotic', 0.02), ('positively', 0.02), ('reject', 0.019), ('controlling', 0.018), ('robustness', 0.018), ('provable', 0.018), ('powerful', 0.018), ('level', 0.018), ('family', 0.018), ('median', 0.017), ('round', 0.017), ('proved', 0.017), ('accordance', 0.016), ('heorem', 0.016), ('maximally', 0.016), ('remark', 0.016), ('etienne', 0.016), ('fleuret', 0.016), ('gavrilov', 0.016), ('situations', 0.016), ('theoretically', 0.016), ('dependent', 0.015), ('situation', 0.015), ('remember', 0.015), ('favorable', 0.015), ('lemma', 0.015), ('nondecreasing', 0.014), ('theorem', 0.014), ('asymptotical', 0.014), ('capping', 0.014), ('gilles', 0.014), ('rejections', 0.014), ('uh', 0.014), ('appears', 0.013), ('conversely', 0.012), ('quantity', 0.012), ('uniform', 0.012), ('estimators', 0.012), ('bounded', 0.012), 
('generally', 0.012), ('concerning', 0.011), ('noticeably', 0.011), ('du', 0.011)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000013 5 jmlr-2009-Adaptive False Discovery Rate Control under Independence and Dependence
Author: Junning Li, Z. Jane Wang
Abstract: In real world applications, graphical statistical models are not only a tool for operations such as classification or prediction, but usually the network structures of the models themselves are also of great interest (e.g., in modeling brain connectivity). The false discovery rate (FDR), the expected ratio of falsely claimed connections to all those claimed, is often a reasonable error-rate criterion in these applications. However, current learning algorithms for graphical models have not been adequately adapted to the concerns of the FDR. The traditional practice of controlling the type I error rate and the type II error rate under a conventional level does not necessarily keep the FDR low, especially in the case of sparse networks. In this paper, we propose embedding an FDR-control procedure into the PC algorithm to curb the FDR of the skeleton of the learned graph. We prove that the proposed method can control the FDR under a user-specified level at the limit of large sample sizes. In the cases of moderate sample size (about several hundred), empirical experiments show that the method is still able to control the FDR under the user-specified level, and a heuristic modification of the method is able to control the FDR more accurately around the user-specified level. The proposed method is applicable to any models for which statistical tests of conditional independence are available, such as discrete models and Gaussian models. Keywords: Bayesian networks, false discovery rate, PC algorithm, directed acyclic graph, skeleton
3 0.096398667 42 jmlr-2009-Incorporating Functional Knowledge in Neural Networks
Author: Charles Dugas, Yoshua Bengio, François Bélisle, Claude Nadeau, René Garcia
Abstract: Incorporating prior knowledge of a particular task into the architecture of a learning algorithm can greatly improve generalization performance. We study here a case where we know that the function to be learned is non-decreasing in its two arguments and convex in one of them. For this purpose we propose a class of functions similar to multi-layer neural networks but (1) that has those properties, (2) is a universal approximator of Lipschitz1 functions with these and other properties. We apply this new class of functions to the task of modelling the price of call options. Experiments show improvements on regressing the price of call options using the new types of function classes that incorporate the a priori constraints. Keywords: neural networks, universal approximation, monotonicity, convexity, call options
4 0.03343071 28 jmlr-2009-Entropy Inference and the James-Stein Estimator, with Application to Nonlinear Gene Association Networks
Author: Jean Hausser, Korbinian Strimmer
Abstract: We present a procedure for effective estimation of entropy and mutual information from small-sample data, and apply it to the problem of inferring high-dimensional gene association networks. Specifically, we develop a James-Stein-type shrinkage estimator, resulting in a procedure that is highly efficient statistically as well as computationally. Despite its simplicity, we show that it outperforms eight other entropy estimation procedures across a diverse range of sampling scenarios and data-generating models, even in cases of severe undersampling. We illustrate the approach by analyzing E. coli gene expression data and computing an entropy-based gene-association network from gene expression data. A computer program is available that implements the proposed shrinkage estimator. Keywords: entropy, shrinkage estimation, James-Stein estimator, "small n, large p" setting, mutual information, gene association network
5 0.023965932 23 jmlr-2009-Discriminative Learning Under Covariate Shift
Author: Steffen Bickel, Michael Brückner, Tobias Scheffer
Abstract: We address classification problems for which the training instances are governed by an input distribution that is allowed to differ arbitrarily from the test distribution—problems also referred to as classification under covariate shift. We derive a solution that is purely discriminative: neither training nor test distribution are modeled explicitly. The problem of learning under covariate shift can be written as an integrated optimization problem. Instantiating the general optimization problem leads to a kernel logistic regression and an exponential model classifier for covariate shift. The optimization problem is convex under certain conditions; our findings also clarify the relationship to the known kernel mean matching procedure. We report on experiments on problems of spam filtering, text classification, and landmine detection. Keywords: covariate shift, discriminative learning, transfer learning
6 0.022724664 10 jmlr-2009-Application of Non Parametric Empirical Bayes Estimation to High Dimensional Classification
7 0.018826433 97 jmlr-2009-Ultrahigh Dimensional Feature Selection: Beyond The Linear Model
9 0.017141782 31 jmlr-2009-Evolutionary Model Type Selection for Global Surrogate Modeling
10 0.017004337 21 jmlr-2009-Data-driven Calibration of Penalties for Least-Squares Regression
11 0.016992619 40 jmlr-2009-Identification of Recurrent Neural Networks by Bayesian Interrogation Techniques
13 0.01504234 94 jmlr-2009-The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs
14 0.014891146 91 jmlr-2009-Subgroup Analysis via Recursive Partitioning
15 0.014889049 45 jmlr-2009-Learning Approximate Sequential Patterns for Classification
16 0.014516399 82 jmlr-2009-Robustness and Regularization of Support Vector Machines
17 0.013858038 100 jmlr-2009-When Is There a Representer Theorem? Vector Versus Matrix Regularizers
18 0.013789646 29 jmlr-2009-Estimating Labels from Label Proportions
19 0.013679542 30 jmlr-2009-Estimation of Sparse Binary Pairwise Markov Networks using Pseudo-likelihoods
20 0.01356656 8 jmlr-2009-An Anticorrelation Kernel for Subsystem Training in Multiple Classifier Systems
topicId topicWeight
[(0, 0.109), (1, -0.093), (2, 0.155), (3, 0.678), (4, 0.159), (5, 0.31), (6, 0.38), (7, -0.022), (8, -0.079), (9, 0.016), (10, 0.048), (11, -0.021), (12, -0.036), (13, -0.0), (14, -0.029), (15, 0.021), (16, 0.01), (17, -0.015), (18, -0.004), (19, 0.008), (20, -0.002), (21, -0.03), (22, -0.039), (23, -0.008), (24, 0.043), (25, 0.032), (26, -0.001), (27, -0.015), (28, -0.03), (29, 0.011), (30, 0.004), (31, -0.012), (32, 0.012), (33, -0.037), (34, 0.01), (35, 0.011), (36, 0.008), (37, 0.021), (38, 0.0), (39, -0.018), (40, 0.018), (41, -0.036), (42, 0.044), (43, -0.003), (44, 0.001), (45, 0.012), (46, 0.042), (47, 0.024), (48, 0.014), (49, 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.97566587 5 jmlr-2009-Adaptive False Discovery Rate Control under Independence and Dependence
Author: Gilles Blanchard, Étienne Roquain
Abstract: In the context of multiple hypothesis testing, the proportion π0 of true null hypotheses in the pool of hypotheses to test often plays a crucial role, although it is generally unknown a priori. A testing procedure using an implicit or explicit estimate of this quantity in order to improve its efficiency is called adaptive. In this paper, we focus on the issue of false discovery rate (FDR) control and we present new adaptive multiple testing procedures with control of the FDR. In a first part, assuming independence of the p-values, we present two new procedures and give a unified review of other existing adaptive procedures that have provably controlled FDR. We report extensive simulation results comparing these procedures and testing their robustness when the independence assumption is violated. The new proposed procedures appear competitive with existing ones. The overall best, though, is reported to be Storey’s estimator, albeit for a specific parameter setting that does not appear to have been considered before. In a second part, we propose adaptive versions of step-up procedures that have provably controlled FDR under positive dependence and unspecified dependence of the p-values, respectively. In the latter case, while simulations only show an improvement over non-adaptive procedures in limited situations, these are to our knowledge among the first theoretically founded adaptive multiple testing procedures that control the FDR when the p-values are not independent. Keywords: multiple testing, false discovery rate, adaptive procedure, positive regression dependence, p-values
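The adaptive procedures the abstract describes combine a plug-in estimate of π0 with a step-up rule. As a minimal illustration (not the authors' exact procedures), a Storey-type estimator with parameter λ plugged into the Benjamini-Hochberg step-up can be sketched as follows; the function names and the default λ = 0.5 are ours:

```python
def storey_pi0(pvals, lam=0.5):
    """Storey-type estimate of the proportion of true nulls:
    pi0_hat = (1 + #{p_i > lam}) / (m * (1 - lam)), capped at 1."""
    m = len(pvals)
    return min(1.0, (1 + sum(p > lam for p in pvals)) / (m * (1 - lam)))

def adaptive_bh(pvals, alpha=0.05, lam=0.5):
    """Benjamini-Hochberg step-up run at the adaptive level alpha / pi0_hat;
    returns the indices of the rejected hypotheses."""
    m = len(pvals)
    pi0 = storey_pi0(pvals, lam)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):      # step-up: largest rank passing
        if pvals[i] <= rank * alpha / (m * pi0):
            k = rank
    return sorted(order[:k])
```

When many p-values exceed λ, the estimate π̂0 stays near 1 and the rule reduces to plain Benjamini-Hochberg; a small π̂0 enlarges the effective level and yields more rejections.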
Author: Junning Li, Z. Jane Wang
Abstract: In real world applications, graphical statistical models are not only a tool for operations such as classification or prediction, but usually the network structures of the models themselves are also of great interest (e.g., in modeling brain connectivity). The false discovery rate (FDR), the expected ratio of falsely claimed connections to all those claimed, is often a reasonable error-rate criterion in these applications. However, current learning algorithms for graphical models have not been adequately adapted to the concerns of the FDR. The traditional practice of controlling the type I error rate and the type II error rate under a conventional level does not necessarily keep the FDR low, especially in the case of sparse networks. In this paper, we propose embedding an FDR-control procedure into the PC algorithm to curb the FDR of the skeleton of the learned graph. We prove that the proposed method can control the FDR under a user-specified level at the limit of large sample sizes. In the cases of moderate sample size (about several hundred), empirical experiments show that the method is still able to control the FDR under the user-specified level, and a heuristic modification of the method is able to control the FDR more accurately around the user-specified level. The proposed method is applicable to any models for which statistical tests of conditional independence are available, such as discrete models and Gaussian models. Keywords: Bayesian networks, false discovery rate, PC algorithm, directed acyclic graph, skeleton
3 0.20083658 42 jmlr-2009-Incorporating Functional Knowledge in Neural Networks
Author: Charles Dugas, Yoshua Bengio, François Bélisle, Claude Nadeau, René Garcia
Abstract: Incorporating prior knowledge of a particular task into the architecture of a learning algorithm can greatly improve generalization performance. We study here a case where we know that the function to be learned is non-decreasing in its two arguments and convex in one of them. For this purpose we propose a class of functions similar to multi-layer neural networks but (1) that has those properties, (2) is a universal approximator of Lipschitz functions with these and other properties. We apply this new class of functions to the task of modelling the price of call options. Experiments show improvements on regressing the price of call options using the new types of function classes that incorporate the a priori constraints. Keywords: neural networks, universal approximation, monotonicity, convexity, call options
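As a simplified sketch of how such shape constraints can be enforced architecturally (an illustrative variant, not the authors' exact parameterization): a one-hidden-layer network with non-negative weights and a convex, non-decreasing activation such as softplus is automatically non-decreasing and convex in every input coordinate:

```python
import math

def softplus(z):
    """Convex, non-decreasing activation: log(1 + exp(z))."""
    return math.log1p(math.exp(z))

def monotone_convex_net(x, weights, biases, out_weights):
    """One-hidden-layer network. Because every weight is >= 0 and softplus is
    convex and non-decreasing, the composition is non-decreasing and convex
    in each input coordinate by construction."""
    hidden = [softplus(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(weights, biases)]
    return sum(v * h for v, h in zip(out_weights, hidden))
```

The design choice is that the constraint is structural: no penalty or projection step is needed, since any non-negative weight setting satisfies the shape restrictions.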
4 0.093595237 28 jmlr-2009-Entropy Inference and the James-Stein Estimator, with Application to Nonlinear Gene Association Networks
Author: Jean Hausser, Korbinian Strimmer
Abstract: We present a procedure for effective estimation of entropy and mutual information from small-sample data, and apply it to the problem of inferring high-dimensional gene association networks. Specifically, we develop a James-Stein-type shrinkage estimator, resulting in a procedure that is highly efficient statistically as well as computationally. Despite its simplicity, we show that it outperforms eight other entropy estimation procedures across a diverse range of sampling scenarios and data-generating models, even in cases of severe undersampling. We illustrate the approach by analyzing E. coli gene expression data and computing an entropy-based gene-association network from gene expression data. A computer program is available that implements the proposed shrinkage estimator. Keywords: entropy, shrinkage estimation, James-Stein estimator, “small n, large p” setting, mutual information, gene association network
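A minimal sketch of the James-Stein-type shrinkage idea for entropy estimation: maximum-likelihood cell frequencies are shrunk toward the uniform distribution with a data-driven intensity, and entropy is computed from the shrunk frequencies. The intensity formula below is the standard plug-in estimate; the function name is ours:

```python
import math

def shrinkage_entropy(counts):
    """Entropy (in nats) from cell counts: shrink the ML frequencies toward
    the uniform target with intensity lam, then plug into the entropy formula."""
    n = sum(counts)
    p = len(counts)
    theta = [c / n for c in counts]                    # ML frequencies
    target = 1.0 / p                                   # uniform shrinkage target
    denom = (n - 1) * sum((target - t) ** 2 for t in theta)
    lam = 1.0 if denom == 0 else (1.0 - sum(t * t for t in theta)) / denom
    lam = min(1.0, max(0.0, lam))                      # clip intensity to [0, 1]
    shrunk = [lam * target + (1 - lam) * t for t in theta]
    return -sum(q * math.log(q) for q in shrunk if q > 0)
```

Shrinking toward the uniform distribution counteracts the downward bias of the ML plug-in entropy, which is most severe in the undersampled regime the abstract highlights.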
5 0.089422323 31 jmlr-2009-Evolutionary Model Type Selection for Global Surrogate Modeling
Author: Dirk Gorissen, Tom Dhaene, Filip De Turck
Abstract: Due to the scale and computational complexity of currently used simulation codes, global surrogate (metamodels) models have become indispensable tools for exploring and understanding the design space. Due to their compact formulation they are cheap to evaluate and thus readily facilitate visualization, design space exploration, rapid prototyping, and sensitivity analysis. They can also be used as accurate building blocks in design packages or larger simulation environments. Consequently, there is great interest in techniques that facilitate the construction of such approximation models while minimizing the computational cost and maximizing model accuracy. Many surrogate model types exist (Support Vector Machines, Kriging, Neural Networks, etc.) but no type is optimal in all circumstances. Nor is there any hard theory available that can help make this choice. In this paper we present an automatic approach to the model type selection problem. We describe an adaptive global surrogate modeling environment with adaptive sampling, driven by speciated evolution. Different model types are evolved cooperatively using a Genetic Algorithm (heterogeneous evolution) and compete to approximate the iteratively selected data. In this way the optimal model type and complexity for a given data set or simulation code can be dynamically determined. Its utility and performance is demonstrated on a number of problems where it outperforms traditional sequential execution of each model type. Keywords: model type selection, genetic algorithms, global surrogate modeling, function approximation, active learning, adaptive sampling
6 0.072255209 30 jmlr-2009-Estimation of Sparse Binary Pairwise Markov Networks using Pseudo-likelihoods
7 0.071983777 11 jmlr-2009-Bayesian Network Structure Learning by Recursive Autonomy Identification
8 0.070918277 10 jmlr-2009-Application of Non Parametric Empirical Bayes Estimation to High Dimensional Classification
9 0.069137193 97 jmlr-2009-Ultrahigh Dimensional Feature Selection: Beyond The Linear Model
10 0.062382422 82 jmlr-2009-Robustness and Regularization of Support Vector Machines
11 0.062167302 8 jmlr-2009-An Anticorrelation Kernel for Subsystem Training in Multiple Classifier Systems
12 0.061887473 32 jmlr-2009-Exploiting Product Distributions to Identify Relevant Variables of Correlation Immune Functions
14 0.060701307 29 jmlr-2009-Estimating Labels from Label Proportions
15 0.059428912 21 jmlr-2009-Data-driven Calibration of Penalties for Least-Squares Regression
16 0.059142992 16 jmlr-2009-Classification with Gaussians and Convex Loss
17 0.058785371 94 jmlr-2009-The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs
18 0.058380295 54 jmlr-2009-Markov Properties for Linear Causal Models with Correlated Errors (Special Topic on Causality)
19 0.05724521 45 jmlr-2009-Learning Approximate Sequential Patterns for Classification
20 0.056189973 93 jmlr-2009-The Hidden Life of Latent Variables: Bayesian Learning with Mixed Graph Models
topicId topicWeight
[(8, 0.037), (11, 0.027), (19, 0.018), (21, 0.012), (26, 0.014), (38, 0.029), (47, 0.022), (52, 0.046), (55, 0.058), (58, 0.023), (66, 0.092), (68, 0.018), (90, 0.047), (96, 0.018), (99, 0.445)]
simIndex simValue paperId paperTitle
same-paper 1 0.66092539 5 jmlr-2009-Adaptive False Discovery Rate Control under Independence and Dependence
Author: Gilles Blanchard, Étienne Roquain
Abstract: In the context of multiple hypothesis testing, the proportion π0 of true null hypotheses in the pool of hypotheses to test often plays a crucial role, although it is generally unknown a priori. A testing procedure using an implicit or explicit estimate of this quantity in order to improve its efficiency is called adaptive. In this paper, we focus on the issue of false discovery rate (FDR) control and we present new adaptive multiple testing procedures with control of the FDR. In a first part, assuming independence of the p-values, we present two new procedures and give a unified review of other existing adaptive procedures that have provably controlled FDR. We report extensive simulation results comparing these procedures and testing their robustness when the independence assumption is violated. The new proposed procedures appear competitive with existing ones. The overall best, though, is reported to be Storey’s estimator, albeit for a specific parameter setting that does not appear to have been considered before. In a second part, we propose adaptive versions of step-up procedures that have provably controlled FDR under positive dependence and unspecified dependence of the p-values, respectively. In the latter case, while simulations only show an improvement over non-adaptive procedures in limited situations, these are to our knowledge among the first theoretically founded adaptive multiple testing procedures that control the FDR when the p-values are not independent. Keywords: multiple testing, false discovery rate, adaptive procedure, positive regression dependence, p-values
2 0.6436916 12 jmlr-2009-Bi-Level Path Following for Cross Validated Solution of Kernel Quantile Regression
Author: Saharon Rosset
Abstract: We show how to follow the path of cross validated solutions to families of regularized optimization problems, defined by a combination of a parameterized loss function and a regularization term. A primary example is kernel quantile regression, where the parameter of the loss function is the quantile being estimated. Even though the bi-level optimization problem we encounter for every quantile is non-convex, the manner in which the optimal cross-validated solution evolves with the parameter of the loss function allows tracking of this solution. We prove this property, construct the resulting algorithm, and demonstrate it on real and artificial data. This algorithm allows us to efficiently solve the whole family of bi-level problems. We show how it can be extended to cover other modeling problems, like support vector regression, and alternative in-sample model selection approaches.
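The parameterized loss underlying kernel quantile regression is the pinball loss. The following sketch (function names ours) illustrates its defining property: minimizing it over a constant recovers an empirical τ-quantile, which is why varying τ traces out a family of quantile estimates:

```python
def pinball_loss(residual, tau):
    """Quantile ('pinball') loss: under-predictions are weighted by tau,
    over-predictions by (1 - tau)."""
    return tau * residual if residual >= 0 else (tau - 1) * residual

def empirical_quantile(ys, tau):
    """The constant minimizing average pinball loss over ys is an empirical
    tau-quantile; a minimizer always lies at a data point, so it suffices
    to check each one."""
    return min(ys, key=lambda q: sum(pinball_loss(y - q, tau) for y in ys))
```

For τ = 0.5 the loss is proportional to the absolute error and the minimizer is a median; as τ increases, the asymmetry pushes the solution toward the upper tail.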
3 0.2889432 82 jmlr-2009-Robustness and Regularization of Support Vector Machines
Author: Huan Xu, Constantine Caramanis, Shie Mannor
Abstract: We consider regularized support vector machines (SVMs) and show that they are precisely equivalent to a new robust optimization formulation. We show that this equivalence of robust optimization and regularization has implications for both algorithms, and analysis. In terms of algorithms, the equivalence suggests more general SVM-like algorithms for classification that explicitly build in protection to noise, and at the same time control overfitting. On the analysis front, the equivalence of robustness and regularization provides a robust optimization interpretation for the success of regularized SVMs. We use this new robustness interpretation of SVMs to give a new proof of consistency of (kernelized) SVMs, thus establishing robustness as the reason regularized SVMs generalize well. Keywords: robustness, regularization, generalization, kernel, support vector machine
Author: Junning Li, Z. Jane Wang
Abstract: In real world applications, graphical statistical models are not only a tool for operations such as classification or prediction, but usually the network structures of the models themselves are also of great interest (e.g., in modeling brain connectivity). The false discovery rate (FDR), the expected ratio of falsely claimed connections to all those claimed, is often a reasonable error-rate criterion in these applications. However, current learning algorithms for graphical models have not been adequately adapted to the concerns of the FDR. The traditional practice of controlling the type I error rate and the type II error rate under a conventional level does not necessarily keep the FDR low, especially in the case of sparse networks. In this paper, we propose embedding an FDR-control procedure into the PC algorithm to curb the FDR of the skeleton of the learned graph. We prove that the proposed method can control the FDR under a user-specified level at the limit of large sample sizes. In the cases of moderate sample size (about several hundred), empirical experiments show that the method is still able to control the FDR under the user-specified level, and a heuristic modification of the method is able to control the FDR more accurately around the user-specified level. The proposed method is applicable to any models for which statistical tests of conditional independence are available, such as discrete models and Gaussian models. Keywords: Bayesian networks, false discovery rate, PC algorithm, directed acyclic graph, skeleton
5 0.2818042 9 jmlr-2009-Analysis of Perceptron-Based Active Learning
Author: Sanjoy Dasgupta, Adam Tauman Kalai, Claire Monteleoni
Abstract: We start by showing that in an active learning setting, the Perceptron algorithm needs Ω(1/ε²) labels to learn linear separators within generalization error ε. We then present a simple active learning algorithm for this problem, which combines a modification of the Perceptron update with an adaptive filtering rule for deciding which points to query. For data distributed uniformly over the unit sphere, we show that our algorithm reaches generalization error ε after asking for just Õ(d log(1/ε)) labels. This exponential improvement over the usual sample complexity of supervised learning had previously been demonstrated only for the computationally more complex query-by-committee algorithm. Keywords: active learning, perceptron, label complexity bounds, online learning
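A schematic sketch of the margin-based querying idea the abstract describes (a fixed threshold rather than the authors' adaptive filtering rule): a label is requested only when the current hypothesis is uncertain about a point, and a queried mistake triggers the modified perceptron update w ← w − 2⟨w, x⟩x, which for unit-norm data is a norm-preserving reflection:

```python
def active_perceptron(stream, label_fn, threshold=0.2):
    """Schematic margin-based active learner over a stream of unit vectors.
    Queries a label only when |<w, x>| < threshold; on a queried mistake,
    applies the modified perceptron update w <- w - 2 <w, x> x.
    Returns (final weight vector, number of label queries)."""
    it = iter(stream)
    x0 = next(it)
    w = [label_fn(x0) * xi for xi in x0]       # initialize w = y0 * x0 (one query)
    queries = 1
    for x in it:
        margin = sum(wi * xi for wi, xi in zip(w, x))
        if abs(margin) < threshold:            # uncertain point: pay for a label
            queries += 1
            if label_fn(x) * margin <= 0:      # mistake: reflect w across x
                w = [wi - 2 * margin * xi for wi, xi in zip(w, x)]
    return w, queries
```

The point of the filtering rule is label economy: confidently classified points are never queried, so the label complexity can be far smaller than the number of points seen.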
6 0.27938083 87 jmlr-2009-Sparse Online Learning via Truncated Gradient
7 0.27588299 94 jmlr-2009-The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs
8 0.27549899 27 jmlr-2009-Efficient Online and Batch Learning Using Forward Backward Splitting
9 0.27509782 38 jmlr-2009-Hash Kernels for Structured Data
10 0.27487099 75 jmlr-2009-Provably Efficient Learning with Typed Parametric Models
11 0.27266845 66 jmlr-2009-On the Consistency of Feature Selection using Greedy Least Squares Regression
12 0.27240607 93 jmlr-2009-The Hidden Life of Latent Variables: Bayesian Learning with Mixed Graph Models
13 0.27218702 21 jmlr-2009-Data-driven Calibration of Penalties for Least-Squares Regression
14 0.27163032 22 jmlr-2009-Deterministic Error Analysis of Support Vector Regression and Related Regularized Kernel Methods
15 0.27100447 48 jmlr-2009-Learning Nondeterministic Classifiers
16 0.27096125 32 jmlr-2009-Exploiting Product Distributions to Identify Relevant Variables of Correlation Immune Functions
17 0.27095667 71 jmlr-2009-Perturbation Corrections in Approximate Inference: Mixture Modelling Applications
18 0.27072775 51 jmlr-2009-Low-Rank Kernel Learning with Bregman Matrix Divergences
19 0.27067524 18 jmlr-2009-Consistency and Localizability
20 0.27000189 13 jmlr-2009-Bounded Kernel-Based Online Learning