jmlr2009-5: Adaptive False Discovery Rate Control under Independence and Dependence (knowledge-graph by maker-knowledge-mining)
Source: pdf
Author: Gilles Blanchard, Étienne Roquain
Abstract: In the context of multiple hypothesis testing, the proportion π0 of true null hypotheses in the pool of hypotheses to test often plays a crucial role, although it is generally unknown a priori. A testing procedure using an implicit or explicit estimate of this quantity in order to improve its efficiency is called adaptive. In this paper, we focus on the issue of false discovery rate (FDR) control and we present new adaptive multiple testing procedures with control of the FDR. In a first part, assuming independence of the p-values, we present two new procedures and give a unified review of other existing adaptive procedures that have provably controlled FDR. We report extensive simulation results comparing these procedures and testing their robustness when the independence assumption is violated. The new proposed procedures appear competitive with existing ones. The overall best, though, is reported to be Storey’s estimator, albeit for a specific parameter setting that does not appear to have been considered before. In a second part, we propose adaptive versions of step-up procedures that have provably controlled FDR under positive dependence and unspecified dependence of the p-values, respectively. In the latter case, while simulations only show an improvement over non-adaptive procedures in limited situations, these are to our knowledge among the first theoretically founded adaptive multiple testing procedures that control the FDR when the p-values are not independent. Keywords: multiple testing, false discovery rate, adaptive procedure, positive regression dependence, p-values
Reference: text
sentIndex sentText sentNum sentScore
1 In this paper, we focus on the issue of false discovery rate (FDR) control and we present new adaptive multiple testing procedures with control of the FDR. [sent-7, score-0.532]
2 In a first part, assuming independence of the p-values, we present two new procedures and give a unified review of other existing adaptive procedures that have provably controlled FDR. [sent-8, score-0.54]
3 In a second part, we propose adaptive versions of step-up procedures that have provably controlled FDR under positive dependence and unspecified dependence of the p-values, respectively. [sent-12, score-0.453]
4 In the latter case, while simulations only show an improvement over non-adaptive procedures in limited situations, these are to our knowledge among the first theoretically founded adaptive multiple testing procedures that control the FDR when the p-values are not independent. [sent-13, score-0.612]
5 1 Adaptive Multiple Testing Procedures In this work, we focus on building multiple testing procedures with a control of the false discovery rate (FDR). [sent-23, score-0.389]
6 This quantity is defined as the expected proportion of type I errors, that is, the proportion of true null hypotheses among all the hypotheses that have been rejected (i.e., declared false). [sent-24, score-0.597]
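As a toy illustration (not from the paper; the function name, the alternative distribution of the p-values and all parameter values are our own choices), the false discovery proportion of a rejection set, whose expectation is the FDR, can be sketched and estimated by Monte Carlo for the naive per-test rule "reject p ≤ α":

```python
import numpy as np

rng = np.random.default_rng(0)

def fdp(rejected, is_null):
    """False discovery proportion: true nulls among rejections (0 if none)."""
    return (rejected & is_null).sum() / max(rejected.sum(), 1)

m, m0, alpha = 100, 80, 0.05
is_null = np.arange(m) < m0
fdps = []
for _ in range(4000):
    p = np.concatenate([rng.uniform(size=m0),           # uniform under the null
                        rng.uniform(size=m - m0) ** 4])  # stochastically small under the alternative
    fdps.append(fdp(p <= alpha, is_null))
print(np.mean(fdps))  # Monte Carlo estimate of the FDR of the uncorrected rule
```

The uncorrected rule's FDR is far above α here, which is what the multiple testing procedures below are designed to prevent.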
7 Under completely unspecified dependence, the same authors have shown that the FDR control still holds if the threshold collection of the LSU procedure is divided by a factor 1 + 1/2 + · · · + 1/m, where m is the total number of null hypotheses to test. [sent-29, score-0.449]
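The LSU procedure and its correction for unspecified dependence (thresholds αi/m divided by 1 + 1/2 + · · · + 1/m) can be sketched as follows; this is a hedged illustration, and the function name and example p-values are ours, not from the paper:

```python
import numpy as np

def step_up_linear(p, alpha, dependence_correction=False):
    """Boolean rejection mask of the LSU, optionally with the 1+1/2+...+1/m correction."""
    m = len(p)
    c = np.sum(1.0 / np.arange(1, m + 1)) if dependence_correction else 1.0
    p_sorted = np.sort(p)
    thresh = alpha * np.arange(1, m + 1) / (m * c)
    below = np.nonzero(p_sorted <= thresh)[0]
    k = below[-1] + 1 if below.size else 0            # largest i with p_(i) <= threshold(i)
    return p <= (p_sorted[k - 1] if k else -1.0)      # reject the k smallest p-values

p = np.array([0.001, 0.008, 0.039, 0.041, 0.30, 0.74])
print(step_up_linear(p, 0.05).sum())        # → 2 (LSU)
print(step_up_linear(p, 0.05, True).sum())  # → 1 (corrected version is more conservative)
```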
8 However, all of these procedures, which are built in order to control the FDR at a level α, can be shown to actually have their FDR upper bounded by π0 α, where π0 is the proportion of true null hypotheses in the initial pool. [sent-31, score-0.347]
9 , Benjamini and Hochberg, 2000; Black, 2004) is to integrate an estimation of the unknown proportion π0 in the threshold of the previous procedures and to prove that the corresponding FDR is still rigorously controlled by α. [sent-37, score-0.33]
10 Of course, adaptive procedures are of practical interest if it is expected that π0 is, or can be, significantly smaller than 1. [sent-38, score-0.272]
11 , Benjamini and Heller, 2007) which first selects some clusters of hypotheses that are likely to contain false nulls, and then applies a multiple testing procedure on the selected hypotheses. [sent-41, score-0.361]
12 Since a large part of the selected hypotheses is expected to be false in this second step, an adaptive procedure is needed in order to keep the FDR close to the target level. [sent-42, score-0.475]
13 A number of adaptive procedures have been proposed in the recent literature and can loosely be divided into the following categories: • plug-in procedures, where some initial estimator of π0 is directly plugged in as a multiplicative level correction to the usual procedures. [sent-43, score-0.345]
14 , Meinshausen and Rice, 2006; Jin and Cai, 2007; Jin, 2008); while their asymptotic consistency (as the number of hypotheses tends to infinity) is generally established, their use in plug-in adaptive procedures has not always been studied. [sent-51, score-0.425]
15 • one-stage procedures, which perform a single round of multiple testing (generally step-up or step-down), based on a particular (deterministic) threshold collection that renders it adaptive (Finner et al. [sent-56, score-0.279]
16 In this framework, the more specific random effects model is—most generally, though not always— considered, in which p-values are assumed independent, each hypothesis has a probability π0 of being true, and all false null hypotheses share the same alternate distribution. [sent-62, score-0.319]
17 The behavior of different procedures is then studied under the limit where the number of tested hypotheses grows to infinity. [sent-63, score-0.316]
18 Under the assumption of positive or arbitrary dependence of the p-values, we introduce new two-stage adaptive versions of known step-up procedures (namely, of the LSU under positive dependence, and of the family of procedures introduced by Blanchard and Fleuret, 2007, under unspecified dependence). [sent-85, score-0.52]
19 These adaptive versions provably control the FDR and result in an improvement of power over the non-adaptive versions in some situations (namely, when the number of hypotheses rejected in the first stage is large, typically more than 60%). [sent-86, score-0.422]
20 A second goal of this work is to present a review of several existing adaptive step-up procedures with provable FDR control under independence. [sent-87, score-0.336]
21 A third goal is to compare both the existing and our new adaptive procedures in an extensive simulation study under both independence and dependence, following the simulation model and methodology used by Benjamini et al. [sent-94, score-0.294]
22 However, we also report that the best procedure overall (in terms of power, among procedures that are robust enough to the dependent case) appears to be the plug-in procedure based on the well-known Storey estimator (Storey, 2002) used with the somewhat nonstandard parameter setting λ = α. [sent-96, score-0.418]
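A hedged sketch of this plug-in: we assume the standard form of Storey's estimator with the "+1" correction commonly used for finite-sample FDR control, plugged into the LSU as a multiplicative level correction; the helper name, default λ = α, and example p-values are ours:

```python
import numpy as np

def storey_plugin_lsu(p, alpha, lam=None):
    """LSU at level alpha / pi0_hat, with pi0_hat from Storey's estimator."""
    m = len(p)
    lam = alpha if lam is None else lam              # nonstandard choice lambda = alpha
    # Storey-type estimator: pi0_hat = (1 + #{p_i > lam}) / ((1 - lam) * m)
    pi0_hat = (1 + np.sum(p > lam)) / ((1 - lam) * m)
    thresh = alpha * np.arange(1, m + 1) / (m * pi0_hat)
    p_sorted = np.sort(p)
    below = np.nonzero(p_sorted <= thresh)[0]
    k = below[-1] + 1 if below.size else 0
    return p <= (p_sorted[k - 1] if k else -1.0)

p = np.array([0.001, 0.002, 0.003, 0.004, 0.005, 0.03, 0.5, 0.6])
print(storey_plugin_lsu(p, 0.05).sum())  # → 6
```

When many p-values are small, pi0_hat is well below 1 and the working level α/pi0_hat exceeds α, which is where the power gain of the adaptive version comes from.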
23 • Concerning the new two-stage procedures with theoretical FDR control under dependence, simulations show an (admittedly limited) improvement over their non-adaptive counterparts in favorable situations which correspond to what was expected from the theoretical study (i.e., when a large proportion of hypotheses is rejected at the first stage). [sent-98, score-0.291]
24 The case of positively dependent and arbitrarily dependent p-values is examined in Section 4, where we introduce our new adaptive procedures in this context. [sent-105, score-0.302]
25 Let H be a finite set of null hypotheses for P, that is, each null hypothesis h ∈ H corresponds to some subset of distributions on (X , X) and “P satisfies h” means that P belongs to this subset of distributions. [sent-113, score-0.303]
26 The underlying probability P being fixed, we denote H0 = {h ∈ H | P satisfies h} the set of the true null hypotheses and m0 = |H0 | the number of true null hypotheses. [sent-116, score-0.303]
27 We suppose further that there exists a set of p-value functions p = (ph , h ∈ H ), meaning that each ph : (X , X) → [0, 1] is a measurable function and that for each h ∈ H0 , ph is bounded stochastically by a uniform distribution, that is, ∀h ∈ H0 , ∀t ∈ [0, 1], P [ph ≤ t] ≤ t. [sent-120, score-0.557]
28 As a general rule, provided that FDR(R) ≤ α, we want to build procedures that reject as many false hypotheses as possible. [sent-138, score-0.411]
29 The absolute power of a multiple testing procedure is defined as the average proportion of false hypotheses correctly rejected, E[ |R ∩ H0c | / |H0c | ]. [sent-139, score-0.437]
30 3 Self-Consistency, Step-Up Procedures, FDR Control and Adaptivity We first define a general class of multiple testing procedures called self-consistent procedures (Blanchard and Roquain, 2008). [sent-147, score-0.414]
31 Let ∆ : {0, . . . , m} → R+ , ∆(0) = 0 , be a nondecreasing function called a threshold collection; a multiple testing procedure R is said to satisfy the self-consistency condition with respect to ∆ if the inclusion R ⊂ {h ∈ H | ph ≤ ∆(|R|)} holds almost surely. [sent-151, score-0.463]
32 Furthermore, we say that R is nonincreasing if for all h ∈ H the function ph → |R(p)| is nonincreasing, that is if |R| is nonincreasing in each p-value. [sent-152, score-0.458]
33 Definition 4 (Step-up procedure) The step-up procedure with threshold collection ∆ is defined as R = {h ∈ H | ph ≤ p(k) }, where k = max{0 ≤ i ≤ m | p(i) ≤ ∆(i)}. [sent-158, score-0.429]
34 Lemma 5 The step-up procedure with threshold collection ∆ is nonincreasing and self-consistent with respect to ∆ . [sent-160, score-0.286]
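Definition 4 transcribes directly into code: reject the k smallest p-values, where k is the largest index at which the ordered p-value falls below the threshold collection. This sketch (function name and example ours) recovers the LSU as the special case ∆(i) = αi/m:

```python
import numpy as np

def step_up(p, delta):
    """Step-up procedure: p array of p-values, delta callable i -> threshold, delta(0) = 0."""
    m = len(p)
    p_sorted = np.sort(p)
    k = 0
    for i in range(1, m + 1):
        if p_sorted[i - 1] <= delta(i):
            k = i                                 # keep the LARGEST such i (step-up)
    return p <= (p_sorted[k - 1] if k else -1.0)  # reject the k smallest p-values

m, alpha = 6, 0.05
lsu = step_up(np.array([0.004, 0.009, 0.02, 0.2, 0.5, 0.9]),
              lambda i: alpha * i / m)            # LSU as the special case
print(lsu.sum())  # → 3
```

In accordance with Lemma 5, the rejection count is coordinate-wise nonincreasing in each p-value, and the output satisfies self-consistency with respect to ∆ by construction.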
35 However, to recover procedures of a more general form (including step-down, for instance), the statements of this paper will preferably be expressed in terms of self-consistent procedures when possible. [sent-167, score-0.35]
36 The LSU plays a prominent role in multiple testing for FDR control; it was the first procedure for which FDR control was proved and it is probably the most widely used procedure in this context. [sent-171, score-0.287]
37 Then any nonincreasing self-consistent procedure with respect to threshold collection ∆(i) = αi/m has FDR upper bounded by π0 α , where π0 = m0 /m is the proportion of true null hypotheses. [sent-174, score-0.428]
38 Moreover, if the p-values associated with the true null hypotheses are exactly uniformly distributed, the linear step-up procedure has FDR exactly equal to π0 α. [sent-176, score-0.314]
39 Simply put, the role of adaptive step-up procedures is to mimic the latter oracle in order to obtain more powerful procedures. [sent-193, score-0.325]
40 In the particular case where the shape function β is the identity function on R+ , the procedure is called an adaptive linear step-up procedure using estimator G (and at level α). [sent-196, score-0.381]
41 A subclass of plug-in adaptive procedures is formed by so-called two-stage procedures, when the estimator G is actually based on a first, non-adaptive, multiple testing procedure. [sent-201, score-0.391]
42 The distinction between generic plug-in procedures and two-stage procedures is somewhat informal and generally meant only to provide some kind of nomenclature between different possible approaches. [sent-203, score-0.362]
43 Adaptive Procedures with Provable FDR Control under Independence In this section, we introduce two new adaptive procedures that provably control the FDR under independence. [sent-207, score-0.34]
44 For i ≤ (m + 1)/2, the threshold (5) is α(1 − α)i/(m − i + 1), and thus our approach differs from the threshold collection αi/m of the standard LSU procedure by the factor (1 − α)m/(m − i + 1). [sent-228, score-0.295]
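The factor relation between the one-stage adaptive threshold α(1 − α)i/(m − i + 1) and the LSU threshold αi/m (this is our reading of the garbled formula above) can be checked numerically; the variable names are ours:

```python
import numpy as np

m, alpha = 1000, 0.05
i = np.arange(1, (m + 1) // 2 + 1)                 # range i <= (m + 1)/2
br1s = alpha * (1 - alpha) * i / (m - i + 1)       # assumed BR one-stage threshold
lsu = alpha * i / m                                # standard LSU threshold
factor = (1 - alpha) * m / (m - i + 1)
assert np.allclose(br1s, lsu * factor)             # the stated factor relation holds
print(br1s[0] < lsu[0])                            # adaptive threshold is below LSU for small i
```

For small i the adaptive threshold is slightly below the LSU one, while for i approaching m/2 the factor exceeds 1, which is exactly the trade-off discussed in the surrounding text.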
45 The latter is a well-known improvement of Bonferroni's procedure (which corresponds to the fixed threshold α/m), taking into account the proportion of true nulls, and defined as the step-down procedure with threshold collection α/(m − i + 1). [sent-230, score-0.287]
46 The step-down procedure with threshold collection ∆ rejects the hypotheses corresponding to the k smallest p-values, where k = max{0 ≤ i ≤ m | ∀ j ≤ i , p( j) ≤ ∆( j)}. [sent-232, score-0.354]
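The step-down rule differs from the step-up rule only in where the scan stops: it takes the largest k such that all ordered p-values up to k pass their thresholds. With ∆(i) = α/(m − i + 1) this sketch (names and example ours) gives Holm's procedure:

```python
import numpy as np

def step_down(p, delta):
    """Step-down: k = max{i | for all j <= i, p_(j) <= delta(j)}."""
    m = len(p)
    p_sorted = np.sort(p)
    k = 0
    for i in range(1, m + 1):
        if p_sorted[i - 1] <= delta(i):
            k = i
        else:
            break                                 # step-down stops at the first failure
    return p <= (p_sorted[k - 1] if k else -1.0)

m, alpha = 5, 0.05
holm = step_down(np.array([0.001, 0.011, 0.02, 0.04, 0.9]),
                 lambda i: alpha / (m - i + 1))   # Holm's threshold collection
print(holm.sum())  # → 2
```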
47 This happens only when the proportion of null hypotheses rejected by the LSU procedure is positive but less than α + 1/m (and even in this region the ratio of the two threshold collections is never less than (1 − α)). [sent-239, score-0.451]
48 Roughly speaking, this situation with only a few rejections can only happen if there are few false hypotheses to begin with (π0 close to 1) or if the false hypotheses are very difficult to detect (the distribution of false p-values is close to being uniform). [sent-240, score-0.539]
49 In the random effects model, each hypothesis is assumed to be randomly true with probability π0 (and false otherwise), and the false null hypotheses share a common distribution P1 . [sent-242, score-0.515]
50 Figure 1: For m = 1000 null hypotheses and α = 5%: comparison of the new threshold collection BR-1S-λ given by (4) to that of the LSU, the AORC and FDR09-η . [sent-251, score-0.323]
51 Assume that the test statistics of the true null hypotheses are standard Gaussian with zero mean and that those of the alternative hypotheses are Gaussian with mean µ > 0 . [sent-256, score-0.363]
52 Then if µ > µ∗ , the probability that the LSU rejects a proportion of null hypotheses less than 1/m + α tends to 0 as m tends to infinity. [sent-259, score-0.303]
53 However, the step-up procedure using this threshold collection as is does not have controlled FDR (since t(m) = 1 , the corresponding step-up procedure would always reject all the hypotheses), and several suitable modifications were proposed by Finner et al. [sent-273, score-0.329]
54 2 Adaptive Plug-In Methods In this section, we consider different adaptive step-up procedures of the plug-in type, that is, based on an explicit estimator of π0^{-1} . [sent-289, score-0.327]
55 We also denote p0,h = (p−h , 0) the collection p where ph has been replaced by 0. [sent-299, score-0.292]
56 Consider a nonincreasing multiple testing procedure R which is self-consistent with respect to the adaptive linear threshold collection ∆(p, i) = αG(p)i/m . [sent-302, score-0.447]
57 Finally, the estimator G4 is new and follows exactly the same philosophy as G3 , that is, it uses a step-up procedure as a first stage in order to estimate π0^{-1} , but this time based on our adaptive one-stage step-up procedure introduced in the previous section, rather than the standard LSU. [sent-339, score-0.342]
58 To sum up, Corollary 13 states that under independence, for any λ and k0 , the plug-in adaptive procedures Storey-λ, Quant- k0 , BKY06-λ and BR-2S-λ all control the FDR at level α. [sent-348, score-0.336]
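As a hedged sketch of the two-stage idea, here is the BKY06 recipe as published by Benjamini, Krieger and Yekutieli (2006): a first LSU stage at level α/(1 + α) estimates π0 from the unrejected fraction, and a second LSU stage plugs the estimate in. This follows the published recipe and is not necessarily the exact parametrization used in this paper; the function names and example are ours:

```python
import numpy as np

def lsu(p, alpha):
    m = len(p)
    p_sorted = np.sort(p)
    below = np.nonzero(p_sorted <= alpha * np.arange(1, m + 1) / m)[0]
    k = below[-1] + 1 if below.size else 0
    return p <= (p_sorted[k - 1] if k else -1.0)

def bky06(p, alpha):
    m = len(p)
    a1 = alpha / (1 + alpha)
    r1 = lsu(p, a1).sum()                  # first, non-adaptive stage
    if r1 == 0:
        return np.zeros(m, dtype=bool)
    if r1 == m:
        return np.ones(m, dtype=bool)
    pi0_hat = (m - r1) / m                 # unrejected fraction estimates pi0
    return lsu(p, a1 / pi0_hat)            # second stage at the corrected level

p = np.array([0.001, 0.002, 0.003, 0.004, 0.2, 0.3, 0.5, 0.6, 0.7, 0.9])
print(bky06(p, 0.05).sum())  # → 4
```

Since a1 / pi0_hat ≥ a1, the second stage can only reject at least as much as the first, which is the sense in which the plug-in is adaptive.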
59 Namely, assume that we want to reject at least i null hypotheses whenever p(i) is lower than the standard LSU threshold times the estimator G2 wherein the parameter k0 = i is used. [sent-358, score-0.353]
60 In this short section, we present theoretical computations of the FDR for the previously introduced adaptive step-up procedures, under the maximally dependent model where all the p-values are in fact equal, that is ph ≡ p1 for all h ∈ H (and m0 = m). [sent-386, score-0.376]
61 Conversely, the FDR of some adaptive procedures can be higher under moderate dependence than under maximal dependence. [sent-411, score-0.327]
62 Remember the absolute power is defined as the mean proportion of false null hypotheses that are correctly rejected; for each procedure the relative power is the ratio of its absolute power to that of [LSU-Oracle]. [sent-451, score-0.508]
63 Their relation to the remaining procedures depends on the parameters; both [median-LSU] and [FDR09-1/2] appear to be more powerful than the remaining procedures when π0 > 1/2 , and less efficient otherwise. [sent-460, score-0.368]
64 Namely, higher p-values are less likely to be “contaminated” by false null hypotheses; conversely, if we take a lower threshold λ, there will be more false null hypotheses included in the set of p-values larger than λ , leading to a pessimistic bias in the estimation of π0^{-1} . [sent-473, score-0.512]
65 However, we do not know from a theoretical point of view if the adaptive procedures have their FDR upper bounded by α. [sent-484, score-0.284]
66 Figure 2: FDR and power relative to oracle as a function of the true proportion π0 of null hypotheses. [sent-582, score-0.333]
67 These procedures all exhibit good robustness to dependence for FDR control as well as comparatively good power. [sent-705, score-0.294]
68 New Adaptive Procedures with Provable FDR Control under Arbitrary Dependence In this section, we consider from a theoretical point of view the problem of constructing multiple testing procedures that are adaptive to π0 under arbitrary dependence conditions of the p-values. [sent-729, score-0.391]
69 The derivation of adaptive procedures that have provably controlled FDR under dependence appears to have been only scarcely studied so far (see Sarkar, 2008a, and Farcomeni, 2007). [sent-730, score-0.411]
70 However, we will show that they still provide an improvement, in a certain regime, with respect to the (non-adaptive) LSU procedure in the PRDS case and with respect to the family of (non-adaptive) procedures proposed in Theorem 7 in the arbitrary dependence case. [sent-735, score-0.328]
71 Then, the p-value family p = (ph , h ∈ H ) is said to be positively regression dependent on each one from H0 (PRDS on H0 in short) if for any nondecreasing measurable set D ⊂ [0, 1]H and for all h ∈ H0 , the function u ∈ [0, 1] → P [p ∈ D | ph = u] is nondecreasing. [sent-737, score-0.315]
72 Our first result concerns a two-stage procedure where the first stage R0 is any multiple testing procedure with controlled FWER, and where we (over-) estimate m0 via the straightforward estimator (m − |R0 |) . [sent-742, score-0.358]
73 Theorem 22 Let R0 be a nonincreasing multiple testing procedure and assume that its FWER is controlled at level α0 , that is, P [R0 ∩ H0 ≠ ∅] ≤ α0 . [sent-744, score-0.316]
74 Then the adaptive step-up procedure R with data-dependent threshold collection ∆(i) = α1 β(i)/(m − |R0 |) has FDR controlled at level α0 + α1 in either of the following dependence situations: • the p-value family (ph , h ∈ H ) is PRDS on H0 and the shape function is the identity function. [sent-745, score-0.469]
75 If we choose α0 = α1 = α/2 , then this procedure will outperform its non-adaptive counterpart (using the same shape function) only if there are more than 50% rejected hypotheses in the first stage. [sent-748, score-0.315]
76 Then the adaptive step-up procedure R with data-dependent threshold collection ∆1 (i) = α1 β(i)Fκ (|R0 |/m)/m has FDR upper bounded by α1 + κα0 in either of the following dependence situations: • the p-value family (ph , h ∈ H ) is PRDS on H0 and the shape function is the identity function. [sent-755, score-0.414]
77 Therefore, here again this adaptive procedure is only useful in the cases where it is expected that a “large” proportion of null hypotheses can easily be rejected. [sent-760, score-0.448]
78 For the FWER-controlled first stage of Theorem 22, we chose a standard Holm procedure (see Holm, 1979), which is a step-down procedure with threshold collection t(i) = α/(m − i + 1) . [sent-774, score-0.291]
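A hedged sketch of this two-stage construction in the PRDS case (identity shape function): a Holm step-down at level α0 = α/2 gives a FWER-controlled R0, then m0 is (over-)estimated by m − |R0| and a step-up is run with ∆(i) = α1·i/(m − |R0|), α1 = α/2, so the overall budget is α0 + α1 = α. Function names and the example are ours, not the paper's code:

```python
import numpy as np

def holm(p, alpha):
    """Holm step-down: FWER-controlled first stage."""
    m = len(p)
    p_sorted = np.sort(p)
    k = 0
    for i in range(1, m + 1):
        if p_sorted[i - 1] <= alpha / (m - i + 1):
            k = i
        else:
            break
    return p <= (p_sorted[k - 1] if k else -1.0)

def two_stage_adaptive(p, alpha):
    m = len(p)
    r0 = holm(p, alpha / 2).sum()          # |R0|, first stage at alpha0 = alpha/2
    denom = max(m - r0, 1)                 # (over-)estimate of m0
    p_sorted = np.sort(p)
    thresh = (alpha / 2) * np.arange(1, m + 1) / denom   # alpha1 = alpha/2
    below = np.nonzero(p_sorted <= thresh)[0]
    k = below[-1] + 1 if below.size else 0
    return p <= (p_sorted[k - 1] if k else -1.0)

p = np.array([1e-6] * 7 + [0.05, 0.5, 0.9])
print(two_stage_adaptive(p, 0.05).sum())   # → 8 (plain LSU at level 0.05 rejects only 7 here)
```

In accordance with the remark above, the adaptive threshold beats the LSU one only once the first stage has rejected more than half of the hypotheses, as in this example.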
79 In Figure 5, we report the relative power to the oracle LSU, and the False Non-discovery Rate (FNR), which is the converse of the FDR for type II errors, that is, the average of the ratio of non-rejected false hypotheses over the total number of non-rejected hypotheses. [sent-775, score-0.279]
80 Nevertheless, this demonstrates theoretically the possibility of provably adaptive procedures under dependence. [sent-781, score-0.31]
81 Remark 24 Some theoretical results for two-stage procedures under possible dependence using a first stage with controlled FWER or controlled FDR appeared earlier (Farcomeni, 2007). [sent-783, score-0.358]
82 Remark 25 The theoretical problem of adaptive procedures under arbitrary dependence was also considered by Sarkar (2008a) using two-stage procedures. [sent-788, score-0.327]
83 Conclusion and Discussion We proposed several adaptive multiple testing procedures that provably control the FDR under different dependence assumptions on the p-values. [sent-791, score-0.6]
84 The procedure BR-2S is less conservative in general (except in marginal situations) than the adaptive procedure proposed by Benjamini et al. [sent-793, score-0.325]
85 Extensive simulations showed that these new procedures appear to be robustly controlling the FDR even in a positive dependence situation, which is a very desirable property in practice. [sent-795, score-0.287]
86 Secondly, we presented what we think is among the first examples of adaptive multiple testing procedures with provable FDR control in the PRDS case and under unspecified dependence. [sent-824, score-0.4]
87 An important difference with respect to earlier works on this topic is that the procedures we introduced here are both theoretically founded and can be shown to improve over non-adaptive procedures in certain (admittedly limited) circumstances. [sent-825, score-0.366]
88 We first want to underline once more that the theory for adaptive procedures under dependence is still underdeveloped. [sent-834, score-0.327]
89 It might actually be too restrictive to look for procedures having theoretically controlled FDR uniformly over arbitrary dependence situations such as what we studied in Section 4. [sent-835, score-0.311]
90 An interesting future theoretical direction could be to prove that some of the adaptive procedures showing good robustness in our simulations actually have controlled FDR under some types of dependence, at least when the p-values are in some sense not too far from being independent. [sent-836, score-0.378]
91 3 Proof of Theorem 11 By definition of self-consistency, the procedure R satisfies R ⊂ {h ∈ H | ph ≤ α|R|G(p)/m}. [sent-866, score-0.328]
92 Now, for any h ∈ H0 \ {h0 } , denote p′0,h0 the family p0,h0 where the variable ph0 has been replaced by uh , an independent uniform variable on [0, 1] . [sent-875, score-0.292]
93 Since both ph0 and uh are independent of the other p-values, ph0 is stochastically lower bounded by uh and G is nonincreasing, we have E[ G(p0,h0 ) | p−h0 ] ≤ E[ G(p′0,h0 ) | p−h0 ] , hence also in unconditional expectation. [sent-876, score-0.573]
94 For the BKY06 procedure, we simply remark that since the linear step-up procedure of level λ rejects all the hypotheses when p1 ≤ λ and rejects no hypothesis otherwise, the estimators G1 and G3 are equal in this case. [sent-904, score-0.374]
95 Lemma 26 Assume R is a multiple testing procedure satisfying the self-consistency condition: R ⊂ {h ∈ H | ph ≤ αG(p)β(|R|)/m} , where G(p) is a data-dependent factor. [sent-909, score-0.392]
96 For the PRDS case, we note that since |R(p)| is coordinate-wise nonincreasing in each p-value, for any v > 0, D = {z ∈ [0, 1]H | |R(z)| < v} is a measurable nondecreasing set, so that the PRDS property implies that u → P [|R| < v | ph = u] is nondecreasing. [sent-916, score-0.367]
97 On the adaptive control of the false discovery rate in multiple testing with independent statistics. [sent-978, score-0.311]
98 Adaptive linear step-up procedures that control the false discovery rate. [sent-990, score-0.325]
99 Estimating the proportion of false null hypotheses among a large number of independently tested hypotheses. [sent-1093, score-0.347]
100 Asymptotic properties of false discovery rate controlling procedures under independence. [sent-1097, score-0.297]
wordName wordTfidf (topN-words)
[('fdr', 0.744), ('lsu', 0.253), ('ph', 0.248), ('benjamini', 0.192), ('roquain', 0.177), ('procedures', 0.175), ('hypotheses', 0.141), ('nonincreasing', 0.105), ('storey', 0.098), ('adaptive', 0.097), ('daptive', 0.097), ('ependence', 0.097), ('finner', 0.097), ('prds', 0.097), ('lanchard', 0.091), ('ontrol', 0.091), ('blanchard', 0.091), ('ndependence', 0.082), ('null', 0.081), ('procedure', 0.08), ('false', 0.076), ('conservative', 0.068), ('unspeci', 0.059), ('sarkar', 0.059), ('threshold', 0.057), ('estimator', 0.055), ('dependence', 0.055), ('shape', 0.051), ('proportion', 0.049), ('controlled', 0.049), ('control', 0.046), ('collection', 0.044), ('rejected', 0.043), ('testing', 0.042), ('fwer', 0.041), ('yekutieli', 0.041), ('simulations', 0.039), ('aorc', 0.038), ('genovese', 0.038), ('nulls', 0.038), ('stochastically', 0.037), ('oracle', 0.035), ('roof', 0.033), ('adaptivity', 0.032), ('rejects', 0.032), ('stage', 0.03), ('discovery', 0.028), ('power', 0.027), ('farcomeni', 0.027), ('fnr', 0.027), ('hochberg', 0.023), ('holm', 0.023), ('wasserman', 0.023), ('provably', 0.022), ('independence', 0.022), ('multiple', 0.022), ('alternate', 0.021), ('nonasymptotic', 0.02), ('positively', 0.02), ('reject', 0.019), ('controlling', 0.018), ('robustness', 0.018), ('provable', 0.018), ('powerful', 0.018), ('level', 0.018), ('family', 0.018), ('median', 0.017), ('round', 0.017), ('proved', 0.017), ('accordance', 0.016), ('heorem', 0.016), ('maximally', 0.016), ('remark', 0.016), ('etienne', 0.016), ('fleuret', 0.016), ('gavrilov', 0.016), ('situations', 0.016), ('theoretically', 0.016), ('dependent', 0.015), ('situation', 0.015), ('remember', 0.015), ('favorable', 0.015), ('lemma', 0.015), ('nondecreasing', 0.014), ('theorem', 0.014), ('asymptotical', 0.014), ('capping', 0.014), ('gilles', 0.014), ('rejections', 0.014), ('uh', 0.014), ('appears', 0.013), ('conversely', 0.012), ('quantity', 0.012), ('uniform', 0.012), ('estimators', 0.012), ('bounded', 0.012), 
('generally', 0.012), ('concerning', 0.011), ('noticeably', 0.011), ('du', 0.011)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000013 5 jmlr-2009-Adaptive False Discovery Rate Control under Independence and Dependence
Author: Junning Li, Z. Jane Wang
Abstract: In real world applications, graphical statistical models are not only a tool for operations such as classification or prediction, but usually the network structures of the models themselves are also of great interest (e.g., in modeling brain connectivity). The false discovery rate (FDR), the expected ratio of falsely claimed connections to all those claimed, is often a reasonable error-rate criterion in these applications. However, current learning algorithms for graphical models have not been adequately adapted to the concerns of the FDR. The traditional practice of controlling the type I error rate and the type II error rate under a conventional level does not necessarily keep the FDR low, especially in the case of sparse networks. In this paper, we propose embedding an FDR-control procedure into the PC algorithm to curb the FDR of the skeleton of the learned graph. We prove that the proposed method can control the FDR under a user-specified level at the limit of large sample sizes. In the cases of moderate sample size (about several hundred), empirical experiments show that the method is still able to control the FDR under the user-specified level, and a heuristic modification of the method is able to control the FDR more accurately around the user-specified level. The proposed method is applicable to any models for which statistical tests of conditional independence are available, such as discrete models and Gaussian models. Keywords: Bayesian networks, false discovery rate, PC algorithm, directed acyclic graph, skeleton
3 0.096398667 42 jmlr-2009-Incorporating Functional Knowledge in Neural Networks
Author: Charles Dugas, Yoshua Bengio, François Bélisle, Claude Nadeau, René Garcia
Abstract: Incorporating prior knowledge of a particular task into the architecture of a learning algorithm can greatly improve generalization performance. We study here a case where we know that the function to be learned is non-decreasing in its two arguments and convex in one of them. For this purpose we propose a class of functions similar to multi-layer neural networks but (1) that has those properties, (2) is a universal approximator of Lipschitz1 functions with these and other properties. We apply this new class of functions to the task of modelling the price of call options. Experiments show improvements on regressing the price of call options using the new types of function classes that incorporate the a priori constraints. Keywords: neural networks, universal approximation, monotonicity, convexity, call options
4 0.03343071 28 jmlr-2009-Entropy Inference and the James-Stein Estimator, with Application to Nonlinear Gene Association Networks
Author: Jean Hausser, Korbinian Strimmer
Abstract: We present a procedure for effective estimation of entropy and mutual information from small-sample data, and apply it to the problem of inferring high-dimensional gene association networks. Specifically, we develop a James-Stein-type shrinkage estimator, resulting in a procedure that is highly efficient statistically as well as computationally. Despite its simplicity, we show that it outperforms eight other entropy estimation procedures across a diverse range of sampling scenarios and data-generating models, even in cases of severe undersampling. We illustrate the approach by analyzing E. coli gene expression data and computing an entropy-based gene-association network from gene expression data. A computer program is available that implements the proposed shrinkage estimator. Keywords: entropy, shrinkage estimation, James-Stein estimator, "small n, large p" setting, mutual information, gene association network
5 0.023965932 23 jmlr-2009-Discriminative Learning Under Covariate Shift
Author: Steffen Bickel, Michael Brückner, Tobias Scheffer
Abstract: We address classification problems for which the training instances are governed by an input distribution that is allowed to differ arbitrarily from the test distribution—problems also referred to as classification under covariate shift. We derive a solution that is purely discriminative: neither training nor test distribution are modeled explicitly. The problem of learning under covariate shift can be written as an integrated optimization problem. Instantiating the general optimization problem leads to a kernel logistic regression and an exponential model classifier for covariate shift. The optimization problem is convex under certain conditions; our findings also clarify the relationship to the known kernel mean matching procedure. We report on experiments on problems of spam filtering, text classification, and landmine detection. Keywords: covariate shift, discriminative learning, transfer learning
6 0.022724664 10 jmlr-2009-Application of Non Parametric Empirical Bayes Estimation to High Dimensional Classification
7 0.018826433 97 jmlr-2009-Ultrahigh Dimensional Feature Selection: Beyond The Linear Model
9 0.017141782 31 jmlr-2009-Evolutionary Model Type Selection for Global Surrogate Modeling
10 0.017004337 21 jmlr-2009-Data-driven Calibration of Penalties for Least-Squares Regression
11 0.016992619 40 jmlr-2009-Identification of Recurrent Neural Networks by Bayesian Interrogation Techniques
13 0.01504234 94 jmlr-2009-The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs
14 0.014891146 91 jmlr-2009-Subgroup Analysis via Recursive Partitioning
15 0.014889049 45 jmlr-2009-Learning Approximate Sequential Patterns for Classification
16 0.014516399 82 jmlr-2009-Robustness and Regularization of Support Vector Machines
17 0.013858038 100 jmlr-2009-When Is There a Representer Theorem? Vector Versus Matrix Regularizers
18 0.013789646 29 jmlr-2009-Estimating Labels from Label Proportions
19 0.013679542 30 jmlr-2009-Estimation of Sparse Binary Pairwise Markov Networks using Pseudo-likelihoods
20 0.01356656 8 jmlr-2009-An Anticorrelation Kernel for Subsystem Training in Multiple Classifier Systems
topicId topicWeight
[(0, 0.109), (1, -0.093), (2, 0.155), (3, 0.678), (4, 0.159), (5, 0.31), (6, 0.38), (7, -0.022), (8, -0.079), (9, 0.016), (10, 0.048), (11, -0.021), (12, -0.036), (13, -0.0), (14, -0.029), (15, 0.021), (16, 0.01), (17, -0.015), (18, -0.004), (19, 0.008), (20, -0.002), (21, -0.03), (22, -0.039), (23, -0.008), (24, 0.043), (25, 0.032), (26, -0.001), (27, -0.015), (28, -0.03), (29, 0.011), (30, 0.004), (31, -0.012), (32, 0.012), (33, -0.037), (34, 0.01), (35, 0.011), (36, 0.008), (37, 0.021), (38, 0.0), (39, -0.018), (40, 0.018), (41, -0.036), (42, 0.044), (43, -0.003), (44, 0.001), (45, 0.012), (46, 0.042), (47, 0.024), (48, 0.014), (49, 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.97566587 5 jmlr-2009-Adaptive False Discovery Rate Control under Independence and Dependence
Author: Gilles Blanchard, Étienne Roquain
Abstract: In the context of multiple hypothesis testing, the proportion π0 of true null hypotheses in the pool of hypotheses to test often plays a crucial role, although it is generally unknown a priori. A testing procedure using an implicit or explicit estimate of this quantity in order to improve its efficiency is called adaptive. In this paper, we focus on the issue of false discovery rate (FDR) control and we present new adaptive multiple testing procedures with control of the FDR. In a first part, assuming independence of the p-values, we present two new procedures and give a unified review of other existing adaptive procedures that have provably controlled FDR. We report extensive simulation results comparing these procedures and testing their robustness when the independence assumption is violated. The new proposed procedures appear competitive with existing ones. The overall best, though, is reported to be Storey’s estimator, albeit for a specific parameter setting that does not appear to have been considered before. In a second part, we propose adaptive versions of step-up procedures that have provably controlled FDR under positive dependence and unspecified dependence of the p-values, respectively. In the latter case, while simulations only show an improvement over non-adaptive procedures in limited situations, these are to our knowledge among the first theoretically founded adaptive multiple testing procedures that control the FDR when the p-values are not independent. Keywords: multiple testing, false discovery rate, adaptive procedure, positive regression dependence, p-values
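The adaptive procedures the abstract describes combine a plug-in estimate of π0 with a step-up rule. As a minimal illustration (not the authors' exact procedures), a Storey-type estimator with parameter λ plugged into the Benjamini-Hochberg step-up can be sketched as follows; the function names and the default λ = 0.5 are ours:

```python
def storey_pi0(pvals, lam=0.5):
    """Storey-type estimate of the proportion of true nulls:
    pi0_hat = (1 + #{p_i > lam}) / (m * (1 - lam)), capped at 1."""
    m = len(pvals)
    return min(1.0, (1 + sum(p > lam for p in pvals)) / (m * (1 - lam)))

def adaptive_bh(pvals, alpha=0.05, lam=0.5):
    """Benjamini-Hochberg step-up run at the adaptive level alpha / pi0_hat;
    returns the indices of the rejected hypotheses."""
    m = len(pvals)
    pi0 = storey_pi0(pvals, lam)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):      # step-up: largest rank passing
        if pvals[i] <= rank * alpha / (m * pi0):
            k = rank
    return sorted(order[:k])
```

When many p-values exceed λ, the estimate π̂0 stays near 1 and the rule reduces to plain Benjamini-Hochberg; a small π̂0 enlarges the effective level and yields more rejections.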
Author: Junning Li, Z. Jane Wang
Abstract: In real world applications, graphical statistical models are not only a tool for operations such as classification or prediction, but usually the network structures of the models themselves are also of great interest (e.g., in modeling brain connectivity). The false discovery rate (FDR), the expected ratio of falsely claimed connections to all those claimed, is often a reasonable error-rate criterion in these applications. However, current learning algorithms for graphical models have not been adequately adapted to the concerns of the FDR. The traditional practice of controlling the type I error rate and the type II error rate under a conventional level does not necessarily keep the FDR low, especially in the case of sparse networks. In this paper, we propose embedding an FDR-control procedure into the PC algorithm to curb the FDR of the skeleton of the learned graph. We prove that the proposed method can control the FDR under a user-specified level at the limit of large sample sizes. In the cases of moderate sample size (about several hundred), empirical experiments show that the method is still able to control the FDR under the user-specified level, and a heuristic modification of the method is able to control the FDR more accurately around the user-specified level. The proposed method is applicable to any models for which statistical tests of conditional independence are available, such as discrete models and Gaussian models. Keywords: Bayesian networks, false discovery rate, PC algorithm, directed acyclic graph, skeleton
3 0.20083658 42 jmlr-2009-Incorporating Functional Knowledge in Neural Networks
Author: Charles Dugas, Yoshua Bengio, François Bélisle, Claude Nadeau, René Garcia
Abstract: Incorporating prior knowledge of a particular task into the architecture of a learning algorithm can greatly improve generalization performance. We study here a case where we know that the function to be learned is non-decreasing in its two arguments and convex in one of them. For this purpose we propose a class of functions similar to multi-layer neural networks but (1) that has those properties, (2) is a universal approximator of Lipschitz functions with these and other properties. We apply this new class of functions to the task of modelling the price of call options. Experiments show improvements on regressing the price of call options using the new types of function classes that incorporate the a priori constraints. Keywords: neural networks, universal approximation, monotonicity, convexity, call options
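As a simplified sketch of how such shape constraints can be enforced architecturally (an illustrative variant, not the authors' exact parameterization): a one-hidden-layer network with non-negative weights and a convex, non-decreasing activation such as softplus is automatically non-decreasing and convex in every input coordinate:

```python
import math

def softplus(z):
    """Convex, non-decreasing activation: log(1 + exp(z))."""
    return math.log1p(math.exp(z))

def monotone_convex_net(x, weights, biases, out_weights):
    """One-hidden-layer network. Because every weight is >= 0 and softplus is
    convex and non-decreasing, the composition is non-decreasing and convex
    in each input coordinate by construction."""
    hidden = [softplus(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(weights, biases)]
    return sum(v * h for v, h in zip(out_weights, hidden))
```

The design choice is that the constraint is structural: no penalty or projection step is needed, since any non-negative weight setting satisfies the shape restrictions.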
4 0.093595237 28 jmlr-2009-Entropy Inference and the James-Stein Estimator, with Application to Nonlinear Gene Association Networks
Author: Jean Hausser, Korbinian Strimmer
Abstract: We present a procedure for effective estimation of entropy and mutual information from small-sample data, and apply it to the problem of inferring high-dimensional gene association networks. Specifically, we develop a James-Stein-type shrinkage estimator, resulting in a procedure that is highly efficient statistically as well as computationally. Despite its simplicity, we show that it outperforms eight other entropy estimation procedures across a diverse range of sampling scenarios and data-generating models, even in cases of severe undersampling. We illustrate the approach by analyzing E. coli gene expression data and computing an entropy-based gene-association network from gene expression data. A computer program is available that implements the proposed shrinkage estimator. Keywords: entropy, shrinkage estimation, James-Stein estimator, “small n, large p” setting, mutual information, gene association network
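A minimal sketch of the James-Stein-type shrinkage idea for entropy estimation: maximum-likelihood cell frequencies are shrunk toward the uniform distribution with a data-driven intensity, and entropy is computed from the shrunk frequencies. The intensity formula below is the standard plug-in estimate; the function name is ours:

```python
import math

def shrinkage_entropy(counts):
    """Entropy (in nats) from cell counts: shrink the ML frequencies toward
    the uniform target with intensity lam, then plug into the entropy formula."""
    n = sum(counts)
    p = len(counts)
    theta = [c / n for c in counts]                    # ML frequencies
    target = 1.0 / p                                   # uniform shrinkage target
    denom = (n - 1) * sum((target - t) ** 2 for t in theta)
    lam = 1.0 if denom == 0 else (1.0 - sum(t * t for t in theta)) / denom
    lam = min(1.0, max(0.0, lam))                      # clip intensity to [0, 1]
    shrunk = [lam * target + (1 - lam) * t for t in theta]
    return -sum(q * math.log(q) for q in shrunk if q > 0)
```

Shrinking toward the uniform distribution counteracts the downward bias of the ML plug-in entropy, which is most severe in the undersampled regime the abstract highlights.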
5 0.089422323 31 jmlr-2009-Evolutionary Model Type Selection for Global Surrogate Modeling
Author: Dirk Gorissen, Tom Dhaene, Filip De Turck
Abstract: Due to the scale and computational complexity of currently used simulation codes, global surrogate (metamodels) models have become indispensable tools for exploring and understanding the design space. Due to their compact formulation they are cheap to evaluate and thus readily facilitate visualization, design space exploration, rapid prototyping, and sensitivity analysis. They can also be used as accurate building blocks in design packages or larger simulation environments. Consequently, there is great interest in techniques that facilitate the construction of such approximation models while minimizing the computational cost and maximizing model accuracy. Many surrogate model types exist (Support Vector Machines, Kriging, Neural Networks, etc.) but no type is optimal in all circumstances. Nor is there any hard theory available that can help make this choice. In this paper we present an automatic approach to the model type selection problem. We describe an adaptive global surrogate modeling environment with adaptive sampling, driven by speciated evolution. Different model types are evolved cooperatively using a Genetic Algorithm (heterogeneous evolution) and compete to approximate the iteratively selected data. In this way the optimal model type and complexity for a given data set or simulation code can be dynamically determined. Its utility and performance is demonstrated on a number of problems where it outperforms traditional sequential execution of each model type. Keywords: model type selection, genetic algorithms, global surrogate modeling, function approximation, active learning, adaptive sampling
6 0.072255209 30 jmlr-2009-Estimation of Sparse Binary Pairwise Markov Networks using Pseudo-likelihoods
7 0.071983777 11 jmlr-2009-Bayesian Network Structure Learning by Recursive Autonomy Identification
8 0.070918277 10 jmlr-2009-Application of Non Parametric Empirical Bayes Estimation to High Dimensional Classification
9 0.069137193 97 jmlr-2009-Ultrahigh Dimensional Feature Selection: Beyond The Linear Model
10 0.062382422 82 jmlr-2009-Robustness and Regularization of Support Vector Machines
11 0.062167302 8 jmlr-2009-An Anticorrelation Kernel for Subsystem Training in Multiple Classifier Systems
12 0.061887473 32 jmlr-2009-Exploiting Product Distributions to Identify Relevant Variables of Correlation Immune Functions
14 0.060701307 29 jmlr-2009-Estimating Labels from Label Proportions
15 0.059428912 21 jmlr-2009-Data-driven Calibration of Penalties for Least-Squares Regression
16 0.059142992 16 jmlr-2009-Classification with Gaussians and Convex Loss
17 0.058785371 94 jmlr-2009-The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs
18 0.058380295 54 jmlr-2009-Markov Properties for Linear Causal Models with Correlated Errors (Special Topic on Causality)
19 0.05724521 45 jmlr-2009-Learning Approximate Sequential Patterns for Classification
20 0.056189973 93 jmlr-2009-The Hidden Life of Latent Variables: Bayesian Learning with Mixed Graph Models
topicId topicWeight
[(8, 0.037), (11, 0.027), (19, 0.018), (21, 0.012), (26, 0.014), (38, 0.029), (47, 0.022), (52, 0.046), (55, 0.058), (58, 0.023), (66, 0.092), (68, 0.018), (90, 0.047), (96, 0.018), (99, 0.445)]
simIndex simValue paperId paperTitle
same-paper 1 0.66092539 5 jmlr-2009-Adaptive False Discovery Rate Control under Independence and Dependence
Author: Gilles Blanchard, Étienne Roquain
Abstract: In the context of multiple hypothesis testing, the proportion π0 of true null hypotheses in the pool of hypotheses to test often plays a crucial role, although it is generally unknown a priori. A testing procedure using an implicit or explicit estimate of this quantity in order to improve its efficiency is called adaptive. In this paper, we focus on the issue of false discovery rate (FDR) control and we present new adaptive multiple testing procedures with control of the FDR. In a first part, assuming independence of the p-values, we present two new procedures and give a unified review of other existing adaptive procedures that have provably controlled FDR. We report extensive simulation results comparing these procedures and testing their robustness when the independence assumption is violated. The new proposed procedures appear competitive with existing ones. The overall best, though, is reported to be Storey’s estimator, albeit for a specific parameter setting that does not appear to have been considered before. In a second part, we propose adaptive versions of step-up procedures that have provably controlled FDR under positive dependence and unspecified dependence of the p-values, respectively. In the latter case, while simulations only show an improvement over non-adaptive procedures in limited situations, these are to our knowledge among the first theoretically founded adaptive multiple testing procedures that control the FDR when the p-values are not independent. Keywords: multiple testing, false discovery rate, adaptive procedure, positive regression dependence, p-values
2 0.6436916 12 jmlr-2009-Bi-Level Path Following for Cross Validated Solution of Kernel Quantile Regression
Author: Saharon Rosset
Abstract: We show how to follow the path of cross validated solutions to families of regularized optimization problems, defined by a combination of a parameterized loss function and a regularization term. A primary example is kernel quantile regression, where the parameter of the loss function is the quantile being estimated. Even though the bi-level optimization problem we encounter for every quantile is non-convex, the manner in which the optimal cross-validated solution evolves with the parameter of the loss function allows tracking of this solution. We prove this property, construct the resulting algorithm, and demonstrate it on real and artificial data. This algorithm allows us to efficiently solve the whole family of bi-level problems. We show how it can be extended to cover other modeling problems, like support vector regression, and alternative in-sample model selection approaches.
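The parameterized loss underlying kernel quantile regression is the pinball loss. The following sketch (function names ours) illustrates its defining property: minimizing it over a constant recovers an empirical τ-quantile, which is why varying τ traces out a family of quantile estimates:

```python
def pinball_loss(residual, tau):
    """Quantile ('pinball') loss: under-predictions are weighted by tau,
    over-predictions by (1 - tau)."""
    return tau * residual if residual >= 0 else (tau - 1) * residual

def empirical_quantile(ys, tau):
    """The constant minimizing average pinball loss over ys is an empirical
    tau-quantile; a minimizer always lies at a data point, so it suffices
    to check each one."""
    return min(ys, key=lambda q: sum(pinball_loss(y - q, tau) for y in ys))
```

For τ = 0.5 the loss is proportional to the absolute error and the minimizer is a median; as τ increases, the asymmetry pushes the solution toward the upper tail.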
3 0.2889432 82 jmlr-2009-Robustness and Regularization of Support Vector Machines
Author: Huan Xu, Constantine Caramanis, Shie Mannor
Abstract: We consider regularized support vector machines (SVMs) and show that they are precisely equivalent to a new robust optimization formulation. We show that this equivalence of robust optimization and regularization has implications for both algorithms, and analysis. In terms of algorithms, the equivalence suggests more general SVM-like algorithms for classification that explicitly build in protection to noise, and at the same time control overfitting. On the analysis front, the equivalence of robustness and regularization provides a robust optimization interpretation for the success of regularized SVMs. We use this new robustness interpretation of SVMs to give a new proof of consistency of (kernelized) SVMs, thus establishing robustness as the reason regularized SVMs generalize well. Keywords: robustness, regularization, generalization, kernel, support vector machine
Author: Junning Li, Z. Jane Wang
Abstract: In real world applications, graphical statistical models are not only a tool for operations such as classification or prediction, but usually the network structures of the models themselves are also of great interest (e.g., in modeling brain connectivity). The false discovery rate (FDR), the expected ratio of falsely claimed connections to all those claimed, is often a reasonable error-rate criterion in these applications. However, current learning algorithms for graphical models have not been adequately adapted to the concerns of the FDR. The traditional practice of controlling the type I error rate and the type II error rate under a conventional level does not necessarily keep the FDR low, especially in the case of sparse networks. In this paper, we propose embedding an FDR-control procedure into the PC algorithm to curb the FDR of the skeleton of the learned graph. We prove that the proposed method can control the FDR under a user-specified level at the limit of large sample sizes. In the cases of moderate sample size (about several hundred), empirical experiments show that the method is still able to control the FDR under the user-specified level, and a heuristic modification of the method is able to control the FDR more accurately around the user-specified level. The proposed method is applicable to any models for which statistical tests of conditional independence are available, such as discrete models and Gaussian models. Keywords: Bayesian networks, false discovery rate, PC algorithm, directed acyclic graph, skeleton
5 0.2818042 9 jmlr-2009-Analysis of Perceptron-Based Active Learning
Author: Sanjoy Dasgupta, Adam Tauman Kalai, Claire Monteleoni
Abstract: We start by showing that in an active learning setting, the Perceptron algorithm needs Ω(1/ε²) labels to learn linear separators within generalization error ε. We then present a simple active learning algorithm for this problem, which combines a modification of the Perceptron update with an adaptive filtering rule for deciding which points to query. For data distributed uniformly over the unit sphere, we show that our algorithm reaches generalization error ε after asking for just Õ(d log(1/ε)) labels. This exponential improvement over the usual sample complexity of supervised learning had previously been demonstrated only for the computationally more complex query-by-committee algorithm. Keywords: active learning, perceptron, label complexity bounds, online learning
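A schematic sketch of the margin-based querying idea the abstract describes (a fixed threshold rather than the authors' adaptive filtering rule): a label is requested only when the current hypothesis is uncertain about a point, and a queried mistake triggers the modified perceptron update w ← w − 2⟨w, x⟩x, which for unit-norm data is a norm-preserving reflection:

```python
def active_perceptron(stream, label_fn, threshold=0.2):
    """Schematic margin-based active learner over a stream of unit vectors.
    Queries a label only when |<w, x>| < threshold; on a queried mistake,
    applies the modified perceptron update w <- w - 2 <w, x> x.
    Returns (final weight vector, number of label queries)."""
    it = iter(stream)
    x0 = next(it)
    w = [label_fn(x0) * xi for xi in x0]       # initialize w = y0 * x0 (one query)
    queries = 1
    for x in it:
        margin = sum(wi * xi for wi, xi in zip(w, x))
        if abs(margin) < threshold:            # uncertain point: pay for a label
            queries += 1
            if label_fn(x) * margin <= 0:      # mistake: reflect w across x
                w = [wi - 2 * margin * xi for wi, xi in zip(w, x)]
    return w, queries
```

The point of the filtering rule is label economy: confidently classified points are never queried, so the label complexity can be far smaller than the number of points seen.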
6 0.27938083 87 jmlr-2009-Sparse Online Learning via Truncated Gradient
7 0.27588299 94 jmlr-2009-The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs
8 0.27549899 27 jmlr-2009-Efficient Online and Batch Learning Using Forward Backward Splitting
9 0.27509782 38 jmlr-2009-Hash Kernels for Structured Data
10 0.27487099 75 jmlr-2009-Provably Efficient Learning with Typed Parametric Models
11 0.27266845 66 jmlr-2009-On the Consistency of Feature Selection using Greedy Least Squares Regression
12 0.27240607 93 jmlr-2009-The Hidden Life of Latent Variables: Bayesian Learning with Mixed Graph Models
13 0.27218702 21 jmlr-2009-Data-driven Calibration of Penalties for Least-Squares Regression
14 0.27163032 22 jmlr-2009-Deterministic Error Analysis of Support Vector Regression and Related Regularized Kernel Methods
15 0.27100447 48 jmlr-2009-Learning Nondeterministic Classifiers
16 0.27096125 32 jmlr-2009-Exploiting Product Distributions to Identify Relevant Variables of Correlation Immune Functions
17 0.27095667 71 jmlr-2009-Perturbation Corrections in Approximate Inference: Mixture Modelling Applications
18 0.27072775 51 jmlr-2009-Low-Rank Kernel Learning with Bregman Matrix Divergences
19 0.27067524 18 jmlr-2009-Consistency and Localizability
20 0.27000189 13 jmlr-2009-Bounded Kernel-Based Online Learning