Introduction: Someone writes: I’m currently trying to make sense of the Army’s preliminary figures on their Comprehensive Soldier Fitness programme, which I found here. That report (see for example table 4 on p.15) has only a few very small “effect sizes” with p<.01 on some of the subscales and nothing significant on the rest. It looks to me like it's not much different from random noise, which I suspect might be caused by the large N (and there's more to come, because N for the whole programme will be in excess of 1 million). While googling on the subject of large N, I came across this entry in your blog. My question is, does that imply that when one has a large N – and, thus, presumably, large statistical power – one should systematically reduce alpha as well? Is there any literature on this? Does one always/sometimes/never need to take Lindley’s “paradox” into account? And a supplementary question: can it ever be legitimate to quote a result as significant for one DV (“Social fi

1 Someone writes: I’m currently trying to make sense of the Army’s preliminary figures on their Comprehensive Soldier Fitness programme, which I found here. [sent-1, score-0.09]

2 That report (see for example table 4 on p.15) has only a few very small “effect sizes” with p<.01 on some of the subscales and nothing significant on the rest. [sent-4, score-0.089]

3 It looks to me like it's not much different from random noise, which I suspect might be caused by the large N (and there's more to come, because N for the whole programme will be in excess of 1 million). [sent-5, score-0.719]

4 While googling on the subject of large N, I came across this entry in your blog. [sent-6, score-0.392]

5 My question is, does that imply that when one has a large N – and, thus, presumably, large statistical power – one should systematically reduce alpha as well? [sent-7, score-0.364]

6 And a supplementary question: can it ever be legitimate to quote a result as significant for one DV (“Social fitness” in table 4) when it is simply (cf. [sent-10, score-0.387]

7 10) an amalgam of the four other DVs listed immediately underneath it, of which one (“Friendliness”) has a significance of <. [sent-12, score-0.111]

8 CSF is a $140 million programme that has been controversial for all sorts of reasons. [sent-17, score-0.434]

9 There’s a whole bunch of other stuff about this process, such as their use of MANOVA at T1 and “ANOVA with blocking” at T2, that makes me think they are on a fishing expedition for cherries to pick. [sent-18, score-0.394]

10 For example, the means in some of the tables are “estimated marginal means” (MANOVA output), the SD values are in fact SEMs, and I have no idea why they are expressing effect sizes as partial eta squared when they only have one independent variable. [sent-19, score-0.482]

11 But I’m a complete newbie to stats, so I’m probably missing a lot of stuff. [sent-20, score-0.111]

12 That report is almost a parody of military bureaucracy! [sent-22, score-0.183]

13 The people doing this research have real problems for which there are no easy solutions. [sent-24, score-0.1]

14 In short: none of the effects is zero and there’s gotta be a lot of variation across people and across subgroups of people. [sent-25, score-0.633]
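To make the large-N point concrete, here is a minimal sketch (not anything from the report) of how the two-sided p-value for a fixed, tiny, nonzero effect behaves as the sample grows. It assumes a two-sample z-test with equal group sizes and unit variance; the function name `two_sided_p` and the effect size of 0.02 standard deviations are made up for illustration.

```python
import math


def two_sided_p(d, n_per_group):
    """Two-sided p-value for a two-sample z-test.

    d is the standardized mean difference (hypothetical input), and each
    group is assumed to have n_per_group observations with unit variance.
    """
    # Standard error of the difference in means is sqrt(2 / n), so z = d / se.
    z = d * math.sqrt(n_per_group / 2.0)
    # For a standard normal, 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


# Same tiny effect, increasingly large samples (all numbers are illustrative).
for n in (100, 10_000, 500_000):
    print(n, two_sided_p(0.02, n))
```

With the effect held fixed, the p-value is driven entirely by N, which is why a "significant" result at very large N says little about whether the effect is big enough to matter.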

15 It’s a classic multiple comparisons situation, but the null hypothesis of zero effects (which is standard in multiple-comparisons analyses) is clearly inappropriate. [sent-27, score-0.339]

16 Multilevel modeling seems like a good idea but it requires real modeling and real thought, not simply plugging the data into an 8-schools program. [sent-28, score-0.556]

17 We have seen the same issues arising in education research, another area with multiple outcomes, treatments varying across predictors, and small aggregate effects. [sent-29, score-0.468]
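For readers who have not seen what multilevel modeling does to a set of noisy subgroup estimates, the following is a rough sketch of partial pooling, and only that. It assumes a normal model with a known between-group standard deviation `tau`, which a real analysis would have to estimate and think hard about; the function `partial_pool` and all the numbers are hypothetical, not taken from the CSF report.

```python
def partial_pool(estimates, std_errors, tau):
    """Shrink noisy group estimates toward a precision-weighted overall mean.

    tau is an assumed between-group standard deviation (must be > 0).
    Returns the estimated overall mean and the list of shrunken estimates.
    """
    # Overall mean: weight each group by 1 / (sampling variance + tau^2).
    weights = [1.0 / (se ** 2 + tau ** 2) for se in std_errors]
    mu_hat = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)

    pooled = []
    for y, se in zip(estimates, std_errors):
        prec_data = 1.0 / se ** 2    # precision of the group's own estimate
        prec_group = 1.0 / tau ** 2  # precision of the group-level distribution
        # Precision-weighted compromise between the raw estimate and mu_hat.
        pooled.append((prec_data * y + prec_group * mu_hat) / (prec_data + prec_group))
    return mu_hat, pooled


# Hypothetical subgroup effect estimates and standard errors (made up).
raw = [0.30, 0.05, -0.10, 0.12]
se = [0.15, 0.05, 0.20, 0.08]
mu_hat, shrunk = partial_pool(raw, se, tau=0.05)
print(mu_hat, shrunk)
```

Estimates with large standard errors are pulled most strongly toward the overall mean, while precise ones barely move. Plugging numbers into something like this is exactly the shortcut the post warns against; the sketch only shows the mechanics of shrinkage.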


