"Post-hoc less conservative than planned t test" problem in a mixed-model ANOVA
Hi,
There is something I can't quite understand.
I performed a Mixed-Model ANOVA (see JASP file attached) with Judgment (2 levels: E, T), Congruence (2 levels: Congruent, Incongruent), and Compatibility (2 levels: Compatible, Incompatible) as within-subject factors, and Group (2 levels) as a between-subject factor.
When I perform post-hoc tests on the Judgment x Congruence (marginal) interaction (I know, I know, one should not perform post-hoc tests on non-significant interactions...), the p value for the "T,Congruent" versus "T,Incongruent" comparison is 0.00329 (for both Holm and Bonferroni corrections).
However, when I perform a paired t test comparing the same means (see the "T_Congruent" vs "T_Incongruent" columns), I get a p value of 0.00544.
First, I wondered, "How can a post-hoc test be less conservative than a planned t test?" Then I had a look at the means and noticed that the mean difference computed by the post-hoc test (about |239|) was different from the one computed using the paired t test (about |253|, the actual mean effect). So this suggests that the means for the post-hocs are somehow computed differently.
Could you please explain how this can be? Thanx.
Best regards,
Michel-Ange
Comments
BTW:
If I take the between-subject factor out of the ANOVA, the mean difference for the T,Congruent vs T,Incongruent effect in the post-hocs is the same as in the planned comparison (i.e., |253|). So it has to do with the groups.
However, the post-hoc p value gets even less conservative (p = 0.0019) than the corresponding planned comparison (p = 0.00544). Even weirder...
FYI:
The notes to the post-hoc test output include: "Results are averaged over the levels of: Group, Compatibility." However, your planned comparison simply ignores those other factors. If the n's are different, the difference that results from averaging across the levels of Group and Compatibility may differ from the simple difference, in the planned comparison, that ignores Group and Compatibility. The standard errors will be different too. The bottom line is that the post-hoc procedure isn't testing the same difference as the "planned" comparison is.
JASP doesn't offer the option of post-hocs that report uncorrected (i.e., Fisher's LSD) p values, but it seems that's what you might want.
R
Thanx a lot for your reply.
Indeed, I understand now how the mean difference was computed for the post-hoc test (about |239|). It corresponds to the grand average of the mean differences within each group, that is, a mean of two mean differences. Thereby, this grand average does not take into account the n of each group.
In order to get the same mean difference as the one computed over all participants without taking the groups into account, one would need to compute this grand average as a weighted mean.
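For instance (with made-up group sizes and per-group differences, purely to illustrate how |239| and |253| can both be "the" mean difference; these are not my actual numbers), the two ways of averaging in R would be:

# Hypothetical group sizes and per-group T,Congruent - T,Incongruent differences
n1 <- 30; n2 <- 18
d1 <- 295; d2 <- 183

# Post-hoc / marginal-means style: the two groups are weighted equally
(d1 + d2) / 2                      # 239

# Paired t test on all participants: each participant is weighted equally
(n1 * d1 + n2 * d2) / (n1 + n2)    # 253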
Could you please give me the formula that JASP is using to compute these post-hoc t tests (whether pbonf or pholm)? I'm still puzzled by the fact that a post-hoc test can lead to a less conservative result than a planned comparison. I feel like the post-hoc test isn't really doing its job (i.e., being more conservative than planned comparisons), don't you?
Hi @maamorim
In addition to @andersony3k's helpful comment, I would like to point you to this blogpost from a while ago, where I outline the follow-up tests for ANOVA. Basically, contrasts and post-hoc analyses are based on the estimated marginal means (using the emmeans package), and so can have a different value and, importantly, a different standard error. In your case, you can see the SE differs between the t-test and the post-hoc test, which will lead to a (somewhat) different p/t-value.
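To make that concrete, here is a rough sketch in R with simulated data, using the afex and emmeans packages (afex is just my choice for fitting the ANOVA in R; this mirrors the general pipeline, not JASP's exact code, and all names and numbers below are made up):

library(afex)     # aov_ez() for the mixed (split-plot) ANOVA
library(emmeans)  # estimated marginal means and follow-up contrasts

set.seed(1)

# Simulated long-format data: two groups of unequal size, 2x2x2 within-subject design
n1 <- 30; n2 <- 18
subjects <- data.frame(id    = factor(seq_len(n1 + n2)),
                       Group = factor(rep(c("G1", "G2"), c(n1, n2))))
cells <- expand.grid(Judgment      = c("E", "T"),
                     Congruence    = c("Congruent", "Incongruent"),
                     Compatibility = c("Compatible", "Incompatible"))
dat <- merge(subjects, cells)   # one row per subject x cell
dat$rt <- 900 + 100 * (dat$Congruence == "Incongruent") + rnorm(nrow(dat), sd = 150)

# Mixed ANOVA: Group between subjects, the other factors within subjects
fit <- aov_ez(id = "id", dv = "rt", data = dat,
              between = "Group",
              within  = c("Judgment", "Congruence", "Compatibility"))

# Post-hoc tests are built on the estimated marginal means,
# averaged over Group and Compatibility
emm <- emmeans(fit, ~ Judgment * Congruence)
pairs(emm, adjust = "holm")

# A plain paired t test on the same comparison ignores Group entirely,
# so its estimate, SE, df and p value need not match the post-hoc test
tdat <- aggregate(rt ~ id + Congruence, data = subset(dat, Judgment == "T"), FUN = mean)
wide <- reshape(tdat, idvar = "id", timevar = "Congruence", direction = "wide")
t.test(wide$rt.Congruent, wide$rt.Incongruent, paired = TRUE)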
If you want the uncorrected p-values, you can take the Bonferroni ones and divide them by the number of comparisons (in your case, 6), although these will then be less conservative.
Does that solve your issue?
Kind regards,
Johnny
Hi @JohnnyB
Thank you for your reply and this blogpost, which helped me better understand the different options for follow-up tests for ANOVA.
My point is that, normally, post-hoc tests should be more conservative (less Type I error) than planned comparisons, and this is what I need (tests corrected for multiple comparisons). What I expect is larger p-values (less significant) for post-hoc tests compared to uncorrected (planned) comparisons. I'm just confused that the opposite can be found.
Of course, I understand now that under special circumstances (as explained in this blogpost) this may occur, for example when a Group factor is included (rather than not) in the computation (and post-hoc test) of a repeated-measures effect.
I guess I just need to accept that.
Cheers,
Michel-Ange
Hi @maamorim,
Great to hear! Just to clarify: if you were to compare the planned contrast analysis (which is also based on the marginal means) to the post-hoc analysis, these are identical except for the p-value correction (which always makes the post-hoc more conservative).
If you are comparing the post-hoc to the t-test, there is an extra step between them (namely, that only one of them is based on the marginal means), which means the post-hoc is not always more conservative.
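As a quick illustration with R's built-in warpbreaks data (just a toy between-subjects example, not your design):

library(emmeans)

fit <- lm(breaks ~ wool * tension, data = warpbreaks)
emm <- emmeans(fit, ~ tension)   # marginal means, averaged over wool

pairs(emm, adjust = "none")   # "planned" contrasts: unadjusted p values
pairs(emm, adjust = "holm")   # post-hoc: same estimates and SEs, only the p values change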
Cheers,
Johnny
@maamorim
If one inspects the documentation for the emmeans package, one finds that it's quite complex. See https://cran.r-project.org/web/packages/emmeans/vignettes/FAQs.html
In particular, it says:
"[FAQ:] If I analyze subsets of the data separately, I get different results
Estimated marginal means summarize the model that you fitted to the data – not the data themselves. Many of the most common models rely on several simplifying assumptions – that certain effects are linear, that the error variance is constant, etc. – and those assumptions are passed forward into the emmeans() results. Doing separate analyses on subsets usually comprises departing from that overall model, so of course the results are different."

So perhaps all sorts of things might be happening. Since the entire model is being used to estimate each mean, maybe there are more degrees of freedom for each emmeans post-hoc test than for an individual t test. I also notice that the standard error of the difference between means is substantially smaller with emmeans than with an individual t test, maybe because the entire emmeans model is used to estimate each standard error. In my opinion, this complexity is a drawback of using emmeans.
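For instance, with R's built-in warpbreaks data (a simple between-subjects example, not a repeated-measures design like yours), one can see where the SE and df come from in each case:

library(emmeans)

fit <- lm(breaks ~ wool * tension, data = warpbreaks)

# emmeans contrasts: SE and df come from the full fitted model (residual df = 48)
summary(pairs(emmeans(fit, ~ tension), adjust = "none"))

# t test on the L-vs-M subset only: SE and df come from those 36 observations alone
with(droplevels(subset(warpbreaks, tension != "H")),
     t.test(breaks ~ tension, var.equal = TRUE))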
R
@maamorim
FYI. From the FAQ for the emmeans R package:
"FAQs for emmeans . . .
If I analyze subsets of the data separately, I get different results
Estimated marginal means summarize the model that you fitted to the data – not the data themselves. Many of the most common models rely on several simplifying assumptions – that certain effects are linear, that the error variance is constant, etc. – and those assumptions are passed forward into the emmeans() results. Doing separate analyses on subsets usually comprises departing from that overall model, so of course the results are different."

So the entire linear model is used to estimate each mean and each standard error for each post-hoc test. Consequently, a t test (or other test) on a subset of the data will yield a different result unless the full data set *perfectly* meets all of the underlying mathematical assumptions.
I see, for example, that the standard error is substantially higher for your t test than for the corresponding emmeans post-hoc test.
(Also, as an aside, one can't always work backwards from a Bonferroni-corrected p value to get an uncorrected one because the Bonferroni-corrected value has a ceiling of 1.0.)
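A quick R illustration of that ceiling (with arbitrary p values):

# Bonferroni adjustment caps p values at 1, so division cannot always undo it
p_raw  <- c(0.005, 0.2, 0.6)
p_bonf <- p.adjust(p_raw, method = "bonferroni")   # 0.015 0.600 1.000 (third value capped at 1)
p_bonf / length(p_raw)                             # 0.005 0.200 0.333 (the third no longer recovers 0.6)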
R