Odd: Bayesian Repeated Measures seems biased against interactions and in favor of main effects
Edit: I should clarify that the problem (or what I perceive as one) is only with the inclusion Bayes Factor that you get when you tick the 'Effects' box. The standard model-comparison table is fine.
I've been playing a bit with Bayesian Repeated Measures, mostly based on this question by @v.b. I used to think that I understood it fairly well, at least conceptually. But I can't wrap my head around the results that I describe below.
Follow me. We'll go by illustration.
I simulated data from a Posner cuing experiment. (So this is fake data.) This is a 2 × 2 design: Cue (left/right) and Target (left/right). I simulated a small cuing effect, and as you can see below, it is not strongly supported: the inclusion BF for Cue × Target is 1.131. (I focus on the inclusion BF here, so the lower table.)
Ok. So far, nothing wrong.
But it's a cuing effect, right? So another way to code the results is with a single factor: Validity (cue valid/cue invalid). That shouldn't make any difference, right? And it doesn't make any difference in a traditional Repeated Measures, as you can see in the attached JASP file. But it makes a huge difference for the Bayesian analysis! If I collapse the data into a valid and an invalid column, and analyze it as a single-factor design, the inclusion BF for Validity is suddenly 7.169!
So a trivial recoding of the data brings us from anecdotal evidence to substantial evidence!
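To make the recoding concrete, here is a minimal sketch in Python (standard library only) of why the two codings are equivalent in a traditional analysis. It is not the script I used for the attached file: the sample size, baseline RT, noise levels, and size of the cuing effect are all made-up assumptions, and `simulate_posner` and `paired_t` are hypothetical helper names. The point is only that the per-participant Cue × Target interaction contrast is exactly twice the invalid-minus-valid difference, so both give the same t statistic.

```python
import math
import random
import statistics


def paired_t(diffs):
    """One-sample t statistic for per-participant difference scores."""
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


def simulate_posner(n_subjects=20, cuing_effect=10.0, seed=1):
    """Fake mean RTs (ms) per participant for the four Cue x Target cells.

    All numbers here are illustrative assumptions, not the original data.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_subjects):
        base = rng.gauss(500, 50)  # participant-specific baseline RT
        row = {}
        for cue in ("left", "right"):
            for target in ("left", "right"):
                # Invalid trials (cue != target) carry the cuing cost.
                cost = 0.0 if cue == target else cuing_effect
                row[(cue, target)] = base + cost + rng.gauss(0, 20)
        rows.append(row)
    return rows


data = simulate_posner()

# 2 x 2 coding: the Cue x Target interaction contrast per participant.
interaction = [
    (r[("left", "right")] + r[("right", "left")])
    - (r[("left", "left")] + r[("right", "right")])
    for r in data
]

# Single-factor coding: collapse to invalid and valid means, then subtract.
validity = [
    (r[("left", "right")] + r[("right", "left")]) / 2
    - (r[("left", "left")] + r[("right", "right")]) / 2
    for r in data
]

t_interaction = paired_t(interaction)
t_validity = paired_t(validity)
print(t_interaction, t_validity)  # the two values agree up to rounding
```

Because both effects have one degree of freedom, the F value in the Repeated Measures table is just the square of this t, which is why the traditional analysis does not care about the coding.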
This seems to be a general property of Bayesian Repeated Measures: The more interactions an effect has, the lower the inclusion BF. You can see this with the following analysis that was conducted on purely random data:
I don't see how this behavior can be desirable. Surely all effects, regardless of whether they are interactions or main effects, should be given a fighting chance? But it seems that you are almost guaranteed to find evidence against interaction effects, especially higher-order interactions.
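For reference, this is how I understand the inclusion BF to be computed, as a rough Python sketch. It assumes that every model in the table gets the same prior probability, and that the inclusion BF is the posterior inclusion odds divided by the prior inclusion odds, taken across all models. That is my reading, not something I have verified in the JASP source. `inclusion_bf` is a hypothetical helper, and the model BFs in the 2 × 2 example are invented purely for illustration.

```python
def inclusion_bf(models, term):
    """Inclusion Bayes factor for `term`.

    models: list of (set_of_terms, bf_against_null) pairs.
    Assumes a uniform prior over the listed models (an assumption of
    this sketch), and that `term` is in some but not all models.
    """
    prior = 1.0 / len(models)
    total = sum(bf for _, bf in models)

    # With equal priors, posterior model probabilities are BF / sum(BF).
    post_in = sum(bf for terms, bf in models if term in terms) / total
    prior_in = sum(prior for terms, _ in models if term in terms)

    posterior_odds = post_in / (1.0 - post_in)
    prior_odds = prior_in / (1.0 - prior_in)
    return posterior_odds / prior_odds


# Single-factor design: only two models, so the inclusion BF is simply
# the BF of the Validity model against the null. The value 7.169 is
# used here as an illustrative model BF.
single = [
    (set(), 1.0),
    ({"Validity"}, 7.169),
]

# 2 x 2 design: five models, and the interaction appears in only one.
# These model BFs are hypothetical, NOT taken from my output.
two_by_two = [
    (set(), 1.0),
    ({"Cue"}, 0.3),
    ({"Target"}, 0.3),
    ({"Cue", "Target"}, 0.09),
    ({"Cue", "Target", "Cue x Target"}, 0.5),
]

print(inclusion_bf(single, "Validity"))
print(inclusion_bf(two_by_two, "Cue x Target"))
```

Under these assumptions, the interaction in the 2 × 2 coding is compared against a whole set of models that lack it, including models without the main effects, whereas Validity in the single-factor coding is compared against the null model only. So the two inclusion BFs are not answering the same question, even though the data are the same.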
What is worse (though I'm not sure about this), I suspect that evidence in favor of main effects is exaggerated. I say this because in the data of @v.b, the inclusion BFs for main effects are extremely high, while these same effects are not reliable in a traditional Repeated Measures:
On the one hand, the results of the Bayesian Repeated Measures seem invalid to me; on the other hand, I cannot believe that such an obvious problem would've gone undetected by anyone but me until now. So there are two options, both with a very low prior probability in my mind. Which makes me think that I fundamentally misunderstand what the output of the Bayesian Repeated Measures means.