# Interpretation help (RM ANOVA)


Hi,

Unfortunately, I have a problem that I cannot handle. I found a significant interaction effect (p < .02) in a classical repeated-measures ANOVA, but the Bayesian analysis gave only an anecdotal inclusion Bayes factor (BFinclusion) of ~1.6 for the interaction. My problem is how I should handle this in a paper. If I published the results without the BFs, everything would be fine, but if I include the BFs I have to find the right words. Could a reviewer now say that my study was not designed well enough to find the effect (or that the results are "worthless" because I cannot say whether H0 or H1 is supported) and reject the paper?

Another problem in this context is that I cannot explain what the prior in the repeated-measures ANOVA really means (r scale fixed effects: 0.5; r scale random effects: 1; r scale covariates: 0.354).

Thanks for any advice or help!

Best,

Markus

## Comments

Hi Markus,

Well, I would just be transparent. Sometimes you do get these conflicts and, in my opinion, they urge caution. If you had a specific contrast in mind, then you ought to test that (I believe Richard has a blog post showing how this can be done; we are working to implement something like that in JASP but haven't done so yet). With respect to the prior scales, the settings are explained in the relevant papers. I think you may not have covariates and random effects, so the only thing to explain is the r scale for the fixed effects. This is the scale (width) of a multivariate Cauchy prior on the effects. It was chosen so that, with two conditions in a between-subjects design, the results are consistent with those of the default Bayesian t-test. I would not attempt to explain this but just mention that they are the default settings.

Cheers,

E.J.
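To make the last point concrete, here is a minimal sketch in R with the BayesFactor package. It assumes (as we understand it) that JASP's defaults correspond to BayesFactor's named scales: "medium" = 1/2 for fixed effects, "nuisance" = 1 for random effects, and "medium" = √2/4 ≈ 0.354 for covariates. For a single two-level between-subjects factor, the ANOVA Bayes factor with fixed-effect scale 1/2 should then coincide with the two-sample t-test Bayes factor using a Cauchy scale of √2/2. The data are simulated and purely hypothetical.

```r
library(BayesFactor)

# Simulated, hypothetical data: one two-level between-subjects factor
set.seed(1)
d <- data.frame(
  y    = c(rnorm(30, mean = 0), rnorm(30, mean = 0.5)),
  cond = factor(rep(c("c1", "c2"), each = 30))
)

# One-way ANOVA BF with the fixed-effect scale written out ("medium" = 1/2)
bf_anova <- anovaBF(y ~ cond, data = d, rscaleFixed = 1 / 2, progress = FALSE)

# Two-sample t-test BF with its default scale written out ("medium" = sqrt(2)/2)
bf_ttest <- ttestBF(formula = y ~ cond, data = d, rscale = sqrt(2) / 2)

# Ratio of the two BFs; expected to be close to 1 (numerical integration error aside)
ratio <- extractBF(bf_anova)$bf / extractBF(bf_ttest)$bf
print(ratio)

# For reference only (not used above), the scales behind JASP's defaults as we
# understand them: rscaleFixed = 1/2, rscaleRandom = 1, rscaleCont = sqrt(2)/4
```

The agreement follows from how the two-level effect is projected onto a single dimension: a Cauchy scale of 1/2 on that dimension corresponds to a scale of √2/2 on the standardized group difference.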

Hi E.J.,

Thank you very much for your response. I posted my question a second time because I thought that nobody would notice it as a separate post.

Regarding my problem above, is it legitimate to base my interpretation on the inclusion Bayes factor (BFinclusion)? In other words, could I report both the BF that compares the interaction model against the two-main-effects model and the BF that compares the interaction model against all candidate models, but focus on BFinclusion?

Thank you very much!

Best,

Markus

Hi Markus,

When you have few models, I am in favor of including the entire tables, perhaps as a supplement.

Cheers,

E.J.

Hi E.J.,

Regarding my post above: when my interaction model has an anecdotal BF10 but my BFinclusion is moderate, which of the two should I give more weight in my interpretation? It makes a difference to me whether I say the interaction effect receives weak or (at least) moderate support.

Thank you very much!

Best,

Markus

The increased support under the inclusion method may be due to the fact that some models (like the null model, or a model with only one factor) perform very poorly. I am not sure that this effect is of interest to you.

Cheers

E.J.
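A minimal base-R sketch of how the inclusion Bayes factor can exceed the direct comparison of the full model against the main-effects model. It assumes the "across all models" definition (posterior inclusion odds divided by prior inclusion odds) with equal prior model probabilities; the Bayes factors are made-up numbers chosen so that the null and single-factor models perform very poorly.

```r
# Hypothetical Bayes factors of each model against the null model in a
# two-way repeated-measures design (illustrative numbers, not real data)
bf_vs_null <- c(
  "null"    = 1,
  "A"       = 10,
  "B"       = 5,
  "A+B"     = 1000,
  "A+B+A:B" = 1600
)

has_interaction <- grepl("A:B", names(bf_vs_null), fixed = TRUE)

# Equal prior probability for all five models
prior_prob <- rep(1 / length(bf_vs_null), length(bf_vs_null))

# Posterior model probabilities are proportional to prior probability * BF
post_prob <- prior_prob * bf_vs_null / sum(prior_prob * bf_vs_null)

# Inclusion BF: change from prior to posterior odds of models with the interaction
prior_odds <- sum(prior_prob[has_interaction]) / sum(prior_prob[!has_interaction])
post_odds  <- sum(post_prob[has_interaction]) / sum(post_prob[!has_interaction])
bf_inclusion <- post_odds / prior_odds

# Direct comparison: full model against the main-effects model
bf_full_vs_main <- bf_vs_null[["A+B+A:B"]] / bf_vs_null[["A+B"]]

c(inclusion = bf_inclusion, full_vs_main = bf_full_vs_main)
```

With these numbers the inclusion Bayes factor is about 6.3, while the full model beats the main-effects model by only 1.6. The posterior inclusion odds (about 1.57) are close to the direct comparison, but they are divided by prior inclusion odds of 1/4, which count three models that end up with almost no posterior support.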

Hi EJ,

I have a similar issue to Markus. My Bayesian ANOVA is also less convinced of the existence of an effect than the classical ANOVA. In my case, however, the difference is rather large. The classical ANOVA (df = 19) yields F = 9.00 and p = .007, whereas the BF for this effect is 0.6, which even provides anecdotal evidence for the null (the JASP output and a figure of the means with within-subject 95% CIs are attached). Looking at the single-subject data, I can say that the effect is indeed small (~10 ms) but rather consistent across subjects. Only one subject shows the opposite effect, about three times as strong as everyone else's. However, this alone is no reason for exclusion, because that subject's overall performance is still within 2 SD of the sample mean.

I was wondering whether it is possible for these two analyses to diverge so strongly, or whether it is more likely that an error happened somewhere along the road. And if it really is possible, do you know what the reasons could be, also given my data in particular? I understand that Bayesian statistics tend in general to be a little more conservative than classical ones, but why exactly is that?

This experiment is the third in a series of very similar ones, and the effects so far were always rather strong and consistent between the Bayesian and classical approaches. So I was also wondering whether it is possible in JASP or R to use the outcome of earlier experiments as priors in later analyses. In another discussion, I read that simply multiplying the BFs doesn't work. Is there a way?

Finally, what do you recommend for tackling this issue? Just being transparent, along the lines of "the classical ANOVA finds an effect, but this is not supported by the Bayesian analysis", or would you take further measures? I also ran a t-test between the two conditions from which I expected the effect to originate, and found moderate support for my hypothesis.

Your opinion is very much appreciated!

Thanks,

eduard

Hi Eduard,

I assume the interest is in the interaction? In general, BFs are less enthusiastic because they look at both sides of the coin (H0 and H1) instead of focusing on H0 alone. Indeed, multiplying BFs is not allowed, as it uses the prior again. So the correct approach, as you suggest, is to compute the BFs using the updated distributions as priors. This is not yet possible in JASP.

Being transparent is always good. However, perhaps you can achieve more informative results by not just testing an interaction "in general", but by opting for a more specific contrast. I believe Richard has a blog post on that. In addition, we sometimes see big differences between the two paradigms when particular assumptions are not met (outliers, heterogeneity of variances, etc.). So you could check that too. Maybe Richard would like to weigh in as well.

Cheers,

E.J.
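To illustrate why BFs from successive experiments cannot simply be multiplied, here is a toy sketch in base R. It deliberately uses a simpler conjugate model than JASP or BayesFactor (a normal mean with known variance, and a normal prior on the mean under H1); the model, its parameters, and all numbers are assumptions for illustration only. Multiplying the first BF by a second BF computed with the *updated* prior reproduces the BF for the pooled data; multiplying two BFs that each use the original prior does not.

```r
# Toy conjugate model (an assumption for illustration, NOT the model used by
# JASP or BayesFactor): experiment means xbar_i ~ N(mu, sigma^2 / n_i) with
# sigma known; H0: mu = 0 versus H1: mu ~ N(m, v)
sigma <- 1
tau   <- 1   # hypothetical prior SD of mu under H1 before any data

# BF10 for an observed mean, given a normal prior N(m, v) on mu under H1
bf10_normal <- function(xbar, n, m = 0, v = tau^2) {
  se2 <- sigma^2 / n
  dnorm(xbar, mean = m, sd = sqrt(v + se2)) / dnorm(xbar, mean = 0, sd = sqrt(se2))
}

# Posterior for mu under H1 after observing xbar (normal-normal update)
update_prior <- function(xbar, n, m = 0, v = tau^2) {
  se2 <- sigma^2 / n
  v_post <- 1 / (1 / v + 1 / se2)
  m_post <- v_post * (m / v + xbar / se2)
  list(m = m_post, v = v_post)
}

# Hypothetical results of two experiments
n1 <- 20; xbar1 <- 0.5
n2 <- 20; xbar2 <- 0.4

bf1   <- bf10_normal(xbar1, n1)
post1 <- update_prior(xbar1, n1)

bf2_given1 <- bf10_normal(xbar2, n2, m = post1$m, v = post1$v)  # updated prior
bf2_naive  <- bf10_normal(xbar2, n2)                            # original prior again

bf_sequential <- bf1 * bf2_given1
bf_naive      <- bf1 * bf2_naive
bf_pooled     <- bf10_normal((n1 * xbar1 + n2 * xbar2) / (n1 + n2), n1 + n2)

# Sequential and pooled agree; the naive product does not
c(sequential = bf_sequential, pooled = bf_pooled, naive = bf_naive)
```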

Hi EJ,

Thanks for your reply.

Indeed the interaction is what matters most.

Do you happen to mean this blog post? In a 2x2 design, wouldn't this boil down to a simple t-test?

Is this possible directly in R with the BayesFactor package?

Just for the sake of clarity: if some assumptions were not met, would this mostly affect the outcome of the classical ANOVA?

Thanks again,

Eduard
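On whether a 2x2 interaction boils down to a t-test: in the classical analysis it does. A minimal base-R sketch with simulated, hypothetical data shows that the interaction F from a repeated-measures ANOVA equals the squared one-sample t statistic on each subject's difference of differences. The Bayesian side is less direct: a Bayesian t-test on this contrast and the interaction comparison in `anovaBF` use different models and priors, so their Bayes factors need not agree.

```r
# Simulated, hypothetical data: 20 subjects, 2 x 2 within-subject design
set.seed(123)
n_subj <- 20
d <- expand.grid(
  subj = factor(1:n_subj),
  A    = factor(c("a1", "a2")),
  B    = factor(c("b1", "b2"))
)
subj_offset <- rnorm(n_subj, sd = 30)
d$rt <- 500 + subj_offset[as.integer(d$subj)] +
  ifelse(d$A == "a1" & d$B == "b1", 10, 0) +
  rnorm(nrow(d), sd = 15)

# Classical repeated-measures ANOVA; A:B is tested in the subj:A:B stratum
fit <- aov(rt ~ A * B + Error(subj / (A * B)), data = d)
int_table <- summary(fit)[["Error: subj:A:B"]][[1]]
F_int <- int_table[["F value"]][1]

# Per-subject cell means (array: subject x A x B) and the difference of differences
m <- tapply(d$rt, list(d$subj, d$A, d$B), mean)
contrast <- (m[, "a1", "b1"] - m[, "a1", "b2"]) - (m[, "a2", "b1"] - m[, "a2", "b2"])
t_int <- unname(t.test(contrast)$statistic)

c(F = F_int, t_squared = t_int^2)   # the two numbers should be identical

# A Bayesian analogue on the contrast (a different model from anovaBF) would be:
# BayesFactor::ttestBF(contrast)
```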

Hi Eduard,

I'm not sure which tests would be most affected by a violation of assumptions. It feels a little like comparing apples and oranges, but perhaps it can be done. Yes, I meant that blog post, or the next one: http://bayesfactor.blogspot.nl/2015/01/multiple-comparisons-with-bayesfactor-2.html. What I'm saying is that your interaction can be specified more precisely as a specific ordering of means (equality and inequality constraints).

E.J.
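The order-restriction idea can be sketched as an encompassing-prior computation: the Bayes factor of the order-restricted model against the unconstrained model is the proportion of posterior draws that satisfy the ordering, divided by the prior probability of that ordering. A minimal base-R sketch, assuming you already have posterior draws of the four cell means (here four made-up rows; a real analysis would use thousands of draws, for example derived from the output of BayesFactor's `posterior()`), a hypothetical predicted ordering, and a prior probability of 1/24 for one full ordering of four cells. It covers inequality constraints only.

```r
# Hypothetical posterior draws of the four cell means (one row per draw);
# a real analysis would use thousands of draws from the fitted model
draws <- matrix(
  c(530, 512, 505, 498,
    528, 515, 509, 511,
    533, 510, 507, 501,
    525, 518, 503, 499),
  ncol = 4, byrow = TRUE,
  dimnames = list(NULL, c("a1b1", "a2b1", "a2b2", "a1b2"))
)

# Hypothetical predicted ordering: a1b1 > a2b1 > a2b2 > a1b2
satisfies_order <- function(mu) {
  mu[["a1b1"]] > mu[["a2b1"]] && mu[["a2b1"]] > mu[["a2b2"]] && mu[["a2b2"]] > mu[["a1b2"]]
}

# Encompassing-prior Bayes factor: order-restricted vs unconstrained model
order_bf <- function(draws, satisfies, prior_prob) {
  post_prob <- mean(apply(draws, 1, satisfies))
  post_prob / prior_prob
}

bf_order <- order_bf(draws, satisfies_order, prior_prob = 1 / factorial(4))
bf_order   # 3 of 4 draws satisfy the ordering: 0.75 / (1/24) = 18
```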

This looks interesting. I'll give it a try.

One last thing: if this more specific analysis does not support our hypothesis either, how much of a problem would it be to try to publish the data nevertheless? (Of course, this is a highly subjective question; I just wondered how it might appear to reviewers.)

In any case, thanks for your support. Very helpful.

Eduard

I don't think it's a problem at all. Did you see the paper by Etz and Lakens about how not every study needs to provide picture-perfect results? Besides, I think you should be applauded for being transparent, and my guess is that you will be.

Cheers,

E.J.

Hi EJ,

I ran the analysis that you suggested (specifying the interaction as a specific ordering of means), which seems to work, so that is good. However, along the way I bumped into a couple of things that I'm not sure I understand.

Mostly, I'm not sure which value to choose for the `whichModels` argument in the BayesFactor analysis. I tried "top", "bottom" and the default value, and I think I understand what they mean conceptually. The problem is that I don't know how to extract the BF for each effect (M1, M2, IE) if I follow the standard procedure Richard describes in his blog entry (the default, `withmain`), which yields only a BF for each model compared to the null model. If I use `bottom`, however, I do get adjusted BFs for each effect, but the original BFs are no longer comparable to the JASP output. I suppose the reason is that in each case I compare the factors to a different null model. Therefore, my questions: How do I extract the BF for each effect from the BFs given by an analysis of this form:

`bf <- anovaBF(DV ~ color_IV1 * IV2 + subj, data = data_df, whichRandom = "subj", whichModels = "withmain")`

And what do I have to keep in mind regarding the interpretation of the BFs when choosing a different value for `whichModels`, along the lines of:

`bf <- anovaBF(DV ~ color_IV1 * IV2 + subj, data = data_df, whichRandom = "subj", whichModels = "bottom")`

I hope I have formulated my problems clearly enough. If not, I will gladly rephrase or give more detail.

Thanks,

Eduard

Hi Eduard,

This is really a question for Richard, who is in charge of the "BayesFactor" section of this forum. I'll specifically draw his attention to your question.

Cheers,

E.J.

Thanks!

Hi Eduard,

Simply use the function "as.vector" to extract the Bayes factors from a Bayes factor object.
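A minimal sketch of extracting the Bayes factors with `as.vector()` and of obtaining the interaction as a model comparison (full model against the main-effects model). It assumes simulated, hypothetical data in place of `data_df`, factor names `IV1` and `IV2`, and that with `whichModels = "withmain"` the four models come back in the order IV1, IV2, IV1 + IV2, IV1 + IV2 + IV1:IV2 (check the printed labels before indexing).

```r
library(BayesFactor)

# Simulated, hypothetical 2 x 2 within-subject data (20 subjects)
set.seed(2)
data_df <- expand.grid(
  subj = factor(1:20),
  IV1  = factor(c("x", "y")),
  IV2  = factor(c("p", "q"))
)
data_df$DV <- rnorm(nrow(data_df), mean = 500, sd = 20) +
  rnorm(20, sd = 30)[as.integer(data_df$subj)]   # subject offsets

bf <- anovaBF(DV ~ IV1 * IV2 + subj, data = data_df,
              whichRandom = "subj", whichModels = "withmain", progress = FALSE)

# Named numeric vector: each model against the subject-only model
bfs <- as.vector(bf)
print(bfs)

# Interaction as a model comparison: full model vs main-effects model
bf_interaction <- bf[4] / bf[3]
print(bf_interaction)
```

Because both models share the same denominator, dividing them cancels it and leaves the direct comparison, which is the quantity Markus asked about earlier in the thread.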

The only differences between the whichModels specifications are which models are tested, and to which models they are compared:

"all" gives all combinations of effects, including those with interactions but not the constituent main effects. So, for a two-way ANOVA, you'll get all of these compared to the null model:

- a
- b
- a + b
- a + b + a:b
- a + a:b
- b + a:b
- a:b

"withmain" gives all models except those that include an interaction without its constituent main effects, all compared to the null model:

- a
- b
- a + b
- a + b + a:b

"top" gives

- a + a:b
- b + a:b
- a + b

all compared to the "full" model a + b + a:b; that is, each effect is "taken away" from the full model and tested.

"bottom" gives

- a
- b
- a:b

all compared to the null model with no effects.

I would not recommend using "top" or "bottom" in everyday research. They were mostly added for the convenience of people generating subsets of models. The problem with them is best seen in the regression context. Imagine two covariates that are highly correlated with each other and also correlated with the DV. Testing with "top" will lead you to conclude that neither covariate is needed, because they share a great deal of variance. Testing with "bottom" would lead you to conclude that both are needed. What you should do is compare a, b, and a + b to one another (in the ANOVA context you'd have a:b in there too). Then you'd see that a and b alone are good, but a + b and the null are bad. That is, you need one of the two covariates but not both. You can only see this by looking at the constellation of model comparisons.

The idea that you can get a separate Bayes factor for each effect (as opposed to comparing models) is flawed. You'll fall into the same traps that people fall into with p values (e.g., problems with multicollinearity). My recommendation is to stick with model comparison ("withmain" or "all").
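The collinearity scenario above can be explored with a minimal sketch using BayesFactor's `regressionBF()`. The data are simulated and hypothetical: x2 is nearly a copy of x1, and the DV depends on x1 only. Dividing every model by the best one shows the constellation of comparisons, rather than judging each covariate in isolation; the "top" comparisons are printed for contrast.

```r
library(BayesFactor)

# Simulated, hypothetical data: x2 is nearly a copy of x1 (r close to 1),
# and the DV depends on x1 only
set.seed(3)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
y  <- x1 + rnorm(n)
d  <- data.frame(y, x1, x2)

# "all": x1, x2 and x1 + x2, each compared to the intercept-only model
bf_all  <- regressionBF(y ~ x1 + x2, data = d, whichModels = "all", progress = FALSE)
bfs_all <- as.vector(bf_all)

# The constellation: every model relative to the best-supported model
print(bfs_all / max(bfs_all))

# "top": each covariate removed from the full model in turn; with this much
# shared variance, neither removal is likely to hurt much
bf_top <- regressionBF(y ~ x1 + x2, data = d, whichModels = "top", progress = FALSE)
print(bf_top)
```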

Hi Richard,

Thanks for your input. It really cleared things up.

But since you're already in the discussion, would you mind helping me out with the initial question in this thread?

In brief, I have three experiments, each with a 2x2 repeated-measures design (basically replications of each other). In the first and second experiments, the classical and Bayesian ANOVAs agree that the best model is the full model (M1, M2, IE). In the third experiment, however, the BF shows no evidence for the interaction, even though the classical ANOVA finds a significant effect (p = .007).

As I have rather specific predictions with respect to the interaction, EJ suggested that I check whether I can find evidence for it with a more specific prediction, using order restrictions (following this blog post of yours).

The example you give is for a one-way ANOVA with 3 levels. So, first, I was wondering whether this method is also applicable to a 2x2 design. Second, if it is, what would be the number of possible orderings? My first idea was 24, because I have 4 cells and each could differ from the others. However, I don't want to artificially inflate the number of possible orderings, as this would have a huge effect on the end result.

For the sake of completeness, here is the code that I used:

Thanks for your help

Eduard
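On the number of orderings: what enters an encompassing-prior Bayes factor is the prior probability of the specific restriction being tested. A minimal base-R sketch, assuming all 24 strict orderings of the four cell means are equally likely a priori: a single full ordering then has prior probability 1/24, while a partial restriction (here a hypothetical one: b1 above b2 at both levels of A) is satisfied by 6 of the 24 orderings, that is 1/4. If the exchangeability assumption is in doubt, the prior proportion could instead be estimated by sampling from the prior.

```r
# All permutations of a vector (fine for small vectors such as 4 cells)
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

cells <- c("a1b1", "a1b2", "a2b1", "a2b2")

# Each permutation lists the cells from largest to smallest mean
orderings <- perms(cells)
length(orderings)   # 24, i.e. factorial(4)

# Position of a cell within an ordering (1 = largest mean)
rank_of <- function(ord, cell) match(cell, ord)

# Hypothetical partial restriction: b1 above b2 at both levels of A
partial <- function(ord) {
  rank_of(ord, "a1b1") < rank_of(ord, "a1b2") &&
    rank_of(ord, "a2b1") < rank_of(ord, "a2b2")
}

prior_full    <- 1 / length(orderings)                         # one full ordering
prior_partial <- mean(vapply(orderings, partial, logical(1)))  # 6 / 24
c(full = prior_full, partial = prior_partial)
```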

Hi @richarddmorey,

I'm not sure whether you are just busy or haven't seen this post yet. In the latter case, I hope this post reaches you; otherwise, sorry for spamming. It's just that I'm a little excited about this analysis. Once I know whether my procedure makes sense, we can submit the manuscript.

Thanks,

Eduard