Different results Bayesian vs Classical RM ANOVA

johan_achard · July 2023

Hello,

The results between a frequentist and a Bayesian RM ANOVA are different. This is surprising and changes the conclusions. Is this a JASP problem? Which analysis is reliable, Frequentist or Bayesian?

Here is the github link to the JASP file: https://github.com/JohanACHARD/jasp_RM_anova_difference_freq_bayesian/tree/54c8689889dd8d147ef10ed5a288540a6b3f00c3

Thanks in advance for your help.

Best regards,

Johan

andersony3k · July 2023

I don't know the answer to your question, but apparent inconsistencies like the one you're seeing lead me to adopt a general practice trusting simpler Bayesian analyses (e.g., t tests) more than complex ones (e.g., ANOVAs).

andersony3k · July 2023

Also, if you want to compare Bayesian RM ANOVA to something frequentist, you should compare it to a frequentist linear mixed-effects model since, I'm told, JASP's Bayesian "repeated measures ANOVA" is really a Bayesian Linear Mixed-Effects model.

EJ · August 2023

Correct, andersony3k. In addition, note that we have recently updated our RM Bayesian methodology to be more in line with the frequentist one. See this blog post and associated article:

Bayesian Repeated-Measures ANOVA: An Updated Methodology Implemented in JASP - JASP - Free and User-Friendly Statistical Software

https://jasp-stats.org/2022/07/29/bayesian-repeated-measures-anova-an-updated-methodology-implemented-in-jasp/

This post is a teaser for van den Bergh, D., Wagenmakers, E., & Aust, F. (2022). Bayesian Repeated-Measures ANOVA: An Updated Methodology Implemented in JASP. Preprint available on PsyArXiv: https://psyarxiv.com/fb8zn/ In JASP 0.16.3 we changed the default Bayesian repeated-measures ANOVA.… Continue reading →

So make sure you use a recent version of JASP.

EJ

johan_achard · August 2023

Dear all,

Thank you for your clear and precise answers.

I've compared Baysian RM ANOVA to a frequentist mixed model as you suggest (in JASP), and the results still don't seem similar between the two methods (frequentist and bayesian).

Thank you EJ for sharing this very interesting blog link, which I read carefully.

That said, I used the latest version of JASP for this analysis, including the Bayesian RM update, and the difference between frequentist and Bayesian remains.

If I understand correctly, JASP approximates the RM-ANOVA Bayesian analysis of a Bayesian mixed model? In this case, how is the Bayesian mixed analysis constructed in the JASP "Linear Mixed Model" module, and how do you interpret the JASP output, which is different from that of the Bayesian ANOVA (no BF)?

To go further, I used the anovaBF() function on R from Richard D. Morey's BayesFactor package (more details here). Attached is a screen of the output and the code. I find similar results to frequentist ANOVA with this approach. What's the difference with JASP, which includes randoms effects? Can you explain the subtlety?

Thanks in advance for all your answers which will help me to clarify these questions,

Johan

comparaison R - RM ANOVA freq and bayesian.png

frederikaust · August 2023

Let me start with the comparison between the frequentist ANOVA in R and JASP. While JASP, by default, uses Type III sums-of-squars, the function stats::aov() uses Type I sums-of-squares. If you use the same sums of squares, you will see the same results:

afex::aov_4(
  rt ~ coherence * delay + (coherence * delay | id)
  , data = dat |>
    dplyr::filter(delay != "0s")
   
)

Anova Table (Type 3 tests)

Response: rt
           Effect          df  MSE       F  ges p.value
1       coherence 1.28, 29.49 0.00 7.55 ** .004    .006
2           delay 2.00, 45.92 0.00 6.56 ** .009    .003
3 coherence:delay 1.37, 31.53 0.00    0.92 .003    .376
---
Signif. codes:  
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘+’ 0.1 ‘ ’ 1

Sphericity correction method: GG

As detailed in the blog post (and the paper it announces), different sums-of-square types represent different model comparisons, so different results are expected. Moreover, BayesFactor::anovaBF() uses different error strata for hypothesis testing than the standard frequentist ANOVA. To obtain the Bayesian analog of the frequentist analysis do this:

BayesFactor::generalTestBF(
  rt ~ coherence * delay + id + id:coherence + id:delay
  , data = dat |>
    dplyr::filter(delay != "0s") |>
    droplevels()
  , whichRandom = "id"
  , neverExclude = "id"
  , whichModels = "top"
  , rscaleRandom = 1
  , rscaleFixed = 0.5
) |>
  as.vector() |>
  (\(x) 1/x)() |>
  rev()

The following gives the evidence for omitted effect (e.g. the first number is the Bayes factor for the main effect of coherence).

delay + coherence:delay + id + coherence:id + delay:id
                                                 0.5487767
coherence + coherence:delay + id + coherence:id + delay:id
                                                 4.2996458
          coherence + delay + id + coherence:id + delay:id
                                                 0.1997267

These results are consistent with the output in JASP, if you compare (i.e. devide the Bayes factors) the full model to models that omit the effect of interest. To see all these models in the JASP output, go to "Additional options" and deselect the option to enforce the principle of marginality for fixed effects and random effects.

Bildschirmfoto 2023-08-21 um 15.56.47.png

For example, the Bayes factor for the interaction term (without model averaging) can be calculated as 0.402/1.985 = 0.203 (deviations are due to numerical imprecisions). The model averaged results are virtually the same (the last line ensures that the random effects are always included:

model_space <- BayesFactor::generalTestBF(
  rt ~ coherence * delay + id + id:coherence + id:delay
  , data = dat |>
    dplyr::filter(delay != "0s") |>
    droplevels()
  , whichRandom = "id"
  , neverExclude = "id"
  , whichModels = "all"
  , rscaleFixed = 0.5
  , rscaleRandom = 1
)


(model_space[-8] / model_space[8]) |>
  bayestestR::bayesfactor_inclusion()

Inclusion Bayes Factors (Model Averaged)

                P(prior) P(posterior) Inclusion BF
id                  1.00         1.00             
coherence:id        1.00         1.00             
delay:id            1.00         1.00             
coherence           0.50         0.35        0.528
delay               0.50         0.80         3.89
coherence:delay     0.50         0.16        0.184

* Compared among: all models
*    Priors odds: uniform-equal

So, as you can see, JASP gives the same results as the BayesFactor package in R, if the same settings are used (this is because JASP uses BayesFactor under the hood).

Now that we have established that the results in JASP are consistent with those from R and BayesFactor , let's move on to the divergences between the frequentist and Bayesian analyses. If we plot the data of individual subjets, we see that participant 16 exhibits substantially slowed response times in several conditions. I have no context here, so I have no idea if these are sensible observations.

If participant 16 is excluded from the analysis, we see that frequentist and Bayesian analyses give consistent results, that match those from the frequentist analysis of the full dataset.

afex::aov_4(
  rt ~ coherence * delay + (coherence * delay | id)
  , data = dat |>
    dplyr::filter(delay != "0s") |>
    dplyr::filter(id != "16") |>
    droplevels()
)

Anova Table (Type 3 tests)

Response: rt
           Effect          df  MSE       F   ges p.value
1       coherence 1.30, 28.63 0.00 9.07 **  .006    .003
2           delay 2.00, 43.94 0.00 6.37 **  .012    .004
3 coherence:delay 2.79, 61.36 0.00    1.15 <.001    .333
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘+’ 0.1 ‘ ’ 1

Sphericity correction method: GG

Now the Bayesian analysis also provides substantial evidence for the two main effects and some evidence against the interactions.

BayesFactor::generalTestBF(
  rt ~ coherence * delay + id + id:coherence + id:delay
  , data = dat |>
    dplyr::filter(delay != "0s") |>
    dplyr::filter(id != "16") |>
    droplevels()
  , whichRandom = "id"
  , neverExclude = "id"
  , whichModels = "top"
  , rscaleRandom = 1
  , rscaleFixed = 0.5
) |>
  as.vector() |>
  (\(x) 1/x)() |>
  rev()

delay + coherence:delay + id + coherence:id + delay:id
                                                41.9658769
coherence + coherence:delay + id + coherence:id + delay:id
                                                12.8885796
          coherence + delay + id + coherence:id + delay:id
                                                 0.1636591

So, it seems the Bayesian ANOV may be less robust to outliers, but we should look into this some more. I hope this helps.

johan_achard · August 2023

Thank you for taking the time to provide such a complete and accurate answer! It's been a great help to me in understanding the inner workings of JASP and the statistical analyses I carry out. The parallel between JASP and R is very clear now thanks to you. Thank you very much.

The difference between Frequentist and Bayesian ANOVA would seem to come from the outliers. This seems consistent with this article by HFF Mahmoudque, who explains that there are still some gaps between the two approaches (http://dx.doi.org/10.18576/jsap/090301), and in which the author attempts to evaluate the Frequentist and Bayesian approaches using intensive simulations with either data containing outliers or no outliers.

A last comment:

Taking all these elements into account, I applied Tukey's Inter-Quartile Range method (IQR = Q3 - Q1) to the data as a pre-processing step to obtain the average reaction time of the dataset. The classic 1.5 scale was used to determine the lower and upper bounds (respectively Q1 - 1.5 * IQR , and Q3 + 1.5 * IQR). (For a brief review of outlier detection, see Saleem, S., Aslam, M., & Shaukat, M. R. (2021). A REVIEW AND EMPIRICAL COMPARISON OF UNIVARIATE OUTLIER DETECTION METHODS. Pakistan Journal of Statistics, 37(4) ).

After performing Bayesian RM-ANOVA in JASP on the two datasets (with or without Tukey IQR method), the results of the frequentist approach are similar :

And correspond as results of the Bayesian approach when Tukey IQR has been applied :

It therefore seems that the Bayesian approach is more sensitive to outliers, as you suggest. In this case, the Tukey method (or subject deletion) was sufficient to explain the difference between the two approaches.

Thank you for your clarification, and I'll keep you posted on future research into the question of robustness to outliers with the Bayesian approach.

Thank you very much!

Johan

Howdy, Stranger!

Categories

Different results Bayesian vs Classical RM ANOVA

Comments

Bayesian Repeated-Measures ANOVA: An Updated Methodology Implemented in JASP - JASP - Free and User-Friendly Statistical Software

Howdy, Stranger!

Quick Links

Categories

Different results Bayesian vs Classical RM ANOVA

Comments

Bayesian Repeated-Measures ANOVA: An Updated Methodology Implemented in JASP - JASP - Free and User-Friendly Statistical Software