Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Supported by

Large Bayes Factor changes with exclusion of single subject (Bayesian ANOVA)

Hi all,

I have run both frequentist and Bayesian 2x2x3 repeated measures ANOVA's for my analysis of some reaction time data. At one point I noticed an interesting discrepancy between the two: in the frequentist analysis, a main effect (of SESSION) was not significant (p = 0.09), but it received fairly strong support in the Bayesian analysis (BF10 of only the model: 56, inclusion BF: 40).

This struck me as odd: I have had only encountered the opposite situation before (significant p-value, BF in favor of null). So I decided to look at the individual subject data (for a plot, see data.png file). It turns out there is one subject out of 26 that showed a much stronger effect than everyone else (one of the orange lines in the figure). I have no reason to exclude this subject, but because I'm still learning about Bayesian statistics I decided to try and see what would happen if I reran the analyses without this subject.

To my surprise, while the p-value changed only moderately (from 0.09 to 0.14), the Bayes factor plummeted from 57 to around 2. So by excluding just one subject I have gone from "strong" to "anecdotal" evidence for H1.

I don't really have a good intuition for Bayes factors yet, but this large change seems quite odd, especially compared to the frequentist analysis. Also, I am really in doubt how to interpret and report this effect now, if at all.

I'm left wondering under what circumstances such discrepancies between frequentist and Bayesian statistics can occur? Could it be that Bayes factors are indeed in a sense more sensitive to "outliers"?

Any input would be much appreciated. In case that's helpful, I've attached two .jasp files, each containing the frequentist and Bayesian ANOVAs either for all data (all.jasp) or for the dataset without this one subject (without.jasp).



  • That's interesting. I'll have to look at this a little later (some deadlines now, remind me if I haven't responded in a week), but some of the discrepancies may be due to violations of assumptions (homogeneity of variances). I'll take a look.

  • Hmm OK, that is a rather big effect of leaving out this single participant. Of course, you could also argue "how it is that the p-value only changes from .09 to .14 when I leave out this huge outlier in my relatively small data set?" but that's not constructive, and the change in the BF remains surprisingly large. To understand the problem better, perhaps you could break it down into a t-test. For a t-test, the t-value and sample size will determine the BF (hence the "Summary Stats module"). Do you find a similar discrepancy? (i.e., eliminate the noncrucial ANOVA variables and let's talk t-test instead)

  • Hi E.J.,

    Thanks so much for the super swift reply! I agree that you could be equally surprised about the behavior of the p-value---it was more the discrepancy that struck me.

    Interestingly, averaging over the other variables and just doing t-tests indeed leads to a completely different picture. The p-values are the same of course, but now the BFs in both cases are also modest and don't diverge much anymore:

    • All subjects: p = 0.091, BF10 = 0.795 (see all_t-test.jasp)
    • Without this one subject: p = 0.139, BF10 = 0.588 (see without_t-test.jasp)

    Still, there thus remains a discrepancy between frequentist and Bayesian tests: for the former it doesn't matter how you specify the model, but for the latter it matters a lot.

    I take it this is because the "Bayesian t-test" and "Bayesian ANOVA" are just descriptive names, but the underlying tests differ quite a lot? That is, the t-test/ANOVA are equivalent in the frequentist framework in the case of one factor with two levels, but this does not hold for the Bayesian analogs.

    Would you say this means that (main) effects in Bayesian ANOVAs should generally be followed up with Bayesian t-tests? Or are the results of Bayesian t-tests and ANOVAs generally similar, which would mean that my data are just weird / violate certain assumptions?


  • I'll need some time to digest this. Will get back to you later.

  • So I've tried reproducing this with simulated data, and I can't get it to work. See the following (change the seed for different data):

    N = 20
    effect = .2
    sigma.sub = 1
    sigma.err = .2
    y.mat = outer(rnorm(N,0,sigma.sub),c(-effect/2,effect/2), "+") + rnorm(2*N,0,sigma.err)
    y =
    summary(aov(Freq ~ Var2 + Error(Var1/Var2), dat = y))
    anovaBF(Freq ~ Var2 + Var1, whichRandom = "Var1", dat = y)
    ggplot(y, aes(y = Freq)) +
      geom_point(aes(x = Var2), size = 5) +
      geom_line(aes(x = Var2), group = y$Var1)
    # Modify data
    y0.mat = y.mat
    lowest = which.min(y0.mat[,1])
    y0.mat[lowest,2] = quantile(y0.mat[,2], p = .9)
    y0 =
    summary(aov(Freq ~ Var2 + Error(Var1/Var2), dat = y0))
    anovaBF(Freq ~ Var2 + Var1, whichRandom = "Var1", dat = y0)
    ggplot(y0, aes(y = Freq)) +
      geom_point(aes(x = Var2), size = 5) +
      geom_line(aes(x = Var2), group = y0$Var1)

    Notice that the Bayes factors seem to track the p value as one would expect.

  • Dear E.J. and Richard,

    Thank you so much for taking another look at my post. Indeed, Richard's simulation seems to behave differently, so it must be something particular about my data?

    I've included the data (aov_data.csv) and the code that gets me these results (I mainly use the BayesFactor package directly; I only posted the JASP results first because I thought that would be more convenient). Sorry about the length; I wanted it to be reproducible and was unable to make it shorter. In case it's still not reproducible: The p and BF values I obtained are also included as comments inbetween the code.

    Please let me know if there's any more information I can provide you with, and thanks again for all your help so far!


    #> Loading required package: coda
    #> Loading required package: Matrix
    #> ************
    #> Welcome to BayesFactor 0.9.12-2. If you have questions, please contact Richard Morey (
    #> Type BFManual() to open the manual.
    #> ************
    # Change this to match your filepath
    df <- read.csv("/Users/lcreteig/Desktop/aov_data.csv")
    df$session <- factor(df$session) # 2 levels: session A or B, (different brain stimulation)
    df$stimulus <- factor(df$stimulus) # 2 levels: on-screen target presented on LEFT or RIGHT
    df$block <- factor(df$block) # 3 different post-measurements
    # "latency" is the change in reaction time (ms) from baseline (so 0 is no change; baseline data not included)
    dfWithout <- df[df$subject != "S01", ] # make new data frame without "outlier" subject
    df$subject <- factor(df$subject) # original data has 26 subjects
    dfWithout$subject <- factor(dfWithout$subject) # new data has one less
    # All factors 
    # this matches the .jasp files in the 1st post
    ## Frequentist ANOVA
    ### all subjects
    freq <- aov(latency ~ session*stimulus*block + Error(subject/(session*stimulus*block)), dat = df)
    summary(freq)$'Error: subject:session'[[1]]$'Pr(>F)'
    #> [1] 0.09083114         NA
    ####without one subject
    freqWithout <- aov(latency ~ session*stimulus*block + Error(subject/(session*stimulus*block)), dat = dfWithout)
    summary(freqWithout)$'Error: subject:session'[[1]]$'Pr(>F)'
    #> [1] 0.1392397        NA
    ## Bayesian ANOVA
    ### all subjects
    bf <- anovaBF(latency ~ session*stimulus*block + subject, data = df, 
                  whichModels = "withmain", whichRandom = "subject") 
    #> [1] 60.51355
    #### without one subject
    bfWithout <- anovaBF(latency ~ session*stimulus*block + subject, data = dfWithout, 
                         whichModels = "withmain", whichRandom = "subject") 
    #> [1] 1.962064
    # Only SESSION
    # this matches the .jasp files in the 1st post
    # Average over all the other variables, except subject and session
    dfSession <- aggregate(.~subject+session, df, mean)
    dfSessionWithout <- aggregate(.~subject+session, dfWithout, mean)
    ## Frequentist ANOVA
    ### all subjects
    freqSession <- aov(latency ~ session + Error(subject/session), dat = dfSession)
    summary(freqSession)$'Error: subject:session'[[1]]$'Pr(>F)'
    #> [1] 0.09083114         NA
    #### without one subject
    freqSessionWithout <- aov(latency ~ session + Error(subject/session), dat = dfSessionWithout)
    summary(freqSessionWithout)$'Error: subject:session'[[1]]$'Pr(>F)'
    #> [1] 0.1392397        NA
    ## Bayesian ANOVA
    ### all subjects
    bfSession <- anovaBF(latency ~ session + subject, data = dfSession, 
                  whichModels = "withmain", whichRandom = "subject") 
    #> [1] 0.9600066
    #### without one subject
    bfSessionWithout <- anovaBF(latency ~ session + subject, data = dfSessionWithout, 
                         whichModels = "withmain", whichRandom = "subject") 
    #> [1] 0.680306

    Session info
    #> Session info --------------------------------------------------------------
    #>  setting  value                       
    #>  version  R version 3.4.0 (2017-04-21)
    #>  system   x86_64, darwin15.6.0        
    #>  ui       X11                         
    #>  language (EN)                        
    #>  collate  C                           
    #>  tz       Europe/Amsterdam            
    #>  date     2017-12-01
    #> Packages ------------------------------------------------------------------
    #>  package      * version  date       source         
    #>  BayesFactor  * 0.9.12-2 2015-09-19 CRAN (R 3.4.0) 
    #>  Matrix       * 1.2-9    2017-03-14 CRAN (R 3.4.0) 
    #>  MatrixModels   0.4-1    2015-08-22 CRAN (R 3.4.0) 
    #>  Rcpp           0.12.10  2017-03-19 cran (@0.12.10)
    #>  backports      1.0.5    2017-01-18 CRAN (R 3.4.0) 
    #>  coda         * 0.19-1   2016-12-08 CRAN (R 3.4.0) 
    #>  devtools       1.12.0   2016-12-05 CRAN (R 3.4.0) 
    #>  digest         0.6.12   2017-01-27 CRAN (R 3.4.0) 
    #>  evaluate       0.10     2016-10-11 CRAN (R 3.4.0) 
    #>  gtools         3.5.0    2015-05-29 CRAN (R 3.4.0) 
    #>  htmltools      0.3.6    2017-04-28 CRAN (R 3.4.0) 
    #>  knitr          1.15.1   2016-11-22 CRAN (R 3.4.0) 
    #>  lattice        0.20-35  2017-03-25 CRAN (R 3.4.0) 
    #>  magrittr       1.5      2014-11-22 cran (@1.5)    
    #>  memoise        1.1.0    2017-04-21 CRAN (R 3.4.0) 
    #>  mvtnorm        1.0-6    2017-03-02 CRAN (R 3.4.0) 
    #>  pbapply        1.3-2    2017-03-01 CRAN (R 3.4.0) 
    #>  rmarkdown      1.5      2017-04-26 CRAN (R 3.4.0) 
    #>  rprojroot      1.2      2017-01-16 CRAN (R 3.4.0) 
    #>  stringi        1.1.5    2017-04-07 cran (@1.1.5)  
    #>  stringr        1.2.0    2017-02-18 cran (@1.2.0)  
    #>  withr          1.0.2    2016-06-20 CRAN (R 3.4.0) 
    #>  yaml           2.1.14   2016-11-12 CRAN (R 3.4.0)

Sign In or Register to comment.