Large Bayes Factor changes with exclusion of single subject (Bayesian ANOVA)

Leon · November 2017

Hi all,

I have run both frequentist and Bayesian 2x2x3 repeated measures ANOVA's for my analysis of some reaction time data. At one point I noticed an interesting discrepancy between the two: in the frequentist analysis, a main effect (of SESSION) was not significant (p = 0.09), but it received fairly strong support in the Bayesian analysis (BF10 of only the model: 56, inclusion BF: 40).

This struck me as odd: I have had only encountered the opposite situation before (significant p-value, BF in favor of null). So I decided to look at the individual subject data (for a plot, see data.png file). It turns out there is one subject out of 26 that showed a much stronger effect than everyone else (one of the orange lines in the figure). I have no reason to exclude this subject, but because I'm still learning about Bayesian statistics I decided to try and see what would happen if I reran the analyses without this subject.

To my surprise, while the p-value changed only moderately (from 0.09 to 0.14), the Bayes factor plummeted from 57 to around 2. So by excluding just one subject I have gone from "strong" to "anecdotal" evidence for H1.

I don't really have a good intuition for Bayes factors yet, but this large change seems quite odd, especially compared to the frequentist analysis. Also, I am really in doubt how to interpret and report this effect now, if at all.

I'm left wondering under what circumstances such discrepancies between frequentist and Bayesian statistics can occur? Could it be that Bayes factors are indeed in a sense more sensitive to "outliers"?

Any input would be much appreciated. In case that's helpful, I've attached two .jasp files, each containing the frequentist and Bayesian ANOVAs either for all data (all.jasp) or for the dataset without this one subject (without.jasp).

-Leon

EJ · November 2017

That's interesting. I'll have to look at this a little later (some deadlines now, remind me if I haven't responded in a week), but some of the discrepancies may be due to violations of assumptions (homogeneity of variances). I'll take a look.

EJ · November 2017

Hmm OK, that is a rather big effect of leaving out this single participant. Of course, you could also argue "how it is that the p-value only changes from .09 to .14 when I leave out this huge outlier in my relatively small data set?" but that's not constructive, and the change in the BF remains surprisingly large. To understand the problem better, perhaps you could break it down into a t-test. For a t-test, the t-value and sample size will determine the BF (hence the "Summary Stats module"). Do you find a similar discrepancy? (i.e., eliminate the noncrucial ANOVA variables and let's talk t-test instead)
E.J.

Leon · November 2017

Hi E.J.,

Thanks so much for the super swift reply! I agree that you could be equally surprised about the behavior of the p-value---it was more the discrepancy that struck me.

Interestingly, averaging over the other variables and just doing t-tests indeed leads to a completely different picture. The p-values are the same of course, but now the BFs in both cases are also modest and don't diverge much anymore:

All subjects: p = 0.091, BF10 = 0.795 (see all_t-test.jasp)
Without this one subject: p = 0.139, BF10 = 0.588 (see without_t-test.jasp)

Still, there thus remains a discrepancy between frequentist and Bayesian tests: for the former it doesn't matter how you specify the model, but for the latter it matters a lot.

I take it this is because the "Bayesian t-test" and "Bayesian ANOVA" are just descriptive names, but the underlying tests differ quite a lot? That is, the t-test/ANOVA are equivalent in the frequentist framework in the case of one factor with two levels, but this does not hold for the Bayesian analogs.

Would you say this means that (main) effects in Bayesian ANOVAs should generally be followed up with Bayesian t-tests? Or are the results of Bayesian t-tests and ANOVAs generally similar, which would mean that my data are just weird / violate certain assumptions?

-Leon

EJ · December 2017

I'll need some time to digest this. Will get back to you later.

richarddmorey · December 2017

So I've tried reproducing this with simulated data, and I can't get it to work. See the following (change the seed for different data):

library(BayesFactor)
library(ggplot2)

set.seed(8)
N = 20
effect = .2
sigma.sub = 1
sigma.err = .2
y.mat = outer(rnorm(N,0,sigma.sub),c(-effect/2,effect/2), "+") + rnorm(2*N,0,sigma.err)

y = as.data.frame.table(y.mat)
summary(aov(Freq ~ Var2 + Error(Var1/Var2), dat = y))

anovaBF(Freq ~ Var2 + Var1, whichRandom = "Var1", dat = y)

ggplot(y, aes(y = Freq)) +
  geom_point(aes(x = Var2), size = 5) +
  geom_line(aes(x = Var2), group = y$Var1)

# Modify data

y0.mat = y.mat
lowest = which.min(y0.mat[,1])
y0.mat[lowest,2] = quantile(y0.mat[,2], p = .9)
y0 = as.data.frame.table(y0.mat)

summary(aov(Freq ~ Var2 + Error(Var1/Var2), dat = y0))

anovaBF(Freq ~ Var2 + Var1, whichRandom = "Var1", dat = y0)

ggplot(y0, aes(y = Freq)) +
  geom_point(aes(x = Var2), size = 5) +
  geom_line(aes(x = Var2), group = y0$Var1)

Notice that the Bayes factors seem to track the p value as one would expect.

Leon · December 2017

Dear E.J. and Richard,

Thank you so much for taking another look at my post. Indeed, Richard's simulation seems to behave differently, so it must be something particular about my data?

I've included the data (aov_data.csv) and the code that gets me these results (I mainly use the BayesFactor package directly; I only posted the JASP results first because I thought that would be more convenient). Sorry about the length; I wanted it to be reproducible and was unable to make it shorter. In case it's still not reproducible: The p and BF values I obtained are also included as comments inbetween the code.

Please let me know if there's any more information I can provide you with, and thanks again for all your help so far!

-Leon

library(BayesFactor)
#> Loading required package: coda
#> Loading required package: Matrix
#> ************
#> Welcome to BayesFactor 0.9.12-2. If you have questions, please contact Richard Morey (richarddmorey@gmail.com).
#> 
#> Type BFManual() to open the manual.
#> ************

# Change this to match your filepath
df <- read.csv("/Users/lcreteig/Desktop/aov_data.csv")

df$session <- factor(df$session) # 2 levels: session A or B, (different brain stimulation)
df$stimulus <- factor(df$stimulus) # 2 levels: on-screen target presented on LEFT or RIGHT
df$block <- factor(df$block) # 3 different post-measurements
# "latency" is the change in reaction time (ms) from baseline (so 0 is no change; baseline data not included)

dfWithout <- df[df$subject != "S01", ] # make new data frame without "outlier" subject
df$subject <- factor(df$subject) # original data has 26 subjects
dfWithout$subject <- factor(dfWithout$subject) # new data has one less

# All factors 
# this matches the .jasp files in the 1st post

## Frequentist ANOVA

### all subjects
freq <- aov(latency ~ session*stimulus*block + Error(subject/(session*stimulus*block)), dat = df)
summary(freq)$'Error: subject:session'[[1]]$'Pr(>F)'
#> [1] 0.09083114         NA

####without one subject
freqWithout <- aov(latency ~ session*stimulus*block + Error(subject/(session*stimulus*block)), dat = dfWithout)
summary(freqWithout)$'Error: subject:session'[[1]]$'Pr(>F)'
#> [1] 0.1392397        NA

## Bayesian ANOVA

### all subjects
bf <- anovaBF(latency ~ session*stimulus*block + subject, data = df, 
              whichModels = "withmain", whichRandom = "subject") 
extractBF(bf)$bf[1]
#> [1] 60.51355

#### without one subject
bfWithout <- anovaBF(latency ~ session*stimulus*block + subject, data = dfWithout, 
                     whichModels = "withmain", whichRandom = "subject") 
extractBF(bfWithout)$bf[1]
#> [1] 1.962064

# Only SESSION
# this matches the .jasp files in the 1st post

# Average over all the other variables, except subject and session
dfSession <- aggregate(.~subject+session, df, mean)
dfSessionWithout <- aggregate(.~subject+session, dfWithout, mean)

## Frequentist ANOVA

### all subjects
freqSession <- aov(latency ~ session + Error(subject/session), dat = dfSession)
summary(freqSession)$'Error: subject:session'[[1]]$'Pr(>F)'
#> [1] 0.09083114         NA

#### without one subject
freqSessionWithout <- aov(latency ~ session + Error(subject/session), dat = dfSessionWithout)
summary(freqSessionWithout)$'Error: subject:session'[[1]]$'Pr(>F)'
#> [1] 0.1392397        NA

## Bayesian ANOVA

### all subjects
bfSession <- anovaBF(latency ~ session + subject, data = dfSession, 
              whichModels = "withmain", whichRandom = "subject") 
extractBF(bfSession)$bf[1]
#> [1] 0.9600066

#### without one subject
bfSessionWithout <- anovaBF(latency ~ session + subject, data = dfSessionWithout, 
                     whichModels = "withmain", whichRandom = "subject") 
extractBF(bfSessionWithout)$bf[1]
#> [1] 0.680306

Session info

devtools::session_info()
#> Session info --------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.4.0 (2017-04-21)
#>  system   x86_64, darwin15.6.0        
#>  ui       X11                         
#>  language (EN)                        
#>  collate  C                           
#>  tz       Europe/Amsterdam            
#>  date     2017-12-01
#> Packages ------------------------------------------------------------------
#>  package      * version  date       source         
#>  BayesFactor  * 0.9.12-2 2015-09-19 CRAN (R 3.4.0) 
#>  Matrix       * 1.2-9    2017-03-14 CRAN (R 3.4.0) 
#>  MatrixModels   0.4-1    2015-08-22 CRAN (R 3.4.0) 
#>  Rcpp           0.12.10  2017-03-19 cran (@0.12.10)
#>  backports      1.0.5    2017-01-18 CRAN (R 3.4.0) 
#>  coda         * 0.19-1   2016-12-08 CRAN (R 3.4.0) 
#>  devtools       1.12.0   2016-12-05 CRAN (R 3.4.0) 
#>  digest         0.6.12   2017-01-27 CRAN (R 3.4.0) 
#>  evaluate       0.10     2016-10-11 CRAN (R 3.4.0) 
#>  gtools         3.5.0    2015-05-29 CRAN (R 3.4.0) 
#>  htmltools      0.3.6    2017-04-28 CRAN (R 3.4.0) 
#>  knitr          1.15.1   2016-11-22 CRAN (R 3.4.0) 
#>  lattice        0.20-35  2017-03-25 CRAN (R 3.4.0) 
#>  magrittr       1.5      2014-11-22 cran (@1.5)    
#>  memoise        1.1.0    2017-04-21 CRAN (R 3.4.0) 
#>  mvtnorm        1.0-6    2017-03-02 CRAN (R 3.4.0) 
#>  pbapply        1.3-2    2017-03-01 CRAN (R 3.4.0) 
#>  rmarkdown      1.5      2017-04-26 CRAN (R 3.4.0) 
#>  rprojroot      1.2      2017-01-16 CRAN (R 3.4.0) 
#>  stringi        1.1.5    2017-04-07 cran (@1.1.5)  
#>  stringr        1.2.0    2017-02-18 cran (@1.2.0)  
#>  withr          1.0.2    2016-06-20 CRAN (R 3.4.0) 
#>  yaml           2.1.14   2016-11-12 CRAN (R 3.4.0)

Howdy, Stranger!

Categories

Large Bayes Factor changes with exclusion of single subject (Bayesian ANOVA)

Comments

Howdy, Stranger!

Quick Links

Categories

Large Bayes Factor changes with exclusion of single subject (Bayesian ANOVA)

Comments