How to avoid (large) rounding errors in my inclusionBF() function?
Hi,
I want to write a function that lets me compute inclusion BFs in R. I'd like to stick to R(Markdown) for my analysis pipeline because then everything's in one place. While writing the function, I compared the function's output with what JASP reported as inclusion BF. My values were very far off and I couldn't quite figure out why.
After a short exchange on Twitter yesterday, I now think that my function "works" but that the numbers are not identical to what JASP reports because of (1) noise in the MCMC sampling (which is not an issue) and (2) rounding. I wonder whether someone can recommend a way to avoid the rounding issues.
Here is the function I wrote:
inclusionBF <- function(obj, var) {
# Input:
# obj: an object of class 'BFBayesFactor'
# var: a string containing the name of the variable for which the inclusionBF should be calculated
# Output:
# the inclusion Bayes factor for `var`
bfs <- extractBF(obj)
idx <- grepl(var, rownames(bfs))
if(all(!idx)) stop(paste("The variable", var, "does not appear in any model. Inclusion BF cannot be calculated."))
# prior inclusion probability (assuming that all models are equally likely)
prior <- sum(idx) * ( 1/nrow(bfs) )
# posterior inclusion probability: sum post. prob. of models that incl. `var`:
post <- sum(bfs$bf[idx]) / sum(bfs$bf)
if(post == 1) warning("The posterior inclusion probability is so high that R rounds it to 1. The inclusion BF will be Inf.")
# "The inclusion Bayes factor is the ratio of the posterior inclusion probability
# versus the non-inclusion probability and the prior inclusion probability versus
# the non-inclusion probability."
inclBF <- (post / (1-post)) / (prior / (1-prior))
return(inclBF)
}
And here are two examples using it. One with a dataset (npk) that will produce relatively small BFs so that my function's output matches JASP's output quite closely and one (ToothGrowth) for which the differences are pretty large.
Example 1: npk
full <- lmBF(yield ~ N + P + N:P, data=npk)
null <- (1/full) / (1/full)
noInteraction <- lmBF(yield ~ N + P, data=npk)
onlyN <- lmBF(yield ~ N, data=npk)
onlyP <- lmBF(yield ~ P, data=npk)
allBFs <- c(full, noInteraction, onlyN, onlyP, null)
# For the main effects:
inclusionBF(allBFs, "N")
inclusionBF(allBFs, "P")
# For the interaction:
inclusionBF(allBFs, "N:P")
Gives me this:
> # For the main effects:
> inclusionBF(allBFs, "N")
[1] 2.269597
> inclusionBF(allBFs, "P")
[1] 0.3897666
> # For the interaction:
> inclusionBF(allBFs, "N:P")
[1] 0.5002526
Compare with: screenshot of JASP output
Example 2: ToothGrowth
full <- lmBF(len ~ supp + dose + supp:dose, data=ToothGrowth)
null <- (1/full) / (1/full)
noInteraction <- lmBF(len ~ supp + dose, data=ToothGrowth)
onlyDose <- lmBF(len ~ dose, data=ToothGrowth)
onlySupp <- lmBF(len ~ supp, data=ToothGrowth)
allBFs <- c(full, noInteraction, onlyDose, onlySupp, null)
# For the main effects:
inclusionBF(allBFs, "dose")
inclusionBF(allBFs, "supp")
# For the interaction:
inclusionBF(allBFs, "supp:dose")
Gives me:
> # For the main effects:
> inclusionBF(allBFs, "dose")
[1] 5.740726e+12
> inclusionBF(allBFs, "supp")
[1] 34.95848
> # For the interaction:
> inclusionBF(allBFs, "supp:dose")
[1] 4.953406
Compare with: screenshot of JASP output.
Thank you for your time. Any help would be greatly appreciated!
- Florian
Comments
Hi Florian,
I think the discrepancy for the ToothGrowth data set occurs because JASP automatically converts the fixed factors to the R type "factor" whereas in R, you need to do this manually:
ToothGrowth[["dose"]] <- as.factor(ToothGrowth[["dose"]]).
Using your function, this yields for me:
[1] 3.160421e+14
[1] 141.9707
[1] 10.43514
Cheers,
Quentin
Hi Quentin,
thanks a lot for your quick and helpful reply! I was so focused on making sure I understand how to compute the inclusion BF that this escaped me.
Thanks for taking the time!