# BayesFactor regressionBF function generates oddly binned small BFs

Dear all,

I'm using the BayesFactor package in R for a neuroimaging analysis. I repeatedly compute the Bayes factor for a simple regression design with `regressionBF`, relating a binary imaging variable to a continuous variable, across a brain image space. (I know that a Bayesian t-test would be the first choice with a binary independent variable, but I'd like to keep the code as flexible as its NHST equivalents, which use GLMs instead of t-tests.)

At first, everything looked fine; the BFs were computed across the whole brain image and spanned the range from small to very large. Only later did I realise that there were no really small BFs (< 1/10) at all. In one analysis (i.e. repeated application of the BF regression over the imaging space), the smallest BF I found was 0.2826627, the second smallest 0.2826627 (the same except for changes in a decimal place not shown), the third 0.2826627, the fifteenth 0.2826628 (note the change in the last shown decimal place), and so on. Basically, the small BFs were binned into small groups with only minor changes in the trailing decimal places. The BFs above 1, on the other hand, were not binned and looked very reasonable, going up to 6*10^14.

Does anybody have an idea what happened to my data?

Is this something to be generally expected with such a regression approach, maybe because of the way the priors are chosen? Can I change something to get reliable small BFs to assess H0?

Is Bayes factor regression maybe invalid with binary independent variables? I am quite naive here; I thought that I could just do the same as I've done before with NHST statistics, where a t-test can be formulated as a GLM with a binary predictor.
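(On the NHST side, at least, that equivalence is exact: with a binary 0/1 predictor, the slope t statistic of a simple regression equals the pooled-variance two-sample t statistic, i.e. `lm()` vs `t.test(var.equal = TRUE)` in R. A minimal pure-Python sanity check on hypothetical toy data:)

```python
import math
import random

def pooled_t(a, b):
    """Classical two-sample t statistic with pooled variance (var.equal case)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    sp2 = ss / (na + nb - 2)                      # pooled variance
    return (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))

def ols_slope_t(x, y):
    """t statistic of the slope in a simple linear regression y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    b1 = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sxx
    b0 = my - b1 * mx
    s2 = sum((v - b0 - b1 * u) ** 2 for u, v in zip(x, y)) / (n - 2)
    return b1 / math.sqrt(s2 / sxx)

rng = random.Random(0)
g0 = [rng.gauss(0.0, 1.0) for _ in range(30)]     # group coded x = 0
g1 = [rng.gauss(0.5, 1.0) for _ in range(50)]     # group coded x = 1
x = [0] * 30 + [1] * 50
print(abs(ols_slope_t(x, g0 + g1) - pooled_t(g1, g0)) < 1e-8)  # True
```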

## Comments

A small update: I repeated an entire analysis (i.e. a BF mapping over 700,000 elements of my brain imaging space) with the two-sample Bayesian t-test from the BayesFactor package. The odd binning of the BFs is not as pronounced as before, but I still have doubts.

The BFs again go up to 5.5*10^14 and look quite reasonable. On the other hand, the small BFs again only go down to about 0.28, but they show much more variation (1st: 0.2828894, 2nd: 0.2843553, 3rd: 0.2856553). So it is not as oddly binned as before, but there still seems to be a hard lower limit. I really doubt that, across an entire brain image of 700,000 elements, there is no element where H0 is very likely. I would have expected at least some BFs < 1/10.

The groups of the t-test were uneven and the data were not normally distributed. Could this have caused the finding?

Best,

Chris

I think I "solved" the problem myself: BFs in the t-test/regression simply don't become all that small with sample sizes of 50-300 subjects.

I checked this with the Bayesian t-test on random data drawn from the same normal distribution. Over 10,000 repetitions, I compared two samples of 100 subjects each. The average BF was only 0.43, and the lowest was 0.15. Well, given that the conventions for evaluating BFs are often symmetrical for H0 and H1 (like 3/10/100 for moderate/strong/extreme evidence for H1 and 1/3, 1/10, 1/100 for moderate/strong/extreme evidence for H0), I thought that I should also find very small values in favour of H0. But no, that's obviously not how things work.
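This floor can actually be reproduced without the package. Under the default JZS prior that `ttestBF` uses (rscale = sqrt(2)/2), the Bayes factor at t = 0 (the most pro-H0 data possible) has a closed integral form (Rouder et al., 2009) that is easy to estimate by Monte Carlo. The sketch below is a pure-Python approximation of that integral, not the package's own routine:

```python
import math
import random

def jzs_bf10_at_t0(n1, n2, rscale=math.sqrt(2) / 2, reps=200_000, seed=1):
    """Monte Carlo estimate of the default (JZS) two-sample BF_10 when the
    observed t statistic is exactly 0.  At t = 0 the likelihood terms drop
    out and BF_10 = E[(1 + N*g)^(-1/2)], with effective sample size
    N = n1*n2/(n1+n2) and g ~ rscale^2 / chi^2_1 (the Zellner-Siow prior)."""
    rng = random.Random(seed)
    n_eff = n1 * n2 / (n1 + n2)
    total = 0.0
    for _ in range(reps):
        z = rng.gauss(0.0, 1.0)
        g = rscale ** 2 / (z * z)        # one draw from the prior on g
        total += (1.0 + n_eff * g) ** -0.5
    return total / reps

print(round(jzs_bf10_at_t0(100, 100), 2))  # close to the 0.15 floor above
print(round(jzs_bf10_at_t0(20, 80), 2))    # uneven groups: the floor rises
```

With two groups of 100 this lands near the 0.15 minimum from the simulation, and with uneven groups the effective sample size n1\*n2/(n1+n2) shrinks, so the floor is even higher; that may be part of why the clinical data bottomed out around 0.28.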

The binning that I described in the first post might well originate from the awkward features of the clinical data that I work with.

Hi CSperber,

As you discovered, it is much easier to find evidence for the presence of an effect than for its absence. Basically, H0: delta = 0 can win the competition only because it is more parsimonious than H1 -- but there are values of delta under H1 that can handle the data well. So for finite data there is a limit to the evidence for H0 (reached when t = 0 in the sample). On the other hand, if the data do show an effect, then H0 utterly fails. As we mention in https://psyarxiv.com/nf7rp/:

> "In general, the claim that something is absent is more difficult to support than the claim that something is present, at least when one is uncertain about the size of the phenomenon that is present. Consider, for instance, the null hypothesis “There is no animal in this room”, tested against the alternative hypothesis: “There is an animal in this room, but it could be as small as an ant or as big as a cow”. Now if the “effect” is of medium size (say a cat), it can be quickly discovered and H1 then receives decisive support. But if a cursory inspection does not reveal any animal, then support for H0 will only be weak (after all, it is easy to miss an ant). Now there is a way to collect strong evidence for H0, but it requires more effort – a systematic search with a magnifying glass, for instance. So instead of being problematic, the asymmetry in the rate of increase in evidence is desirable, in line with common sense, and indeed a direct mathematical consequence of how the competing models were constructed."

E.J.

Dear EJ,

thanks for the answer, the very intuitive example and the reference! This was really helpful!

One more thought about the point in your manuscript: you disagree with TK that the asymmetry in the speed with which NHBT gathers evidence is a disadvantage -- "Bayes factors do quantify evidence, either for H0 or for H1, but they do not need to do this at an equal rate, nor is it clear why this would be at all desirable."

Well, evidence for H1 can often be gathered quickly, and it would be desirable if we could do the same for H0, symmetric or not. But I agree, it is what it is, and it is no disadvantage, especially given that with NHST we never had any evidence for H0 at all.

But I see another huge advantage of NHBT. The eureka moment when I understood that asymmetric evidence for H1 and H0 makes sense came when I thought about statistical power. For years now, we have become more and more aware that many of our studies are painfully underpowered. And, of course, when our data are hardly suited to support H1 with NHST, then NHBT will also not generate strong evidence for H0 when evidence for H1 is absent. Now, here's the point: with a large number of tests, as in brain mapping, we can directly see how painfully underpowered our studies are. When the smallest Bayes factors across the whole data space (like the entire brain image) are not really small and provide little to no evidence for H0, we know that the study is underpowered. We don't even need to do any quirky secondary analyses like mapping statistical power across the same data space.

I think that, in specific situations like brain mapping, this aspect of NHBT is beautifully transparent and raises more awareness of limitations imposed by our data.

Best,

Chris

Hi Chris,

I think you are spot-on! It is easy to specify the data that will yield the maximum evidence for H0 (t=0, n/s = 1/2, etc.). One would still need to specify an alternative, but the default choices would be fine here, because they tend to be relatively complex, and making H1 more informative will not help H0 (unless you don't center it on 0, in which case anything goes). With t=0 and a default specification we therefore have a reasonable assessment of what we can expect.

I think I mention the point somewhere here: https://psyarxiv.com/egydq. In my "EZ-BayesFactor", when p=1 then BF_01 = sqrt(n) [for models in which the standard error of the MLE goes as sigma/sqrt(n), using the unit-information prior].
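Taking that sqrt(n) ceiling at face value, a trivial sketch makes it concrete for the 50-300 subjects discussed above (n here is the sample size in the unit-information setup):

```python
import math

def ez_bf01_at_p1(n):
    """The 'EZ' result: with p = 1 (i.e. t = 0) and a unit-information prior,
    the evidence for H0 tops out at BF_01 = sqrt(n)."""
    return math.sqrt(n)

for n in (50, 100, 300):   # sample sizes mentioned in the thread
    print(n, round(ez_bf01_at_p1(n), 1), round(1 / ez_bf01_at_p1(n), 3))
```

So even in the best case, BF_10 cannot drop much below 1/sqrt(n) under this approximation -- roughly 0.14 at n = 50 and 0.06 at n = 300 -- which is why no extremely small BFs ever appear at these sample sizes.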

Cheers,

E.J.