# Neuroimaging data and BF

Hi,

I ran a few studies with functional near-infrared spectroscopy (fNIRS), and while I think it could be a useful method, it's still early days. As a result, the literature is messy: everyone reports results in different ways (oxy signals, deoxy signals, both, or the difference between the two). There are also various preprocessing options, which are widely discussed without any consensus having been reached. Lastly, because in cognitive studies an fNIRS analysis usually yields beta values that are then simply analysed with t-tests, there is the issue of correcting for multiple comparisons. Some people report uncorrected p values (as if corrected p values weren't bad enough...), but that seems to be dying down. The problem with correction in fNIRS seems to be that it is very conservative and wipes out most, if not all, effects. It's all a bit of a mess!

So I want to use BFs to evaluate all of these issues. I can run analyses comparing the different preprocessing pipelines and the various signals, obtaining beta values for all of them. Of course I also get p values, since that's what we are (sadly) still expected to report, and in some respects they might be useful in guiding my BF analyses, keeping in mind that they are uncorrected p values...?

I'm a (novice) JASP convert, and I think JASP would be a useful tool here for evaluating the evidence from fNIRS data. I'm not sure my supervisor or the reviewers will be okay with this, but I want to try to make the argument. Do you think that makes sense?

Thanks!

## Comments

Hi metamorphose42,

Yes, you can use BFs. Opinions differ on what to make of this in the case of multiple comparisons. I personally am in the Jeffreys, Scott & Berger camp, who argue that you ought to adjust your prior odds. Basically, they argue that when you are fishing for signal, you probably expect most of the results to be noise. So you can either set the prior odds to a specific number (or a proportion of the tests conducted), or put a prior on them. There is another camp of people who feel that the evidence is just the evidence, and that the prior odds are solely a function of relative plausibility. That may be so, but with 80,000 tests that plausibility would be difficult to determine. Bottom line: this is very useful but not so simple.
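To make the "adjust your prior odds" option concrete, here is a minimal Python sketch. It assumes independent tests and sets the prior probability of H1 to the proportion of tests you expect to reflect real effects. The BF of 10 and the proportions are purely illustrative, and `posterior_prob_h1` is a hypothetical helper name. The BF stays the same; only the prior, and hence the posterior, changes.

```python
def posterior_prob_h1(bf10: float, prior_prob_h1: float) -> float:
    """Posterior probability of H1 from BF10 and a prior probability of H1."""
    prior_odds = prior_prob_h1 / (1.0 - prior_prob_h1)
    posterior_odds = bf10 * prior_odds  # Bayes' rule in odds form
    return posterior_odds / (1.0 + posterior_odds)


bf10 = 10.0  # hypothetical Bayes factor for a single channel or voxel
# Expecting 50%, 10% or 1% of the tests to reflect real effects:
for prior in (0.5, 0.1, 0.01):
    post = posterior_prob_h1(bf10, prior)
    print(f"prior P(H1) = {prior:<5} -> posterior P(H1) = {post:.3f}")
# prints 0.909, 0.526 and 0.092 for the three priors
```

The more tests you run while expecting only a few real effects, the lower the prior probability per test, and the stronger a BF has to be before the posterior favours an effect.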

Cheers,

E.J.

Thanks so much. I will certainly read up on this further, but I already tend towards adjusting the prior.

One more question: I lost a lot of data, going from 25 participants to as few as df = 9 for some t-tests, yet I still get very strong BFs. But isn't the BF dependent on sample size? And I faintly remember from the workshop that there was a way to explore sample size issues; was it sequential analysis?

Many many thanks!

The BF simply quantifies the evidence.

On average, more participants means more evidence, but it is possible to obtain decisive evidence with few observations. The sequential analysis shows the evidential flow as the sample grows.
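For intuition about what JASP's sequential analysis plot shows, here is a stdlib-only Python sketch (function names hypothetical). It recomputes a one-sample Bayes factor with a Cauchy(0, 0.707) prior on effect size as each observation is added; a paired test works the same way on difference scores. It assumes the JZS formulation of Rouder et al. (2009) evaluated with simple numerical integration. The data are simulated, not real, and in practice JASP produces this plot for you.

```python
import math
import random
import statistics


def jzs_bf10_one_sample(t: float, n: int, r: float = 0.707) -> float:
    """BF10 for a one-sample (or paired) t-test with a Cauchy(0, r) prior on delta.

    Uses delta | g ~ N(0, g), g ~ Inverse-Gamma(1/2, r^2/2) (Rouder et al., 2009)
    and integrates over g numerically on a log scale.
    """
    nu = n - 1
    # Likelihood under H0 (delta = 0), up to a constant shared with H1.
    like_h0 = (1.0 + t * t / nu) ** (-(nu + 1) / 2)

    def integrand(g: float) -> float:
        a = 1.0 + n * g
        like = a ** -0.5 * (1.0 + t * t / (a * nu)) ** (-(nu + 1) / 2)
        prior = r / math.sqrt(2 * math.pi) * g ** -1.5 * math.exp(-r * r / (2 * g))
        return like * prior

    # Trapezoid rule over u = log(g), so dg = g du.
    lo, hi, steps = -10.0, 25.0, 4000
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        g = math.exp(lo + i * h)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * integrand(g) * g
    return total * h / like_h0


# Simulated (not real) difference scores with a true standardized effect of 0.8.
rng = random.Random(2024)
scores = [rng.gauss(0.8, 1.0) for _ in range(25)]

# Recompute t and BF10 after each new observation (sequential analysis).
for n in range(3, len(scores) + 1):
    sample = scores[:n]
    t = statistics.mean(sample) / (statistics.stdev(sample) / math.sqrt(n))
    print(f"n = {n:2d}  t = {t:6.2f}  BF10 = {jzs_bf10_one_sample(t, n):10.2f}")
```

With a genuine effect the BF tends to grow with n but can fluctuate along the way, which is why the whole trajectory is worth inspecting rather than only the final value.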

E.J.

Thanks so much, so very helpful!

Hi EJ,

I too am writing a paper in an area almost untouched by Bayesian stats: second language acquisition. You often see studies in this area that are underpowered and have uncorrected multiple comparisons. Nonetheless, these studies have no problem drawing very definitive conclusions from comparisons with p-values below 0.05. I mainly want to use Bayes factors as a more appropriate measure that better reflects the strength of evidence for particular hypotheses. I myself am going to make multiple comparisons on a relatively small dataset (n = 40, over two time points, analysed with about 10 separate repeated-measures ANOVAs, as I can't do a Bayesian MANOVA in JASP yet).
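For a sense of scale, the Python sketch below computes the familywise error rate for m uncorrected tests at alpha = 0.05, together with the Bonferroni per-test threshold. It assumes independent tests and that every null hypothesis is true; both are simplifications.

```python
def familywise_error_rate(m: int, alpha: float = 0.05) -> float:
    """P(at least one false positive) across m independent tests when every null is true."""
    return 1.0 - (1.0 - alpha) ** m


for m in (1, 10, 100):
    print(f"m = {m:3d}: uncorrected FWER = {familywise_error_rate(m):.3f}, "
          f"Bonferroni per-test alpha = {0.05 / m:.2e}")
```

With 10 such tests the chance of at least one false positive is already about 40%.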

I would like to include a discussion of correction for multiple hypotheses. You mention that "there is another camp of people who feel that the evidence is just the evidence". Could you point me in the direction of that literature?

Best,

Gareth.

Hi Gareth,

This is consistent with subjective Bayesianism. I know Dennis Lindley was of this opinion, for instance, but perhaps it is also mentioned in Edwards, Lindman, & Savage (1963). The idea is that as long as you have specified probabilities for the hypotheses you are planning to test, and they really reflect your belief, then Bayes' rule simply updates that knowledge, and it does not matter how many other hypotheses are in the mix. So in terms of fMRI, suppose researcher A measures 80,000 voxels and believes that each has a 50-50 shot of being active. Researcher B measures only 1 voxel and believes that it has a 50-50 shot of being active. Now suppose the overlapping voxel shows the same data; according to a subjective Bayesian, the inference for this voxel ought to be the same, regardless of the fact that A also measured 79,999 other voxels.
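This argument can be checked numerically. The Python sketch below assumes independent 50-50 priors and independent data across voxels, and uses hypothetical BFs. It enumerates every active/inactive pattern and computes the posterior probability that the first voxel is active. Adding more voxels leaves that posterior unchanged, because the joint posterior factorizes.

```python
from itertools import product


def marginal_posterior_first(bfs, prior=0.5):
    """P(voxel 0 active | data), by enumerating all active/inactive patterns.

    Assumes independent priors and independent data across voxels, so the
    weight of a pattern (relative to 'all inactive') is the product of the
    prior terms and the BF10s of the active voxels.
    """
    num = den = 0.0
    for pattern in product((0, 1), repeat=len(bfs)):
        weight = 1.0
        for active, bf in zip(pattern, bfs):
            weight *= prior * bf if active else (1.0 - prior)
        den += weight
        if pattern[0]:
            num += weight
    return num / den


# Researcher B: one voxel. Researcher A: the same voxel plus others (hypothetical BFs).
print(marginal_posterior_first([6.0]))
print(marginal_posterior_first([6.0, 0.2, 0.3, 1.5, 0.1]))
```

Both calls give 6/7 ≈ 0.857, exactly what researcher B obtains from Bayes' rule for the single voxel. Enumeration is exponential in the number of voxels, so this only works for a handful; the point is the factorization, not the algorithm.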

Of course, multiple comparisons often signal a lack of prior conviction. Anyhow, the subjective opinion may also have been discussed briefly in the work on objective Bayesian solutions (e.g., Scott & Berger, 2006, 2010).

Cheers,

E.J.

In my search for how BFs and correction for multiple testing are related, I found the posts above.

I computed Bayes factors for a group comparison on seven variables, using H1: delta ~ Cauchy(width = 0.707).

Furthermore, I estimated BFs for 7 correlations, using a stretched beta prior with width 1.

Can I make the argument for not correcting p-values for multiple testing by providing BFs that inform the reader about the likelihood of the presence or absence of an effect? I was thinking along the following lines:

"Importantly, when a frequentist approach (e.g., t-test) indicates the absence of an effect (e.g., p > 0.05), the Bayesian approach can give additional information by assessing the likelihood of this absence. A Bayes factor (BF01) higher than 1 indicates that the data are more likely under the null hypothesis than under the alternative hypothesis. For example, a BF01 of 1.5 means that the data are 1.5 times more likely to occur under the null hypothesis than under the alternative hypothesis. A high BF01 (>3) favors the hypothesis that there is no effect, while a low BF01 (<0.33, which is equivalent to BF10 = 1/BF01 > 3) favors the hypothesis that there is an effect. A BF01 between 0.33 and 3 indicates that the evidence is inconclusive.

A correction for multiple testing becomes redundant when using Bayesian statistics. Frequentist approaches, e.g., t-tests, use significance levels to draw conclusions about accepting or rejecting a hypothesis. These approaches need to guard against an increase in type I errors by using a correction for multiple comparisons. Conversely, the Bayesian approach allows for assessing the likelihood of a hypothesis. Therefore, type I errors do not exist with this approach, and a correction for multiple comparisons is not needed."
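The decision rule in the first quoted paragraph can be written as a small Python function. This sketch uses exactly 1/3 where the text rounds to 0.33. `classify_bf01` is a hypothetical name, and the cut-off of 3 is a convention rather than a property of the BF.

```python
def classify_bf01(bf01: float, cutoff: float = 3.0) -> str:
    """Label BF01 with the draft's convention: >3 favours H0, <1/3 (BF10 > 3) favours H1."""
    if bf01 > cutoff:
        return "H0"
    if 1.0 / bf01 > cutoff:  # equivalent to BF10 > cutoff
        return "H1"
    return "inconclusive"


for bf01 in (1.5, 4.0, 0.2):
    print(f"BF01 = {bf01}: {classify_bf01(bf01)}")
```

The BF01 of 1.5 from the example falls in the inconclusive band, even though the data are slightly more likely under H0.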

Thanks,

NvH

Hi NvH,

Yes, this has been argued, mostly by subjective Bayesians such as Lindley. The Bayesian "correction" for multiplicity is in the prior model probability. If you are testing 10 effects, do you really believe that every single test is plausible, just as plausible as if you had set out to test a single one? You are right that the BF is not affected by multiplicity. So subjective Bayesians will argue that no correction is needed (because you base your plausibility assessment on prior knowledge). Objective Bayesians such as Jeffreys and Berger believe that the prior model probability needs to be adjusted. The papers by Scott and Berger on linear regression are informative in this respect. It is an interesting issue.
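One way to let the data do the multiplicity adjustment, in the spirit of Scott & Berger, is to put a uniform prior on the proportion of true effects instead of fixing each prior probability at 1/2. The Python sketch below assumes independent tests, so a model's BF against the all-null model is the product of the BF10s of its included effects. The BFs are hypothetical, and because the sketch enumerates all 2^m models it is only practical for small m.

```python
from itertools import product
from math import comb


def inclusion_probs(bfs):
    """Posterior inclusion probability of each effect under a beta-binomial(1, 1) model prior.

    A model with k of m effects included gets prior probability
    1 / ((m + 1) * C(m, k)), i.e., a uniform prior on the proportion of true effects.
    """
    m = len(bfs)
    post = [0.0] * m
    total = 0.0
    for pattern in product((0, 1), repeat=m):
        weight = 1.0 / ((m + 1) * comb(m, sum(pattern)))
        for included, bf in zip(pattern, bfs):
            if included:
                weight *= bf  # independent tests: BFs multiply
        total += weight
        for j, included in enumerate(pattern):
            if included:
                post[j] += weight
    return [p / total for p in post]


strong = 10.0  # hypothetical BF10 for the effect of interest
for n_extra in (0, 4, 9):
    bfs = [strong] + [1 / 3] * n_extra  # extra tests that mildly favour H0
    print(f"{n_extra + 1:2d} tests: P(effect 1 | data) = {inclusion_probs(bfs)[0]:.3f}")
```

With a single test this reproduces the 50-50 answer (10/11 ≈ 0.909). Each additional test that mildly favours H0 lowers the posterior inclusion probability of the first effect, even though its own BF of 10 never changes.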

Cheers,

E.J.

Hmm, I noticed I basically re-entered my earlier answer. Well, that goes to show I haven't changed my mind about this.

E.J.

Dear JASPers,

I just read the paper by Keysers, Gazzola, and Wagenmakers (NN, 2020), which I had hoped would shed some more light on the issue of multiple comparisons in the context of Bayes factors. Unfortunately, the article only briefly touches on this topic, saying that "correction for multiple comparisons [...] are still in their infancy for the Bayes factor" (page 798). At least as far as neuroimaging is concerned, I could imagine that this is one of the primary reasons why BFs have not caught on in neuroscience as much as would be desirable. It is probably at least one of the reasons that have put me off ;)

After reading this thread I was curious how exactly one would "adjust the prior odds". They are supposed to reflect my prior belief about how likely each voxel (to stick with that example) is to show an effect. If I had only one voxel, a reasonable prior would be 50:50, right? But what would it be for two voxels? 75:25 each? For N voxels, would they be (100 - 50/N):(50/N)? This should express the prior belief that on average I expect one out of my N voxels to show an effect. Would this be a reasonable prior?
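The arithmetic below is a small Python check of what the proposed (100 - 50/N):(50/N) scheme implies, assuming independence across voxels. In probability terms the scheme gives each voxel a prior probability of 1/(2N) of being active.

```python
def implied_beliefs(n_voxels: int, p_active: float):
    """Expected number of active voxels and P(at least one active), assuming independence."""
    expected = n_voxels * p_active
    p_any = 1.0 - (1.0 - p_active) ** n_voxels
    return expected, p_any


for n in (1, 2, 10, 80_000):
    p = 0.5 / n  # (100 - 50/N):(50/N) in percent, i.e., P(active) = 1 / (2N)
    expected, p_any = implied_beliefs(n, p)
    print(f"N = {n:6d}: P(active) = {p:.6f}, expected active = {expected:.2f}, "
          f"P(any active) = {p_any:.3f}")
```

The expected number of active voxels is N × 1/(2N) = 0.5 for every N, and the probability that at least one voxel is active approaches 1 - e^(-0.5) ≈ 0.39 for large N. A per-voxel probability of 1/N would instead correspond to an expected count of one active voxel.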

I am still a little confused about this approach. I do not understand why adjusting the prior odds changes my BFs. I thought one of the appealing properties of BFs is that they express the factor by which the observed data change the prior odds to yield the posterior odds, so that even if two scientists disagree on what the prior odds should be, they should agree on how much the data have changed those odds. So why should adjusting the prior odds have any effect on my BF map of brain activity? What am I missing here? Instead of BF maps, would it not make more sense to look at posterior odds maps in this case? (I noticed, for instance, that the multiple comparisons correction for the Bayesian rmANOVA is also based on posterior odds.)

Relatedly (I think), another Bayesian method might lend itself more easily to multiple comparisons problems. Bayesian parameter estimation could be used to calculate posterior probability maps of a contrast value being larger than 0 (i.e., against H_0: \beta <= 0), and these maps could be thresholded at some value like 95%, thus roughly controlling the false discovery rate at q = 5% (Friston and Penny, NeuroImage, 2003, page 1246). (From what I recall from the Lego workshop, though, EJ and Michael did not seem particularly fond of Bayesian parameter estimation.) So given this convenience, what reasons are there not to proceed in this way?
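The FDR logic can be sketched in a few lines of Python, assuming the posterior probabilities are correct under the model; the values below are hypothetical and `threshold_ppm` is a hypothetical name. Every voxel above a 0.95 threshold has a posterior probability below 0.05 of being a false discovery, so the expected proportion of false discoveries among the selected voxels, estimated as the mean of 1 - p, is also below 0.05. The same reasoning applies whether the probabilities come from a posterior probability map or from BFs combined with prior odds.

```python
def threshold_ppm(post_probs, threshold=0.95):
    """Select voxels whose posterior probability of an effect exceeds `threshold`.

    Returns the selected indices and the estimated false discovery rate among
    them: the mean posterior probability of *no* effect (1 - p).
    """
    selected = [i for i, p in enumerate(post_probs) if p > threshold]
    if not selected:
        return selected, 0.0
    fdr = sum(1.0 - post_probs[i] for i in selected) / len(selected)
    return selected, fdr


ppm = [0.99, 0.50, 0.97, 0.20, 0.999, 0.96]  # hypothetical posterior probabilities
selected, fdr = threshold_ppm(ppm)
print(selected, fdr)
```

Here voxels 0, 2, 4 and 5 are selected with an estimated FDR of about 0.020, comfortably below q = 0.05.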

Thanks and best,

Michael

Hi Michael,

Thanks for your post. My answers, briefly:

The Theory of Statistics in Psychology -- Applications, Use and Misunderstandings. New York: Springer. (https://psyarxiv.com/rqnu5)

Cheers,

E.J.

Thanks for your comment. The Stephens & Balding paper was great! It clarified multiple comparisons correction through prior odds for me. It also brought up the connection between the posterior probability of association (PPA), which is basically the posterior probability of H_1 in genetic association studies, and the false discovery rate correction that I mentioned. So there is no need for Bayesian estimation.

The MSc thesis was somewhat harder to read, and I think it does not specifically deal with multiple comparisons in large-scale testing such as genetics or fMRI. It is more about controlling tests for comparisons between parameters of a given (linear) model. There, the hypothesis space can be reduced after rejecting a null due to logical dependencies between hypotheses, whereas in fMRI, H_1 can in principle be true in any combination of voxels. It might still be relevant for pairwise comparisons between ROIs, for instance, where inequality statements may show transitivity: A < B ∧ B < C ⇒ A < C.
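To illustrate how logical dependencies shrink the hypothesis space, the Python sketch below counts how many direction patterns for all pairwise comparisons among k ROIs are transitive. It is a hypothetical example (strict inequalities only, ties ignored), not the method from the thesis. Only k! of the 2^(k(k-1)/2) patterns survive, whereas with voxel-wise activation all 2^N patterns remain possible.

```python
from itertools import combinations, product


def transitive_patterns(k: int):
    """Count strict pairwise-ordering patterns among k ROIs that are transitive."""
    pairs = list(combinations(range(k), 2))
    consistent = 0
    for pattern in product((True, False), repeat=len(pairs)):
        # True means 'i < j' for the pair (i, j); False means 'j < i'.
        less = {(i, j) if lt else (j, i) for (i, j), lt in zip(pairs, pattern)}
        # Transitivity: a < b and b < c must imply a < c.
        if all((a, c) in less for (a, b) in less for (b2, c) in less if b == b2):
            consistent += 1
    return consistent, 2 ** len(pairs)


for k in (3, 4, 5):
    ok, total = transitive_patterns(k)
    print(f"{k} ROIs: {ok} of {total} ordering patterns are transitive")
```

For three ROIs, only the two cyclic patterns (such as A < B, B < C, C < A) are ruled out, leaving 6 of 8.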

Cheers,

Michael