How to explain opposite effects of "frequentist outputs/Bayesian intervals" and BF hypothesis test?
Hi,
I've run a meta-analysis in JASP using both frequentist and Bayesian approaches. There are 27 observations, all of which are regression coefficients, and I have the 27 corresponding standard errors. The frequentist intercept is 0.075 and statistically significant (p=0.026), with a confidence interval of [0.010, 0.140]. For the Bayesian analysis, the random-effects mean is 0.076 with a credible interval of [0.019, 0.135]. So all these frequentist and Bayesian results seem consistent, and thus relatively easy to interpret as something like "there is very weak evidence in favour of there being an effect".
However, the BF hypothesis test gives BF10=0.632 (i.e., BF01 of about 1.58), which corresponds to anecdotal evidence in favour of the null hypothesis.
I understand that the effect, if any, is weak. But I would have expected the Bayesian anecdotal evidence to be consistent with the other results and thus slightly in favour of the alternative hypothesis.
I've uploaded the .jasp file to OSF, so all details can be found here: https://osf.io/s5ef8/files/osfstorage/669ad889a4786c000e365f22
Could you please help me interpret this? Why does the Bayesian BF hypothesis test give anecdotal evidence in favour of H0, while at the same time the frequentist results suggest rejecting the null, and the Bayesian 95% credible interval excludes zero and spans values consistent with the frequentist results?
Thank you,
Luke.
Comments
Hi Luke,
The difference occurs because of the Jeffreys-Lindley paradox (https://link.springer.com/article/10.1007/s00407-022-00298-3). The estimation approaches do not involve H0, but the BF does. See also BayesianSpectacles.org, specifically the posts on the (lack of) evidence for p-values just below .05 (the "redefine statistical significance" series).
The prior distribution under H1 is driving these results. Does the default Cauchy truly reflect your belief about the effect sizes before seeing the data? Maybe you expected the direction? Or perhaps you expected the effects to be small? A positive-only N(0,sd=0.15) prior yields a BF10 of about 6.5. But of course you cannot make these decisions after seeing the data -- the default Cauchy prior reflects an alternative that says "the effects may be in either direction, and they may be huge"; the positive-only N(0,sd=0.15) prior reflects an alternative that says "the effects are positive, and they are probably tiny". So you have to think about what is a reasonable prior distribution (i.e., what question you wish to ask).
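To see the mechanics, here is a minimal numerical sketch. This is not the JASP meta-analysis model: it assumes a single normal estimate of 0.075 with SE of roughly 0.033 (back-solved from the reported confidence interval), and uses zero-centred normal priors as stand-ins for the Cauchy, since the normal case has a closed-form marginal likelihood.

```python
import math

def normal_pdf(x, mu, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# Hypothetical summary numbers back-solved from the thread:
# pooled estimate 0.075 with CI [0.010, 0.140] implies SE of about 0.033.
est, se = 0.075, 0.033

def bf10_normal_prior(tau):
    """BF10 for H1: mu ~ N(0, tau^2) vs H0: mu = 0, with a normal likelihood.
    Under H1 the marginal likelihood has a closed form: N(est; 0, se^2 + tau^2)."""
    m1 = normal_pdf(est, 0.0, math.sqrt(se**2 + tau**2))
    m0 = normal_pdf(est, 0.0, se)
    return m1 / m0

print(bf10_normal_prior(0.707))  # wide prior: BF10 ~ 0.61, leans toward H0
print(bf10_normal_prior(0.15))   # narrow prior: BF10 ~ 2.5, leans toward H1
```

The wide prior spreads H1's predictions over huge effect sizes, so a small observed effect is predicted better by H0; the narrow prior concentrates H1 near small effects and the evidence flips.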
Cheers,
E.J.
Hi E.J.,
Thank you very much for the explanation. I’ve now read the paper you referenced and understand how this arose.
I have a follow-up question about how to pick a narrower H1 prior in a principled manner. For my current example, the p-value is p=0.026, and using the p_to_bf function from the bayestestR package, I get a BF10=2.478. If I then go back to JASP and stick with the Cauchy prior, but change the scale from 0.707 to 0.142, I get a nearly identical BF10=2.475. I realise this is all post hoc, and I understand section "10.2 Objection 2" of your paper: if we were allowed to go back and narrow the prior after seeing the data, we could forever avoid the data supporting H0. What I'm trying to understand is how to pick a narrower prior for a future scenario where I expect the values to be small.
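For what it's worth, my understanding (an assumption on my part, not confirmed in this thread) is that p_to_bf implements the approximate "3 p sqrt(n)" rule from Wagenmakers (2022), which would explain the number I got. A quick check with the thread's values:

```python
import math

def bf01_from_p(p, n):
    """Approximate Bayes factor in favour of H0 from a p-value and sample
    size, via the 3 * p * sqrt(n) rule (Wagenmakers, 2022)."""
    return 3.0 * p * math.sqrt(n)

# The thread's numbers: p = 0.026 with n = 27 observations.
bf01 = bf01_from_p(0.026, 27)
bf10 = 1.0 / bf01
print(bf10)  # ~2.47, in the same ballpark as the reported BF10 = 2.478
```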
In JASP, the default is scale=0.707 (i.e. 2^(-1/2)), and I think other typical values are scale=1 (i.e. 2^0) and scale=1.4142 (i.e. 2^(1/2)). In your example of a narrow prior, you used sd=0.15 for a normal distribution. If I take my example, don't specify a direction, but stick with a Cauchy and reduce the scale to 0.15, I get a BF10=2.393.
In the future, if I do want to pre-specify a prior that reflects my expectation that the effects are small, are there any relatively default or established priors? Would a Cauchy with scale=0.15 be sensible or just look odd? Or was your selection of scale=0.15 arbitrary?
If the t-test is based on standardised effects, and the Cauchy with scale=0.707 was selected to give standardised effects of 1 reasonable mass, are there established narrower scales if you want the mass to taper off after effect sizes of 0.5 or 0.2? What about a Cauchy with scale 2^(-2), following the 2^x pattern?
Many thanks,
Luke.
Hi Luke,
Well, the thing is this: if you keep narrowing a zero-centred prior, the BF will approach 1, because H1 morphs into H0 as the prior becomes infinitely peaked around 0, and you end up comparing two identical hypotheses. So when I specify an informed prior, I tend to move the mode away from 0 as well. I personally like the Vohs prior, but ultimately you have to use your own judgment in each individual case.
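A quick numerical illustration of that collapse, using a hypothetical single normal estimate (0.075, SE of about 0.033, back-solved from the CI reported earlier in the thread) and a zero-centred normal prior standing in for the Cauchy:

```python
import math

def normal_pdf(x, mu, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# Hypothetical single-estimate stand-in for the meta-analysis:
# estimate 0.075, SE ~ 0.033 back-solved from the CI [0.010, 0.140].
est, se = 0.075, 0.033

def bf10_normal_prior(tau):
    """BF10 for H1: mu ~ N(0, tau^2) vs H0: mu = 0; the marginal
    likelihood under H1 is N(est; 0, se^2 + tau^2) in closed form."""
    return normal_pdf(est, 0.0, math.sqrt(se**2 + tau**2)) / normal_pdf(est, 0.0, se)

# Narrowing a zero-centred prior first helps H1 (mass concentrates near the
# observed effect), but eventually H1 collapses onto H0 and BF10 -> 1.
for tau in (0.707, 0.15, 0.05, 0.01, 0.001):
    print(tau, round(bf10_normal_prior(tau), 3))
```

So a narrower zero-centred prior can never deliver strong evidence for H1; past a point it just makes the two hypotheses indistinguishable, which is why moving the mode away from 0 (as in the Vohs prior) is the more honest way to encode "small but positive".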
There may be a way to salvage the analysis, if small positive values are really what it is reasonable to expect: you can run a prior elicitation procedure -- without revealing the data, you can ask others about their expectations and build a prior from that.
However, there will always be some ambiguity about the result when it depends on the prior distribution so much (and when it is not so clear what the prior ought to be).
Cheers,
E.J.
Hi EJ,
Thanks again for the response and suggestions. Just to double-check: when you say "Vohs prior", are you referring to the one described here: https://doi.org/10.1177/0956797621989733? So this is an effect size of 0.30 with SD = 0.15. I can see from this paper that they base it on prior research. Looks like a great method to adopt in the future. But do let me know if that is not the correct reference.
Best,
Luke.
Correct!
EJ