"Posterior predictive" interpretation of BF versus p value
Dear Bayesianists,
Like a few others, I have high hopes that Bayesian statistics could make an important contribution to improving replicability in science. In this context, the following question has crossed my mind:
You may know the seminal paper by Aarts et al., which showed that only 39% of the investigated effects from the literature could be replicated. Replication success was defined as whether or not the replication attempt yielded a p value below .05. The 39% success rate is a discouragingly small number. On the other hand, one should consider that once I have obtained a p value just below the magic threshold of .05, the probability of obtaining a significant result by this definition the next time I conduct the experiment is just 50% (taking the observed effect as the true effect)! So after seeing the data, I can apparently expect an exact replication of my experiment to succeed with a disappointingly low probability of only 50%. So, the "posterior predictive" (sorry for hijacking the term) interpretability of a p value is quite poor. How would a BF fare in comparison? When I have a BF of, say, 3 in favor of H1, is my best estimate for the BF in an exact replication also 3? I somehow feel that it would be, but I cannot explain to myself why it should. BFs are not magic after all ;)
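To make the 50% figure concrete, here is a minimal sketch. It rests on assumptions that are not spelled out above: a two-sided z test, the same sample size in the replication, and a true effect exactly equal to the originally observed one. The function name `replication_probability` is made up for illustration.

```python
from statistics import NormalDist


def replication_probability(p_original, alpha=0.05):
    """Probability that an exact replication is significant in the same
    direction, ASSUMING the true effect equals the originally observed one
    (two-sided z test, identical sample size)."""
    std_normal = NormalDist()
    # z statistic implied by the original two-sided p value
    z_obs = std_normal.inv_cdf(1 - p_original / 2)
    # critical z value for the significance threshold
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Under the assumption, the replication z statistic is Normal(z_obs, 1),
    # so we need the probability that it exceeds z_crit.
    return 1 - std_normal.cdf(z_crit - z_obs)


if __name__ == "__main__":
    for p in (0.05, 0.01, 0.005):
        print(p, round(replication_probability(p), 3))
```

With p exactly at .05 the observed z sits right on the critical value, so half of the replication sampling distribution falls on either side of it. Smaller original p values give higher replication probabilities under the same assumptions.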
(In my understanding, one of the most important take-aways from the Aarts et al. paper is that the low success rates mainly reflect a systematic overestimation of effect sizes in the original studies. So, actual power in the replication attempts was substantially smaller than nominal power. See also the "winner's curse".)
Cheers,
Michael
Comments
Let's say you get a BF10 = 3. Under equal prior probabilities, this means a posterior probability of 75% for H1 and 25% for H0. To simulate what you can expect from a replication, you would first sample either H1 (with probability .75) or H0 (with probability .25), then draw a parameter value from the relevant posterior distribution (which is the prior distribution for the replication study), generate data from that, and compute the Bayes factor and resultant posterior probabilities. Schad et al. (2020, Eq. 5) show that, on average, this recovers the 75%. So the expected posterior model probabilities after the replication equal the prior model probabilities going into it.
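The simulation described above can be sketched as follows. This is only an illustration under assumptions that are not from Schad et al.: a normal mean with known standard deviation 1, H0 fixing the mean at 0, and a hypothetical posterior Normal(0.4, 0.2) for the mean under H1 after the original study. The function name and all default values are invented for the example.

```python
import random
from statistics import NormalDist, fmean


def simulate_replications(p_h1=0.75, post_mean=0.4, post_sd=0.2,
                          n_rep=50, n_sims=100_000, seed=1):
    """Average posterior probability of H1 across simulated replications.

    ASSUMED toy model: data are Normal(mu, 1); H0 says mu = 0; under H1 the
    posterior for mu after the original study is Normal(post_mean, post_sd).
    """
    rng = random.Random(seed)
    se = 1 / n_rep ** 0.5  # standard error of the replication sample mean
    # Predictive distributions of the replication sample mean under each model
    pred_h0 = NormalDist(0.0, se)
    pred_h1 = NormalDist(post_mean, (post_sd ** 2 + se ** 2) ** 0.5)

    post_probs = []
    for _ in range(n_sims):
        # 1. sample the model according to the current model probabilities
        h1_true = rng.random() < p_h1
        # 2. sample a parameter value from the relevant distribution
        mu = rng.gauss(post_mean, post_sd) if h1_true else 0.0
        # 3. generate the replication data (summarised by the sample mean)
        ybar = rng.gauss(mu, se)
        # 4. update: the replication Bayes factor is f1 / f0
        f1, f0 = pred_h1.pdf(ybar), pred_h0.pdf(ybar)
        post_probs.append(p_h1 * f1 / (p_h1 * f1 + (1 - p_h1) * f0))
    return fmean(post_probs)


if __name__ == "__main__":
    print(simulate_replications())
```

Up to simulation noise, the average should land close to .75, and this should hold whatever hypothetical posterior or replication sample size you plug in. Note that the sketch averages posterior probabilities; it makes no claim that the replication Bayes factor itself averages to 3.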
@ARTICLE{SchadEtAlinpress,
AUTHOR = {Schad, D. J. and Nicenboim, B. and B\"{u}rkner, P.--C. and Betancourt, M. and Vasishth, S.},
TITLE = {Workflow Techniques for the Robust Use of {B}ayes Factors},
JOURNAL = {Psychological Methods},
YEAR = {{in press}},
}
Cheers,
E.J.