Creating an informed prior from Bayesian reanalysis of two frequentist publications

Luke_Baxter · September 2024

Hi,

I'm trying to figure out how to make an informed prior from pre-existing knowledge in the literature. My main analysis is a Bayesian independent samples t-test. I've currently run this using a default uninformed prior (Cauchy with scale = 0.707). I've also searched the literature for research on the same question and found two papers. Both are frequentist analyses with sparsely reported results, but I was able to calculate t-statistics for the results I need. I then used JASP to do a Bayesian reanalysis of these two papers using their t-statistics and sample sizes. The results are uploaded to OSF here: https://osf.io/qgcw7.

The first paper's reanalysis is called: S.S. Bayesian Paired Samples T-Test: J et al.

The second paper's reanalysis is called: S.S. Bayesian Independent Samples T-Test: S et al.

I wanted to combine the information in both these papers to get an informed prior for my analysis. I've been reading the "Replication Bayes factors from evidence updating" here: https://link.springer.com/article/10.3758/s13428-018-1092-x. But I can't seem to figure it out. I don't think I have enough information from the original publications to be able to "compute the overall t value for the combined data", as was done in Appendix A of this paper. So I'm not sure if I can use this Bayes Factor approach.

I've turned to the “today’s posterior is tomorrow’s prior” approach. The paper says "the posterior for δ in a t test has no known distributional form", but that you can "approximate the posterior on effect size obtained from the t test with a normal distribution; this normal distribution is then used as a prior for the analysis of the replication experiment". Assuming the 95% credible interval could take the place of a 95% confidence interval, I used the posterior 95% interval and the original sample size to calculate the standard deviation of the normal approximation to the posterior, and I used the posterior median as its mean. I then used a normal distribution with this mean and standard deviation as the informed prior.
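For reference, one way to turn a posterior median and 95% credible interval into a normal approximation can be sketched as below. This is a minimal sketch, not what JASP does internally: the function name and the numbers are hypothetical, and it assumes the interval is roughly symmetric and describes δ itself. Under that reading the sample size does not enter the calculation again, because the width of the interval already reflects it.

```python
from statistics import NormalDist


def normal_from_interval(median, lower, upper, mass=0.95):
    """Approximate a posterior by a normal distribution.

    Assumes (lower, upper) is a central credible interval for the
    effect size that holds `mass` of the posterior probability.
    """
    # z is about 1.96 for a 95% interval
    z = NormalDist().inv_cdf(0.5 + mass / 2)
    # For a normal distribution: upper - lower = 2 * z * sd
    sd = (upper - lower) / (2 * z)
    return median, sd


# Hypothetical values for illustration only (not taken from the OSF files)
prior_mean, prior_sd = normal_from_interval(median=0.20, lower=0.004, upper=0.396)
print(f"Normal prior: mean = {prior_mean:.3f}, sd = {prior_sd:.3f}")
```

The resulting mean and standard deviation would then be entered as the parameters of the informed normal prior.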

The results are uploaded to OSF here: https://osf.io/qgcw7. The analysis called "S informed by J" is the same analysis as "S.S. Bayesian Independent Samples T-Test: S et al.", but instead of using the default Cauchy prior, I used the informed prior based on the posterior from "S.S. Bayesian Paired Samples T-Test: J et al". And vice versa for the analysis called "J informed by S".

I expected the posterior to be the same for "S informed by J" and "J informed by S", as the order shouldn't matter (I thought), but they have different posteriors. And ultimately, I just want to take the posterior of the combined result forward to my main analysis to use as an informed prior. But given my "S informed by J" and "J informed by S" don't match, I feel I've gone very wrong somewhere.

Could you please advise on how to combine the results from two Bayesian reanalyses to get an overall posterior? And how to take forward the combined posterior result to be an informed prior?

Many thanks,

Luke.

EJ · October 2024

If you have multiple experiments, one option is to do a meta-analysis and base the prior on the group-level distribution for effect sizes. Also, the specific form of the prior should not matter much. I would be tempted to specify a (positive-only? There is probably a strong directional prediction) normal distribution with a mean equal to the average of the two studies, and a standard deviation that is fairly wide. The main source of uncertainty here will probably be in the study random effects (i.e., different studies having different underlying true effect sizes). What is the mean of the posterior distributions from the two studies?
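To illustrate why study-level heterogeneity widens such a prior, here is a minimal sketch. It assumes a normal approximation to the posterior of the group-level mean and treats the between-study standard deviation as a known value; the function name and all numbers are hypothetical. Under those assumptions, the prior for the true effect in a new study has variance equal to the uncertainty about the group mean plus the between-study variance.

```python
import math


def prior_for_new_study(mu_mean, mu_sd, tau):
    """Normal prior for the true effect of a new study.

    mu_mean, mu_sd: posterior mean and sd of the group-level effect
    tau: between-study standard deviation (treated as known here)
    """
    # Independent sources of uncertainty add on the variance scale
    return mu_mean, math.sqrt(mu_sd**2 + tau**2)


# Hypothetical numbers for illustration only
mean, sd = prior_for_new_study(mu_mean=0.10, mu_sd=0.30, tau=0.40)
print(f"Prior for a new study: mean = {mean:.2f}, sd = {sd:.2f}")
```

With only two studies the between-study standard deviation is poorly estimated, which is one reason a deliberately wide prior may be preferable to the raw meta-analytic estimate.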

Sorry about the tardy reply

EJ

Luke_Baxter · October 2024

Hi EJ,

Thank you very much for the reply. Apologies for my lengthy response. I am trying to understand the theory more than anything, so I have laid out some scenarios below. This is primarily for understanding the logic, so it is not urgent. I attended the 2-day JASP course this year and have since been on a mini-mission to convert my research group, but I have not fully understood how to select priors. If you do have some time to reply at any point, I would love to get your insights, as I think this is a central topic in Bayesian analysis that I haven't yet grasped.

I have run the meta-analysis on the two studies; the results are here: https://osf.io/3hbu9. The mean of the model-averaged posterior is 0.128 (the mean of the posterior is 0.142 for fixed effects and 0.114 for random effects). In this specific scenario, the hypothesis is not directional. In fact, it is unclear if there even is an effect, so there is a definite need to test the null hypothesis. And if there is an effect, it is unclear in what direction it might go.

But really, my question is more about the theory of this, rather than the specifics of this exact question. This is a mini-project I am doing with a short-term student, mostly as an exercise to learn Bayesian statistics and JASP. The rationale is more important than the specifics of this question. I'm confused about the number of options that are available and how you select among them. Here are the options I see:

1. I could combine the two published results using a meta-analysis, and use that posterior to inform my prior for the current dataset. I can take the mean of the posterior, but you recommend using a fairly wide standard deviation. Why not use the estimated standard deviation from the meta-analysis, as well as the estimated mean from the meta-analysis? Is this because the analysis prior (as opposed to the design prior, from Bayes Factor Design Analysis) should be designed with a sceptic in mind? So the estimated standard deviation from the meta-analysis might be data-driven, but is too precise for a sceptic? To summarise, why not use the meta-analysis estimate of standard deviation?
2. Same as option 1, I could combine the two published results using a meta-analysis, and use that posterior to inform my prior for the current dataset. But just take the mean of the posterior as the estimate. I shouldn't take the meta-analysis estimate of standard deviation, based on the reasons provided to answer 1. But then how do I pick "a standard deviation that is fairly wide"? From the meta-analysis, the estimate seems easy to defend i.e. it comes from existing data. But if I need to widen this, how do you choose what to widen it to?
3. Why bother doing a meta-analysis on just the first two published results to get a prior for your own data? Why not do a full meta-analysis and simply add in your new dataset as one extra study into the meta-analysis? So now my meta-analysis has three studies in it, the two already published plus my own novel dataset? Assuming there are no practical reasons preventing this from being possible, this approach seems almost more defensible than the first two options. I feel like these three options are just weighting the prior data differently. Perhaps option two where you manually widen the standard deviation gives the prior data the least weight, option one where you use the meta-analysis estimated standard deviation gives the prior data more weight, and this third option of just putting all three into a meta-analysis gives the prior data even more weight. But if this is the case, I would have thought sceptics might like meta-analyses. So if the full meta-analysis approach maximally upweights the prior data, it goes against the logic in option two of widening the standard deviation for the two prior publications. Basically, why not always just do a full meta-analysis where your new data gets put into a meta-analysis of all previously existing data?
4. Lastly, I'm still confused as to what you should expect if using the “today’s posterior is tomorrow’s prior” approach. If all three datasets (my new dataset and the two published results) are suitable for starting with a default Cauchy distribution, and all three datasets remain constant in this hypothetical scenario, then I analyse one dataset after the other carrying the posterior forward as a new prior in each case, should order have any impact? I imagine with a fixed starting prior (e.g. default Cauchy for some sort of t-test) and fixed datasets, it should not matter what order you analyse datasets in a series. You should end with the same posterior. But that's not what I'm seeing. But this could be down to human error, so again, steering clear of the specifics, I just would like to know about what should be expected in theory. If I have three datasets, each of which are being analysed by the same type of t-test, should it matter if I analyse dataset 1 with default prior, then use that posterior as the prior for dataset 2, then use that posterior as the prior for dataset 3, or if I do this process in a different dataset order?
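To make the question about order concrete, here is a minimal sketch of sequential updating in a simplified setting. It assumes a normal prior and treats each study as a normally distributed effect-size estimate with a known standard error, which is not the t-test model with a Cauchy prior that JASP uses; the function names and numbers are hypothetical. In this conjugate case the posterior is proportional to the prior times the product of the likelihoods, so carrying the exact posterior forward gives the same result in any order.

```python
def update_normal(prior_mean, prior_sd, estimate, se):
    """Conjugate update: normal prior, normal likelihood with known se."""
    prior_prec = 1.0 / prior_sd**2
    data_prec = 1.0 / se**2
    post_prec = prior_prec + data_prec
    # Posterior mean is the precision-weighted average of prior and data
    post_mean = (prior_prec * prior_mean + data_prec * estimate) / post_prec
    return post_mean, post_prec**-0.5


def sequential(prior, studies):
    """Carry the posterior forward as the prior for the next study."""
    mean, sd = prior
    for estimate, se in studies:
        mean, sd = update_normal(mean, sd, estimate, se)
    return mean, sd


# Hypothetical (effect-size estimate, standard error) pairs
studies = [(0.30, 0.20), (0.05, 0.15), (0.10, 0.25)]
prior = (0.0, 1.0)

forward = sequential(prior, studies)
backward = sequential(prior, studies[::-1])
print("order 1, 2, 3:", forward)
print("order 3, 2, 1:", backward)
```

The two orders agree up to floating-point rounding. When the exact posterior is replaced by a normal approximation at each step, as in the approach described above, each order approximates a different intermediate posterior, so small discrepancies between orders are possible even without any human error.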

Thanks for your patience.

Best,

Luke.
