Potentially daft question about data accumulation
I wanted to canvass the opinion of people here who know much more than I do on two related issues.
Say we have two datasets, both containing the same treatment and control conditions. We could analyse them in two ways:
- Perform our test on dataset 1 with the default prior width (r = 0.707), then use the resulting posterior as the prior for the test on dataset 2.
- Pool all the data and perform a single test with the same 0.707 prior.
Are these two approaches equivalent or very close, or could they feasibly give very different results?
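One way to see how the two routes relate is a toy model where everything has a closed form. The sketch below is an illustration under stated assumptions, not the default t-test itself. It assumes a known SD (sigma = 1, so delta is a standardized effect). It also replaces the default Cauchy prior with a normal prior with SD 0.707, purely so the posterior stays in the same family. The helper names (`update`, `bf10`) and the summary statistics are hypothetical. In this setting the two routes give identical posteriors. The pooled Bayes factor equals the dataset-1 Bayes factor multiplied by the Bayes factor from a second test that uses the dataset-1 posterior as its prior.

```python
import math


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def update(prior_mean, prior_var, ybar, n, sigma=1.0):
    """Conjugate update of a N(prior_mean, prior_var) prior on delta,
    given n observations with sample mean ybar and known SD sigma (assumed)."""
    post_prec = 1.0 / prior_var + n / sigma**2
    post_mean = (prior_mean / prior_var + n * ybar / sigma**2) / post_prec
    return post_mean, 1.0 / post_prec


def bf10(prior_mean, prior_var, ybar, n, sigma=1.0):
    """BF10 for H1: delta ~ N(prior_mean, prior_var) versus H0: delta = 0.
    With sigma known the sample mean is sufficient, so the ratio of its
    marginal densities equals the full-data Bayes factor."""
    se2 = sigma**2 / n
    return normal_pdf(ybar, prior_mean, prior_var + se2) / normal_pdf(ybar, 0.0, se2)


# Hypothetical summary statistics for two datasets (not real data).
n1, ybar1 = 30, 0.25
n2, ybar2 = 50, 0.35
tau2 = 0.707**2  # normal prior variance standing in for the default width

# Route 1: analyse dataset 1, then reuse its posterior as the prior for dataset 2.
bf_1 = bf10(0.0, tau2, ybar1, n1)
m1, v1 = update(0.0, tau2, ybar1, n1)
bf_2_given_1 = bf10(m1, v1, ybar2, n2)
m_seq, v_seq = update(m1, v1, ybar2, n2)

# Route 2: pool the data and analyse once with the original prior.
n_all = n1 + n2
ybar_all = (n1 * ybar1 + n2 * ybar2) / n_all
bf_pooled = bf10(0.0, tau2, ybar_all, n_all)
m_pool, v_pool = update(0.0, tau2, ybar_all, n_all)

print(f"sequential: BF10 = {bf_1 * bf_2_given_1:.4f}, posterior N({m_seq:.4f}, {v_seq:.5f})")
print(f"pooled:     BF10 = {bf_pooled:.4f}, posterior N({m_pool:.4f}, {v_pool:.5f})")
```

Under these assumptions, two practical points follow. First, the Bayes factor from the second test alone (`bf_2_given_1`) measures only what dataset 2 adds. It is the product with `bf_1` that matches the pooled analysis. Second, in the actual default t-test the posterior for the effect size is not a Cauchy distribution, and the variance is unknown and also has to be carried over. If that posterior is approximated by some named distribution before being re-entered as a prior, the two routes will agree only approximately, and how closely depends on the quality of that approximation.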
What are the board's thoughts on using Bayesian methods for exploratory analysis, and then, after collecting more data, including the data from which the hypotheses were generated in a more confirmatory analysis?
My opinion is that this should be fine as long as you are open about how many exploratory outcomes you tested (the more you test, the more likely any finding is spurious) and as long as the confirmatory study adds a decent chunk of new data (maybe 2 to 4 times the size of the exploratory sample). In other words, as long as you are open about what you did, the evidence can be judged accordingly. Obviously, if you look at 10,000 outcomes and then add only twice as much data as in the hypothesis-generating sample, your results are not going to be very credible.
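The point about the number of exploratory outcomes can be made concrete with a back-of-the-envelope calculation. Assume, purely for illustration, that an outcome with no true effect has probability q = 0.05 of crossing whatever evidence threshold you use. Also assume the k outcomes are independent (real outcomes are usually correlated, which changes the numbers). The chance of at least one spurious hit is then 1 - (1 - q)^k. The function name is hypothetical.

```python
def prob_any_spurious(q: float, k: int) -> float:
    """P(at least one of k independent null outcomes crosses the threshold),
    where q is an assumed per-outcome false-hit rate, not an estimate."""
    return 1.0 - (1.0 - q) ** k


q = 0.05  # illustrative per-outcome false-hit rate (assumption)
for k in (1, 8, 100, 10_000):
    print(f"k = {k:>6}: P(at least one spurious hit) = {prob_any_spurious(q, k):.3f}")
```

With these made-up inputs, eight outcomes already give roughly a one-in-three chance of at least one chance finding. This is why the confirmatory data have to carry most of the evidential weight.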
Specifically, in some data collected for an MA, we explored eight possible models and found anecdotal evidence for an effect in n = 51 participants. We then collected another n = 79 participants to probe this outcome further and found extreme evidence for the effect. Based on that evidence, I tend to believe that the effect is not spurious.
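Suppose the extreme evidence comes from the new n = 79 participants alone. One rough way to gauge how surprising that is under the null is to ask how often a fresh null sample of that size would produce BF10 > 100, the conventional cut-off for "extreme" evidence. The sketch below assumes a toy model with known SD (sigma = 1) and a normal prior with SD 0.707 in place of the default Cauchy prior. The function name is hypothetical, and the result is only indicative for the real t-test.

```python
import math


def null_prob_bf_exceeds(c: float, n: int, tau2: float, sigma: float = 1.0) -> float:
    """P(BF10 > c) when H0 (delta = 0) is true, in a known-SD model where
    H1 puts a N(0, tau2) prior on delta. Assumed toy model, not the JZS t-test."""
    se2 = sigma**2 / n          # sampling variance of the mean under H0
    total = tau2 + se2          # marginal variance of the mean under H1
    # BF10(ybar) = sqrt(se2/total) * exp(ybar^2 / 2 * (1/se2 - 1/total)),
    # which grows with |ybar|, so BF10 > c iff |ybar| exceeds a critical value.
    log_needed = math.log(c) - 0.5 * math.log(se2 / total)
    if log_needed <= 0:
        return 1.0              # even ybar = 0 already gives BF10 >= c
    ybar_crit = math.sqrt(2.0 * log_needed / (1.0 / se2 - 1.0 / total))
    z = ybar_crit / math.sqrt(se2)
    return math.erfc(z / math.sqrt(2.0))  # two-sided tail: P(|Z| > z)


p = null_prob_bf_exceeds(100, 79, 0.707**2)
print(f"P(BF10 > 100 | H0, n = 79) = {p:.5f}")
```

Under these assumptions the probability comes out below one in a thousand, which supports the intuition. It says nothing, however, about biases in how the follow-up was designed or analysed.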