Optional stopping problems - paper by de Heide & Grünwald (2021)
Rianne de Heide visited the other day and gave a talk about why optional stopping can be a problem for Bayesians, alluding to Rouder's 2014 paper "Optional stopping: No problem for Bayesians".
The talk was quite complicated, and the article on which it was based (de Heide and Grünwald, https://doi.org/10.3758/s13423-020-01803-x , dH&G) was not very accessible to me either.
The talk first explained what Rouder meant by "not a problem": nominal and observed posterior odds should be as close as possible to each other, no matter when you stop collecting data.
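To pin down what I mean by "nominal" and "observed", here is my own attempt at writing the calibration property as a formula. The notation is mine, not taken from the talk or from dH&G: \tau is the (data-dependent) stopping time, x^{\tau} the data collected up to that point, and c any value the posterior odds can take.

```latex
% Nominal posterior odds reported by the analysis at stopping time \tau:
\[
  \mathrm{PO}(x^{\tau})
  = \frac{P(H_1 \mid x^{\tau})}{P(H_0 \mid x^{\tau})}
  = \mathrm{BF}_{10}(x^{\tau}) \cdot \frac{P(H_1)}{P(H_0)}
\]
% Calibration: among all experiments that report nominal odds c,
% the observed ratio of H_1-generated to H_0-generated experiments is also c:
\[
  \frac{P\bigl(H_1 \mid \mathrm{PO}(x^{\tau}) = c\bigr)}
       {P\bigl(H_0 \mid \mathrm{PO}(x^{\tau}) = c\bigr)} = c
\]
```

As far as I understand it, the probabilities in the second line are taken over repeated experiments in which the parameters are themselves drawn from the model priors, which is where the data-generating process enters.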
The dH&G paper then shows some examples in which optional stopping is indeed unproblematic, basically replicating Rouder's findings. Some variants of their simulations, however, produced cases where observed and nominal posterior odds no longer aligned. Those were cases in which the data-generating process in the simulations differed from the priors assumed by the statistical model (e.g., Cauchy prior on effect sizes in a Bayesian t-test). Rouder simply sampled his data from the priors of the statistical models, which presumably is why he did not find that problem.
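To check my understanding of this setup, here is a minimal Python sketch of such a simulation. It is not the code from dH&G or Rouder. As a simplifying assumption it uses a normal prior N(0, tau^2) on the mean of unit-variance normal data (instead of a Cauchy prior in a t-test), because the Bayes factor then has a closed form. The function names, the stopping threshold of 10, the maximum sample size, and the fixed effect of 0.2 in the mismatched case are all my own illustrative choices.

```python
import math
import random


def bf10(total, n, tau):
    """Bayes factor for H1: mu ~ N(0, tau^2) against H0: mu = 0,
    given n observations from N(mu, 1) whose sum is `total`."""
    v = 1.0 + n * tau * tau
    return math.exp(total * total * tau * tau / (2.0 * v)) / math.sqrt(v)


def run_experiment(rng, mu, tau, threshold, n_max):
    """Add one observation at a time; stop as soon as BF10 leaves
    (1/threshold, threshold) or n_max is reached. Returns the final BF10."""
    total = 0.0
    bf = 1.0
    for n in range(1, n_max + 1):
        total += rng.gauss(mu, 1.0)
        bf = bf10(total, n, tau)
        if bf >= threshold or bf <= 1.0 / threshold:
            break
    return bf


def simulate(draw_mu_h1, n_sims=4000, tau=1.0, threshold=10.0, n_max=100, seed=1):
    """Simulate experiments with prior odds 1:1, so nominal posterior odds = BF10.
    `draw_mu_h1(rng)` decides how the true effect is generated when H1 is true."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_sims):
        h1_true = rng.random() < 0.5
        mu = draw_mu_h1(rng) if h1_true else 0.0
        results.append((run_experiment(rng, mu, tau, threshold, n_max), h1_true))
    return results


def observed_counts(results, lo, hi):
    """Count H1-generated and H0-generated runs with nominal odds in [lo, hi)."""
    h1 = sum(1 for bf, truth in results if lo <= bf < hi and truth)
    h0 = sum(1 for bf, truth in results if lo <= bf < hi and not truth)
    return h1, h0


if __name__ == "__main__":
    # Matched case: true effects are drawn from the model prior (tau = 1).
    matched = simulate(lambda rng: rng.gauss(0.0, 1.0))
    # Mismatched case (assumption): every true effect is a fixed 0.2.
    mismatched = simulate(lambda rng: 0.2)
    for label, res in (("matched", matched), ("mismatched", mismatched)):
        h1, h0 = observed_counts(res, 10.0, float("inf"))
        print(label, "runs with nominal odds >= 10: H1-generated =", h1,
              "H0-generated =", h0)
```

The idea is to compare, within a bin of nominal posterior odds, how many runs were actually generated under H1 versus H0. If I understand the argument correctly, that ratio should track the nominal odds in the matched case and can drift away from them in the mismatched case.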
This is related to the distinction between subjective and objective Bayesianism. If an analyst truly believes the priors in a model, that is called subjective Bayesianism. If, however, an analyst uses priors simply out of mathematical convenience (as most people do, and as most JASP users probably do as well), that is called objective Bayesianism.
One of the conclusions was that for objective Bayesians (i.e., the large majority of users), optional stopping can indeed be a problem because they are more likely to be faced with a mismatch between the data-generating process and the model priors.
- Is my summary more or less accurate?
- How bad is the problem for us in practice and, in particular, how bad is it compared to NHST approaches? Fig. 4 in dH&G, which shows the misalignment between nominal and observed posterior odds, does look bad - but perhaps it is still much better than NHST? A comparison would put the criticism in perspective.
- What are your conclusions/recommendations regarding the use of objective Bayesian statistics, JASP etc. for people whose work relies on robustness against optional stopping? Can we fix the problem, for example, by putting more effort into turning our convenience priors into subjective Bayesian priors?
- I have a hard time understanding why it matters whether the analyst believes in their priors. Do the simulations not simply show that the real issue is whether the prior assumed by the model is accurate?
I would be very grateful for your expert assessment of the issue. Apologies for my limited understanding of this. I thought I understood the basics of the mathematical "machinery" behind Bayesian statistics, and I was surprised that this (rather philosophical) distinction between objective and subjective Bayesianism apparently matters a lot.