# A question regarding resampling in within-subject designs using distribution parameters

I'm currently reviewing a paper that uses an analysis that goes beyond my statistical expertise, but that I suspect is invalid. It concerns a resampling analysis, but I nevertheless hope you JASPian/Bayesian folk can shed some light on it.

It's a within-subject design with 12 participants. Each of these participants has some effect; let's call it the `dv(per-participant-observed)`. (To respect the authors' anonymity, I'll stay vague.) If I understand correctly, the authors then do the following:

- They first determine the mean observed effect across the 12 participants. Let's call this `dv(grand-observed)`.
- Next, for each participant, they randomly shuffle the data 1000 times, and determine 1000 surrogate effects. Let's call these `dv(per-participant-surrogate)`.
- Next, they repeat the following procedure 1000 times:
  - For each participant, randomly select a(n undefined) number of `dv(per-participant-surrogate)` values and average those. Let's call the result `dv(per-participant-surrogate-average)`.
  - Take the average across participants of this `dv(per-participant-surrogate-average)`. Let's call the result `dv(grand-surrogate)`.
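For concreteness, the procedure as I understand it can be sketched in Python. This is only a sketch under assumptions that are not in the paper: the per-participant effect is taken to be a difference between two condition means, "shuffling" is taken to mean permuting condition labels, the undefined number of surrogates is a hypothetical `k`, draws are made with replacement, and the data are simulated with no true effect.

```python
import random
from statistics import fmean, stdev


def effect(values, labels):
    """Hypothetical effect: mean of condition 1 minus mean of condition 0."""
    cond1 = [v for v, lab in zip(values, labels) if lab == 1]
    cond0 = [v for v, lab in zip(values, labels) if lab == 0]
    return fmean(cond1) - fmean(cond0)


def surrogate_effects(values, labels, n_shuffles, rng):
    """Permute the condition labels and recompute the effect each time."""
    shuffled = list(labels)
    out = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        out.append(effect(values, shuffled))
    return out


def grand_surrogates(per_participant_surrogates, n_repeats, k, rng):
    """Average k randomly drawn surrogates per participant (with replacement,
    an assumption), then average those averages across participants."""
    grand = []
    for _ in range(n_repeats):
        per_participant_avg = [
            fmean(rng.choices(s, k=k)) for s in per_participant_surrogates
        ]
        grand.append(fmean(per_participant_avg))
    return grand


rng = random.Random(0)

# Simulated stand-in data: 12 participants, 40 trials each, no true effect.
participants = []
for _ in range(12):
    labels = [0] * 20 + [1] * 20
    values = [rng.gauss(0, 1) for _ in labels]
    participants.append((values, labels))

dv_grand_observed = fmean([effect(v, lab) for v, lab in participants])
dv_per_participant_surrogate = [
    surrogate_effects(v, lab, 1000, rng) for v, lab in participants
]

# k=100 is a hypothetical choice; the paper leaves this number undefined.
dv_grand_surrogate = grand_surrogates(dv_per_participant_surrogate, 1000, k=100, rng=rng)

# Compare the spread of the surrogate distribution for two choices of k.
sd_k1 = stdev(grand_surrogates(dv_per_participant_surrogate, 1000, k=1, rng=rng))
sd_k100 = stdev(dv_grand_surrogate)
print(sd_k1, sd_k100)
```

In this sketch the spread of the `dv(grand-surrogate)` values shrinks as `k` grows (roughly with the square root of `k` when the draws are independent), so the undefined number directly affects how extreme `dv(grand-observed)` appears.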

Still with me?

So we end up with 1000 `dv(grand-surrogate)` values. Based on this, they determine the mean and standard deviation of the surrogate distribution. And then they look at how far into the tail of this distribution `dv(grand-observed)` is. The resulting p value is a whopping p=.00000000000000000001 (N=12!).

I have doubts about two things:

- Is it valid to do this kind of hierarchical resampling, first at the individual participant level, and then at the across-participant level? The implications of this approach hurt my brain.
- Is it valid to use distribution parameters to estimate p values, when the observed value is so far out into the tail? Shouldn't you rather just look at the percentage of surrogate values that are smaller/larger than the observed value? (And not bother with the mean and standard deviation of the distribution?)
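
To make the second doubt concrete, here is a minimal Python sketch contrasting the two ways of getting a p value from the same surrogate distribution. The surrogate values and the observed value are made-up stand-ins, the function names are hypothetical, and the `+1` in the empirical p value is a common convention for permutation tests rather than something taken from the paper.

```python
import math
import random
from statistics import fmean, stdev


def parametric_p(surrogates, observed):
    """One-sided upper-tail p from a normal distribution fitted to the surrogates."""
    z = (observed - fmean(surrogates)) / stdev(surrogates)
    # erfc keeps precision far out in the tail, where 1 - cdf(z) would round to 0.
    return 0.5 * math.erfc(z / math.sqrt(2))


def empirical_p(surrogates, observed):
    """Proportion of surrogates at least as large as the observed value,
    with the common +1 correction so the result is never exactly 0."""
    n_extreme = sum(s >= observed for s in surrogates)
    return (n_extreme + 1) / (len(surrogates) + 1)


rng = random.Random(1)
surrogates = [rng.gauss(0, 1) for _ in range(1000)]  # stand-in surrogate distribution
observed = 9.0  # stand-in observed value, far outside the surrogate range

p_param = parametric_p(surrogates, observed)
p_emp = empirical_p(surrogates, observed)
print(p_param, p_emp)
```

With 1000 surrogates the empirical p value cannot go below 1/1001, so a value like the one reported can only come from extrapolating the fitted normal far beyond the range the surrogates actually cover.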

I would be grateful if anyone could shed some light on this!

Cheers,

Sebastiaan

## Comments

Not sure about this. What I would do is create a Bayesian hierarchical model, use vague priors, and inspect the posterior distribution for the group-level mean effect size. The one-sided p-value should be very similar to the area of the posterior distribution lower than 0. See http://www.ejwagenmakers.com/inpress/MarsmanWagenmakersOneSidedPValue.pdf

E.J.