A question regarding resampling in within-subject designs using distribution parameters
I'm currently reviewing a paper that uses an analysis that goes beyond my statistical expertise, but I suspect it is invalid. It concerns a resampling analysis, but I nevertheless hope you JASPian/Bayesian folk can shed some light on it.
It's a within-subject design with 12 participants. Each of these participants has some effect; let's call it dv(per-participant-observed). (To respect the authors' anonymity, I'll stay vague.) If I understand correctly, the authors then do the following:
- They first determine the mean observed effect across the 12 participants. Let's call this dv(grand-observed).
- Next, for each participant, they randomly shuffle the data 1000 times and determine 1000 surrogate effects. Let's call these dv(per-participant-surrogate).
- Next, they repeat the following procedure 1000 times (I've tried to sketch my reading of it in code below):
  - For each participant, randomly select a(n undefined) number of dv(per-participant-surrogate) values and average those. Let's call the result dv(per-participant-surrogate-average).
  - Take the average across participants of these dv(per-participant-surrogate-average) values. Let's call the result dv(grand-surrogate).
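Here is a minimal sketch, with made-up data, of how I read the procedure. The trial structure, the effect measure (a simple condition difference), and the size of the randomly selected subset are all my own guesses, since the paper leaves them unspecified:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: 12 participants, 40 trials each, two conditions per participant.
# (The paper's actual data and effect measure are not shown here.)
n_participants, n_trials = 12, 40
data = rng.normal(0.0, 1.0, size=(n_participants, n_trials))
labels = np.tile([0, 1], n_trials // 2)  # condition label for each trial

def effect(trials, labels):
    """Per-participant effect: here simply mean(condition 1) - mean(condition 0)."""
    return trials[labels == 1].mean() - trials[labels == 0].mean()

# dv(per-participant-observed) and dv(grand-observed)
dv_observed = np.array([effect(data[i], labels) for i in range(n_participants)])
dv_grand_observed = dv_observed.mean()

# Step 1: 1000 surrogate effects per participant, from shuffled condition labels
n_shuffles = 1000
dv_surrogate = np.empty((n_participants, n_shuffles))
for i in range(n_participants):
    for j in range(n_shuffles):
        dv_surrogate[i, j] = effect(data[i], rng.permutation(labels))

# Step 2: 1000 grand surrogates, each the across-participant mean of
# per-participant averages over a randomly selected subset of surrogates
# (the subset size is undefined in the paper; 10 is an arbitrary choice)
n_resamples, subset_size = 1000, 10
dv_grand_surrogate = np.empty(n_resamples)
for j in range(n_resamples):
    per_participant_avg = [
        rng.choice(dv_surrogate[i], size=subset_size, replace=False).mean()
        for i in range(n_participants)
    ]
    dv_grand_surrogate[j] = np.mean(per_participant_avg)
```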
Still with me?
So we end up with 1000 dv(grand-surrogate) values. Based on these, they determine the mean and standard deviation of the surrogate distribution, and then they look at how far into the tail of this distribution dv(grand-observed) falls. The resulting p value is a whopping p=.00000000000000000001 (N=12!).
I have doubts about two things:
- Is it valid to do this kind of hierarchical resampling, first at the individual participant level, and then at the across-participant level? The implications of this approach hurt my brain.
- Is it valid to use distribution parameters to estimate p values when the observed value is so far out in the tail? Shouldn't you rather just look at the percentage of surrogate values that are smaller/larger than the observed value, and not bother with the mean and standard deviation of the distribution? (See the sketch below.)
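For comparison, the empirical permutation p value I have in mind is just the proportion of surrogate values at least as extreme as the observed one, something like:

```python
import numpy as np

# Placeholder values, as in the sketch above
rng = np.random.default_rng(1)
dv_grand_surrogate = rng.normal(0.0, 0.05, size=1000)
dv_grand_observed = 0.6

# Proportion of surrogates at least as extreme as the observed value,
# with the usual +1 so that the estimate can never be exactly zero
n_extreme = (dv_grand_surrogate >= dv_grand_observed).sum()
p_empirical = (n_extreme + 1) / (len(dv_grand_surrogate) + 1)
print(p_empirical)
```

Note that with 1000 surrogates this can never drop below 1/1001 (about .001), which is part of why a reported p of 10^-20 strikes me as odd.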
I would be grateful if anyone could shed some light on this!
Cheers,
Sebastiaan
Comments
Not sure about this. What I would do is create a Bayesian hierarchical model, use vague priors, and inspect the posterior distribution for the group-level mean effect size. The one-sided p-value should be very similar to the posterior mass below 0. See http://www.ejwagenmakers.com/inpress/MarsmanWagenmakersOneSidedPValue.pdf
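For concreteness, a rough sketch of what I mean (in Python/PyMC, with made-up effect values standing in for the 12 observed per-participant effects; a full analysis would ideally model the trial-level data hierarchically):

```python
import numpy as np
import pymc as pm

# Made-up per-participant effects; in the real analysis these would be the
# 12 observed dv(per-participant-observed) values
effects = np.random.default_rng(0).normal(0.5, 0.3, size=12)

with pm.Model():
    mu = pm.Normal("mu", mu=0.0, sigma=10.0)      # vague prior on the group-level mean
    sigma = pm.HalfNormal("sigma", sigma=10.0)    # vague prior on the between-subject SD
    pm.Normal("dv", mu=mu, sigma=sigma, observed=effects)
    idata = pm.sample(2000, tune=1000, chains=4, random_seed=1)

mu_draws = idata.posterior["mu"].values.ravel()
print("P(mu < 0 | data) =", (mu_draws < 0).mean())  # compare to the one-sided p-value
```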
E.J.