Assumptions of the independent samples t-test: JASP v0.19.1 vs v0.19.3
Hello Team,
We have noticed a difference in the Shapiro-Wilk normality test between versions 0.19.1 and 0.19.3 of JASP when performing an independent t-test.
- In JASP 0.19.1, two Shapiro-Wilk statistics are reported (one per sample).
- In JASP 0.19.3, only one Shapiro-Wilk statistic is displayed, applied to the model residuals.
This question has already been asked on GitHub (https://github.com/jasp-stats/jasp-issues/issues/3308 ), and Thomas Langkamp explained that "version 0.19.1 was wrong. Only the residuals to be checked, this only one Test is needed." Why?
Could you provide more theoretical detail and explain how normality is tested in each of the two versions, so that I can clearly understand the theoretical foundations and the interpretation of these normality tests?
We understand that the normality assumption for independent t-tests concerns the residuals and not the distributions of the samples themselves. However, we have several questions about the underlying methodology and calculations used by JASP, because we did not find the specific details in the release notes or in the JASP manual:
- How are the residuals calculated? Is it simply the difference between each observation and the mean of its group?
- Is the Shapiro-Wilk test applied to the residuals of each group separately, or to all the residuals of the two groups combined? I assume it is applied to all residuals combined.
- What happens if one of the samples follows a normal distribution and the other does not? Does a test based on the residuals always detect such a normality problem? Is the Shapiro-Wilk test able to detect the non-normality of that one sample? Is this related to the Central Limit Theorem?
- Is it the data or the residuals that should be normally distributed? In the manual by Navarro et al. (2019, Learning statistics with JASP, v1/sqrt(2)), section 10.8 "Checking the normality of a sample", p. 241, it is mentioned that the assumptions concern the normality of the residuals for the ANOVA, and the normality of the samples.
Any thoughts?
Notes:
- We have noticed that the Shapiro-Wilk test on two independent samples can be performed outside the t-test, in the descriptive statistics menu.
- The information button for the independent samples t-test (top right) does not match the updated Shapiro-Wilk test. It states that “The second column contains each level of the grouping variable.”, but there is no such column.
Thank you in advance for your clarifications, which are always interesting, and for your work on JASP.
Johan
Johan A. ACHARD
PhD Student in Cognitive Sciences
Université Franche-Comté
Comments
Hi @johan_achard ,
The normality assumption of a linear model (and the t-test is one) indeed concerns the model residuals/errors. In the case of the t-test, the model error is indeed the difference between each observation and its group mean. This means that we have a single set of residuals, one per observation, and it is the normality of that set that we test (either with the S-W test or by inspecting its Q-Q plot). This is the same process that is used in the ANOVA (which also produces only a single Q-Q plot).
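The residual computation described above can be sketched in a few lines of Python. This is a minimal illustration with hypothetical data and a hypothetical helper name (`t_test_residuals`), assuming the residuals are formed exactly as described (each observation minus its own group mean) and then pooled; it is not JASP's actual code.

```python
from statistics import mean

def t_test_residuals(groups):
    """Residuals of the two-group linear model behind the independent t-test:
    each observation minus the mean of its own group, pooled into one list."""
    residuals = []
    for values in groups.values():
        group_mean = mean(values)
        residuals.extend(y - group_mean for y in values)
    return residuals

# Hypothetical example data (not from the thread).
data = {
    "control": [4.0, 5.0, 6.0, 5.0],
    "treatment": [7.0, 9.0, 8.0, 8.0],
}

res = t_test_residuals(data)
print(res)  # → [-1.0, 0.0, 1.0, 0.0, -1.0, 1.0, 0.0, 0.0]
```

The pooled list is what a single normality test would be run on; with SciPy installed, `scipy.stats.shapiro(res)` returns the W statistic and its p-value. Running the test on each group's values separately instead corresponds to the two per-sample tests reported in 0.19.1.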
If one of the groups is not normally distributed but this does not throw off the residuals, then I guess that's not a problem, because, again, the assumption concerns the normality of the residuals. I also want to stress that normality is not an all-or-nothing thing; instead, there are degrees of normality: the less normal your residuals, the less reliable the conclusions based on that model might be.
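To illustrate the case raised in the question (one roughly normal group, one non-normal group), here is a small sketch with hypothetical data. It uses the moment-based skewness g1 as a simple shape summary, as a stand-in for illustration only; it is not the Shapiro-Wilk test. Because a group's residuals are just its values shifted by the group mean, they keep that group's shape exactly. Pooling then mixes the shapes of the two groups, so how visible the non-normality remains in the pooled set depends on the group sizes and variances.

```python
from statistics import fmean

def skewness(xs):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (0 for symmetric data).
    Used here only as a simple shape summary, NOT as the Shapiro-Wilk test."""
    m = fmean(xs)
    m2 = fmean([(x - m) ** 2 for x in xs])
    m3 = fmean([(x - m) ** 3 for x in xs])
    return m3 / m2 ** 1.5

def residuals(values):
    """Each observation minus its own group mean."""
    m = fmean(values)
    return [x - m for x in values]

# Hypothetical data: group A is symmetric, group B is strongly right-skewed.
group_a = [-2, -1, -1, 0, 0, 0, 1, 1, 2]
group_b = [1, 1, 1, 1, 1, 1, 1, 1, 10]

res_a = residuals(group_a)
res_b = residuals(group_b)
pooled = res_a + res_b  # the single set of residuals of the t-test model

print(round(skewness(group_b), 3))  # raw group B → 2.475
print(round(skewness(res_b), 3))    # group B residuals, same shape → 2.475
print(round(skewness(pooled), 3))   # pooled residuals, still skewed → 2.777
```

Here the skew survives pooling; with a much larger or more variable symmetric group, the skewed group's contribution to the pooled shape would be smaller.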
I believe this answers your four questions. I will still update the t-test help files, because they do indeed look outdated.
Kind regards,
Johnny
Hi @JohnnyB ,
Great response, jumping into the conversation:
- For a paired samples t-test, I thought that the Shapiro-Wilk test was performed on the difference scores. Is that still the case?
- Coming back to the problem above, it seems to me that for a t statistic to follow a t distribution, the numerator should follow a normal distribution, while the denominator should follow a chi-square distribution (more precisely, its square, suitably scaled). Strictly speaking, it is then the sampling distribution of the mean (or of the difference in means) that should be normally distributed. If the residuals are normally distributed, I think this condition is necessarily satisfied; I haven't done the maths, so a mathematical proof would be useful (if someone has a reference...). Feel free to correct my reasoning if it is incorrect!
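For reference, here is the standard textbook derivation behind this reasoning, sketched under the usual assumption that the residuals are independent and normally distributed with a common variance σ². The symbols are introduced here for illustration: group sizes n_1, n_2, group means \bar y_1, \bar y_2, group sample variances s_1^2, s_2^2, and pooled variance s_p^2.

```latex
\begin{aligned}
y_{gi} &= \mu_g + \varepsilon_{gi}, \qquad \varepsilon_{gi} \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma^2), \quad g = 1, 2 \\
s_p^2 &= \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2} \\
Z &= \frac{(\bar y_1 - \bar y_2) - (\mu_1 - \mu_2)}{\sigma \sqrt{1/n_1 + 1/n_2}} \sim \mathcal{N}(0, 1) \\
V &= \frac{(n_1 + n_2 - 2)\, s_p^2}{\sigma^2} \sim \chi^2_{n_1 + n_2 - 2}, \qquad Z \text{ independent of } V \\
t &= \frac{Z}{\sqrt{V / (n_1 + n_2 - 2)}} = \frac{(\bar y_1 - \bar y_2) - (\mu_1 - \mu_2)}{s_p \sqrt{1/n_1 + 1/n_2}} \sim t_{n_1 + n_2 - 2}
\end{aligned}
```

So normal residuals do give the exact t distribution. A numerator that is only approximately normal (for example via the Central Limit Theorem) is not quite the whole story, because the exact result also uses the chi-square distribution of V and the independence of Z and V, both of which rely on normality.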
Thanks again for this great software!
Hi @mservant ,
For the paired t-test, that is indeed the case (the model prediction is 0, so the residuals are the observed differences; we are, however, planning to include a setting to change the expected value under H0).
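A small sketch of the paired case with hypothetical data: the normality check works on the difference scores. Whether the residuals are taken as the differences themselves (model prediction 0, as described above) or as the differences minus their mean, they differ only by a constant shift. The Shapiro-Wilk W depends on the data only through the sorted standardized values, so the test result is the same either way; the hypothetical helper `standardized_order` below computes exactly those values.

```python
from statistics import fmean, stdev

# Hypothetical paired measurements (pre/post for the same participants).
pre = [10.0, 12.0, 9.0, 14.0, 11.0]
post = [12.0, 15.0, 9.0, 17.0, 12.0]

# Difference scores: what the normality check is run on for a paired t-test.
diffs = [b - a for a, b in zip(pre, post)]

# Model prediction 0 under H0: residuals equal the differences.
residuals_h0 = [d - 0.0 for d in diffs]

# Model prediction = mean difference: residuals are a shifted copy.
d_bar = fmean(diffs)
residuals_mean = [d - d_bar for d in diffs]

def standardized_order(xs):
    """Sorted z-scores. Shift- and scale-invariant statistics such as the
    Shapiro-Wilk W depend on the data only through these values."""
    m = fmean(xs)
    sd = stdev(xs)
    return sorted((x - m) / sd for x in xs)

print(diffs)  # → [2.0, 3.0, 0.0, 3.0, 1.0]
print(d_bar)  # → 1.8
same_shape = all(
    abs(u - v) < 1e-9
    for u, v in zip(standardized_order(diffs), standardized_order(residuals_mean))
)
print(same_shape)  # → True
```

The same argument covers a configurable expected value under H0: subtracting any constant from the differences leaves W unchanged.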
My maths are not strong enough to answer your second question, unfortunately; I'm very much an applied guy 😅
Cheers,
Johnny