# choice of test for mean difference of weighted multinomial data

Hi folks,

I am an R user, so my question is phrased in terms of the BayesFactor package, but I was thrilled to discover JASP. It has been great to explore Bayesian statistics and Bayes factors through the papers and packages supporting this software. Thanks to all the contributors for your work in this area.

I have read in several places that it is a misconception that the frequentist t-test requires normally distributed data, and that in fact only the test statistic needs to be normally distributed. Assuming this is the case, how does this limitation apply to ttestBF(), where observations y_i are modeled explicitly as iid normal, y_i ~ Normal(mu, sigma^2) (as per Morey and Rouder (2011) in the function documentation)? The question of assumptions in general has been asked and answered (https://forum.cogsci.nl/discussion/2322/bayesian-test-assumptions), but this specific aspect was not addressed.

To be more concrete, I am considering an A/B test of a multinomial distribution where each category is associated with its own weight, and most counts fall in the category with weight 0. The weights are on a ratio scale. The variable of interest is the A/B difference of the weighted mean, which for each group is the sum of (posterior probability of category)*(category weight). So a) the data-generating process is obviously not normally distributed, even if the mean difference across many samples is, b) the mean is much smaller than any of the nonzero weights, and c) contingencyTableBF() alone doesn't fully capture the A/B difference because the weights matter.
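To make the setup concrete, here is a small sketch with made-up weights and counts (all values hypothetical, just to pin down the quantity I mean by "weighted mean"):

```python
import numpy as np

# Hypothetical category weights on a ratio scale; most mass sits at weight 0
weights = np.array([0.0, 1.0, 2.0, 5.0])

# Hypothetical observed counts per category for groups A and B
counts_a = np.array([90, 5, 3, 2])
counts_b = np.array([80, 10, 6, 4])

def weighted_mean(counts, weights):
    """Sum over categories of (relative frequency) * (category weight)."""
    return (counts / counts.sum()) @ weights

# The variable of interest: the A/B difference of the weighted means
diff = weighted_mean(counts_a, weights) - weighted_mean(counts_b, weights)
print(diff)
```

With these numbers the weighted means are 0.21 and 0.42, so the difference is about -0.21: much smaller than any nonzero weight, as described above.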

I feel like my options are:

- Use ttestBF(), because the mean difference can be shown to be normally distributed under resampling, assuming that this property carries over to the Bayesian test.

- Use a contingencyTableBF() hypothesis test with independent multinomial sampling, followed by a loss function applied to the posterior distribution using the weights. My concern in that case is that the test does not consider the hypotheses of actual interest, so among other things I will never have Bayes factors for those hypotheses.

- Use a Bayesian rank-sum test. My concern here is that I haven't applied this test before, so I want to understand a) whether it applies here and b) whether the code in the OSF files linked in a previous post (https://forum.cogsci.nl/discussion/5024/bayes-factors-for-non-parametric-tests) is still the recommended R implementation.
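For the second option, here is a rough sketch of what I have in mind (hypothetical counts and weights; conjugate Dirichlet posteriors per group, with the weights applied afterwards to get the posterior of the weighted-mean difference):

```python
import numpy as np

rng = np.random.default_rng(1)
weights = np.array([0.0, 1.0, 2.0, 5.0])   # hypothetical category weights
counts_a = np.array([90, 5, 3, 2])          # hypothetical counts, group A
counts_b = np.array([80, 10, 6, 4])         # hypothetical counts, group B

# Uniform Dirichlet(1,...,1) prior + multinomial likelihood -> Dirichlet posterior
post_a = rng.dirichlet(1 + counts_a, size=20000)  # draws of category probabilities
post_b = rng.dirichlet(1 + counts_b, size=20000)

# Posterior draws of the weighted-mean difference, propagating the weights
diff_draws = post_a @ weights - post_b @ weights
print(diff_draws.mean(), np.quantile(diff_draws, [0.025, 0.975]))
```

This gives me a full posterior (and, with a loss function, a decision), but as noted above it yields no Bayes factor for the hypothesis about the weighted-mean difference itself.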

Any insight would be greatly appreciated!

Thanks!

Robert

• Hello Robert,

For the Bayesian t-test, the t-value and the sample size n are sufficient (so in the Summary Statistics module you can obtain a complete Bayesian analysis from just the t-value and sample size; access to the raw data would not change the inference). So from that perspective I don't think it is a problem.
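To illustrate, here is a rough Python sketch (not the BayesFactor implementation; for real use, BayesFactor's ttest.tstat() is the place to go) of the two-sample JZS Bayes factor computed from t and the group sizes alone, following the integral in Rouder et al. (2009):

```python
import numpy as np
from scipy import integrate

def jzs_bf10(t, n1, n2, r=np.sqrt(2) / 2):
    """Two-sample JZS Bayes factor BF10 from t and group sizes only
    (sketch of the Rouder et al., 2009 formula)."""
    nu = n1 + n2 - 2                      # degrees of freedom
    n_eff = n1 * n2 / (n1 + n2)           # effective sample size
    # Marginal likelihood under H1: Cauchy(0, r) prior on effect size,
    # written as an inverse-gamma mixture over g and integrated out.
    def integrand(g):
        a = 1 + n_eff * g * r**2
        return (a ** -0.5
                * (1 + t**2 / (a * nu)) ** (-(nu + 1) / 2)
                * (2 * np.pi) ** -0.5 * g ** -1.5 * np.exp(-1 / (2 * g)))
    m1, _ = integrate.quad(integrand, 0, np.inf)
    m0 = (1 + t**2 / nu) ** (-(nu + 1) / 2)   # likelihood under H0
    return m1 / m0

print(jzs_bf10(2.2, 30, 30))   # BF10 from t and group sizes alone
```

The point is simply that nothing beyond (t, n1, n2) enters the computation, which is why the Summary Statistics module can reproduce the full analysis.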

For the comparison of two multinomials, my first suggestion would be log-linear regression or the contingency-table formulations, but as you mention, these lack the weights. It seems to me that this is a problem that requires special treatment...

Cheers,

E.J.

• Hi EJ,

Thanks for your comments on the t-test. For the weighted multinomial comparison, knowing that this problem requires special treatment is also very helpful.

Thanks again,

Robert