Bayesian contingency tables

DIV · February 2022

Hello, all.

I have been having a little look at two articles related to the Bayesian analysis of 2×2 contingency tables as implemented in JASP in the Frequencies module through the contingencyTableBF function in the BayesFactor package for R:

E. Gûnel & J. Dickey, Bayes Factors for Independence in Contingency Tables. Biometrika, 1974. 61(3): p. 545–557. GD74. https://www.jstor.org/stable/2334738
T. Jamil, A. Ly, R.D. Morey, J. Love, M. Marsman & E.-J. Wagenmakers, Default “Gunel and Dickey” Bayes factors for contingency tables. Behavior Research Methods, 2017. 49(2): p. 638–652. J+17. https://doi.org/10.3758/s13428-016-0739-8

I am mostly looking at the Poisson sampling scheme.

I'm sure that others here will be knowledgeable about those details. There were a few points I was wondering about.

1. The first fraction of equation 7 in J+17 looks almost identical to the second equation given in §4.6 of GD74. Is that a fluke, or are they supposed to correspond? If the latter, then what happened to the min() function appearing in GD74?
2. In JASP, and in the discussion of an example in J+17, the interest seems to be in testing whether the proportions in the respective rows are unequal (put another way, as in the JASP GUI: the alternative hypothesis specifies that the column-one group is not equal to the column-two group). Practically speaking, the numerical results seem to be the same if the researcher had instead been interested in comparing proportions in the respective columns (that is, testing hypotheses about the two row groups). Is that correct? If so, it implies that equation 7 in J+17 is invariant to row swaps, column swaps and transposition, which, by inspection, it almost seems to be, except for the singling out of "y1." for special treatment.
3. In discussing the "independent Poisson model" in §4.6, GD74 write that "the choice" of a_ij=a=1 and b=4/n.. "shall favour" the alternative hypothesis. (It's not obvious to me why.) Indeed, J+17 also report that use of the "default" GD74 priors results in a Poisson model that is "most reluctant" in its support for the null hypothesis. If this is the case, then why not choose a value of "a" that would be more neutral? (What would that value be? Or how could it fairly be chosen?) Even though GD74 analysed the case of a=1, it wasn't apparent to me that they were necessarily recommending it. On the other hand, should this be the mandatory penalty for adopting a Poisson sampling scheme?! In thinking about this, I wonder about data that would produce BF=1 for the various sampling schemes. (J+17 focussed on data producing either largish or smallish BF.)
4. I had a go at implementing equation 7 of J+17 in code (not in R) that should be using 50+ digits of precision, with sample data of [50, 150; 10, 100]. While JASP reported BF = 96.085,266,860,211, my code returned BF = 96.085,266,860,139. Just a comment.
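To put the last point in perspective, the two reported values can be compared on a relative scale. The sketch below is plain arithmetic on the two numbers quoted above. The variable names are illustrative, and the comparison with double-precision machine epsilon is only an assumption about where such a difference could come from, not a diagnosis of the JASP or BayesFactor code.

```python
from decimal import Decimal, getcontext
import sys

getcontext().prec = 50  # work with 50 significant digits

bf_jasp = Decimal("96.085266860211")  # value reported by JASP (from the post)
bf_high = Decimal("96.085266860139")  # value from the 50+ digit implementation (from the post)

abs_diff = abs(bf_jasp - bf_high)
rel_diff = abs_diff / bf_high

# Machine epsilon for IEEE double precision, converted exactly to Decimal.
eps = Decimal(sys.float_info.epsilon)

print(f"absolute difference: {abs_diff:.3E}")
print(f"relative difference: {rel_diff:.3E}")
print(f"relative difference in units of double epsilon: {rel_diff / eps:.0f}")
```

The relative difference is a little under 1e-12, i.e. a few thousand units of double-precision epsilon, which is the sort of size that accumulated rounding in a double-precision calculation could plausibly produce.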

—DIV

EJ · February 2022

Hello DIV,

Ah, this was a while ago. Some quick responses here. I will note that it takes some time to get to the bottom of this; I recall this being a bit of a puzzle even when we were working on it.

1. We checked our implementation with the example results provided by GD74. That's all I can say about it before digging in to the finer details.
2. From memory: it does matter whether the groups are defined by row or by column, so it is not invariant.
3. I'm not sure; I would have to dive back in. A choice of a=1 seems fairly standard, though.
4. OK; this could be due to a number of reasons, none of them interesting (I suspect). So I'd worry more about cases 1 and 3.

Cheers,

E.J.

DIV · February 2022

Thank you for your feedback, E.J.

1. It is good that you were able to verify your implementation by confirming that it reproduces the results published by GD74 in their example. On the other hand, the example given in GD74 gives numerical results pursuant to equations 4·2–4·7 and 4·11–4·13, whilst the min() function appears in an unnumbered formula that is given after all of the above-mentioned equations (somehow related to equations 4·8, 4·10 and 4·11). Therefore it might not have been directly included in the validation. Although the formula represents a 'special case' (e.g. restricted to a 2×2 table), if a min() function occurs in the special case, I would have anticipated that it would likewise appear in the most general implementation.
2. My calculations with a small number of test cases indicate that it's invariant in practice. [50, 150; 10, 100] and its transpose both yielded BF = 96.085,266,860,139; similarly, [100, 10; 100, 10] and its transpose both yielded BF = 0.192,413,543,894,3. (In fact, each pair agreed to 50+ decimal places.) Perhaps there is a special combination of counts that would break the invariance? Or perhaps only with tables larger than 2×2?
3. That was my sense of the matter too.
4. I agree: this is not a particular concern, just an observation.
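The question of whether some special combination of counts breaks the invariance can be approached by exhaustive search over small tables. The following is a minimal sketch of such a search. It does not implement equation 7 of J+17: `cross_product_gap` and `first_row_total` are hypothetical stand-in statistics, chosen only to show what the harness reports for a quantity that is invariant and for one that is not. A real Bayes factor function would be passed in place of them, probably with a tolerance-based comparison rather than exact equality.

```python
from itertools import product

def transpose(t):
    (a, b), (c, d) = t
    return ((a, c), (b, d))

def swap_rows(t):
    return (t[1], t[0])

def swap_cols(t):
    return tuple((row[1], row[0]) for row in t)

def find_counterexamples(stat, transform, max_count=6):
    """Return every 2x2 table with counts in 0..max_count for which
    stat(table) differs from stat(transform(table))."""
    bad = []
    for a, b, c, d in product(range(max_count + 1), repeat=4):
        table = ((a, b), (c, d))
        if stat(table) != stat(transform(table)):
            bad.append(table)
    return bad

# Hypothetical stand-ins (NOT equation 7 of J+17).
def cross_product_gap(t):
    # |y11*y22 - y12*y21| treats rows and columns symmetrically.
    (a, b), (c, d) = t
    return abs(a * d - b * c)

def first_row_total(t):
    # Singles out the first row, so it is not expected to be invariant.
    return t[0][0] + t[0][1]

transforms = [("transpose", transpose), ("row swap", swap_rows), ("column swap", swap_cols)]
for name, transform in transforms:
    n_sym = len(find_counterexamples(cross_product_gap, transform))
    n_row = len(find_counterexamples(first_row_total, transform))
    print(f"{name}: {n_sym} counterexamples (symmetric stand-in), {n_row} (row-based stand-in)")
```

An empty list of counterexamples over a grid of small counts is evidence rather than proof, but a non-empty list immediately supplies a concrete table on which to compare implementations.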

—DIV

DIV · February 2022

Further on point 2:

For a 2×2 table it seems that strict independence requires that (say) the ratios of Poisson rate parameters in both columns are equal, namely λ11/λ21 = λ12/λ22. But simple arithmetic shows that this is identical to λ11/λ12 = λ21/λ22, being equality of the ratios of Poisson rate parameters in both rows! Therefore invariance makes sense conceptually.
I have now tested application of equation 7 from J+17 on one thousand randomly generated 2×2 contingency tables (each with counts between 0 and 50), and all were invariant upon transpose, tested to 50 significant figures. Therefore invariance is demonstrated empirically, at least for counts in that range.
Now that I look more closely at equation 7 from J+17, it becomes apparent that the (y1.+1) in the numerator of the first fraction partially cancels the denominator of (y1.+1)! in the second fraction, with the latter being reduced to y1.! . With that algebra applied it is now apparent why the results are invariant to transposition of the tables (i.e. converting rows into columns). Therefore invariance is a general result of equation 7 published by J+17.
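The two algebraic steps used above can be written out explicitly. This is a sketch covering only the factors named in the posts, not the whole of equation 7 in J+17. The notation $y_{1\cdot}$ for the first row total and $\lambda_{ij}$ for the Poisson rates follows the usage above, and all rates are assumed to be strictly positive.

```latex
% Independence stated via columns is the same condition as via rows
% (assuming all \lambda_{ij} > 0):
\frac{\lambda_{11}}{\lambda_{21}} = \frac{\lambda_{12}}{\lambda_{22}}
\;\iff\;
\lambda_{11}\,\lambda_{22} = \lambda_{12}\,\lambda_{21}
\;\iff\;
\frac{\lambda_{11}}{\lambda_{12}} = \frac{\lambda_{21}}{\lambda_{22}}

% The factor (y_{1.}+1) in one numerator cancels against (y_{1.}+1)! in
% the other denominator:
\frac{y_{1\cdot}+1}{(y_{1\cdot}+1)!}
= \frac{y_{1\cdot}+1}{(y_{1\cdot}+1)\; y_{1\cdot}!}
= \frac{1}{y_{1\cdot}!}
```

After the cancellation the first row total enters only through $y_{1\cdot}!$, which is consistent with the observation that it no longer receives special treatment.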

—DIV

EJ · April 2022

OK, it is clear that this needs another look from me. I am not eager to do this, as I recall the Gunel and Dickey paper was not easy to understand (conceptually their approach was clear, but mathematically things weren't completely spelled out).

E.J.
