Approximating a partial non-parametric Bayesian correlation in JASP
Hi all, I am interested in augmenting some partial non-parametric correlations with Bayesian tests (my hypothesis is for a lack of association so being able to evaluate support for the null with Bayesian tests is appealing). JASP offers Kendall's tau as a non-parametric alternative for Bayesian correlation but does not support partial correlations.
However, I'm wondering if the following approach based on the manual calculation of a Spearman partial correlation would be a valid alternative. A Spearman partial correlation is calculated by 1) ranking all variables (i.e., X, Y, and Covariate), 2) obtaining residuals from the regressions between the covariate and x & y variables), and 3) performing a Pearson correlation on the residuals. Would I then be able to mimic this approach in the Bayesian module by running a Bayesian Pearson correlation on the residuals, which would be equivalent to a Bayesian Spearman correlation?
My concern is whether the resulting Bayes Factor would be valid or if the calculation of the posterior distribution would be different between the Pearson and Spearman approaches, despite the ranking of variables having been performed prior to obtaining the residuals. I have tried to look into this and found the following paper by the JASP group describing a Bayesian framework for Spearman correlations but unfortunately, I cannot follow the math: https://www.tandfonline.com/doi/full/10.1080/02664763.2019.1709053
The paper also has code to calculate the Bayesian Spearman correlation but unfortunately not partial correlations.
Might anyone have any suggestions or insights about this topic?
Many thanks!
Michael
Comments
I believe that multiple regression is all about assessing relationships between variables while having "partialled out" some other variable or set of variables. So I think you'll be fine if you're willing to shift your question to something like,
"What is the standardized slope (,i.e., the beta coefficient) for the XY relationship controlling for C?"
Then you could run a Bayesian hierarchical regression by adding Variable C to the null model (under the "Model" option).
R
Actually, I don't have an answer. I meant to write:
> I believe that multiple regression is all about
> assessing relationships between variables
> while having "partialled out" some other variable
> or set of variables. So I think you'd probably be fine
> if you were willing to shift your question to something like,
> "What is the strength of the XY relationship, controlling for C?"
> Then you could run a two-predictor (X and C) Bayesian
> hierarchical regression by adding Predictor C to the null
> model (under the "Model" option). The value indicating
> "the strength of the XY relationship, controlling for C"
> would be R-squared for the non-null model.
However, that would not address the requirement that it be a
non-parametric analysis.
R
Thanks, Anderson. Agreed, the challenge here is wanting to use a non-parametric approach. I'm hopeful someone might have additional insights into this!
Has there been any developments on this? JASP has partial Kendall tau for frequentist, but not bayesian. If the devs could even suggest an alternative for this (in R) I would appreciate it.
(I'm currently stuck with a paper that had bayesian for most analyses except a few partial ones... it makes things awkward)
I think if you're going to use Bayesian statistical analysis, you need to inform your audience, at the start, that such analyses are limited in that there's often no available Bayesian option for non-parametric testing.
R
It seems reasonable to conduct a Bayesian analyses on the residuals. It is not fully Bayesian, so some uncertainty will go unaccounted for, but that strikes me as a second-order effect.
EJ
Hi @EJ , That is what I ended up doing. But can you explain why you say it is not fully Bayesian?
Here is the code I'm using in {brms} is anyone needs it. It gives you the same result at the frequentist Spearman in JASP, if you use flat priors:
### Bayesian "Spearman" Partial Correlation Script ###
# Load and set env
library(brms)
library(bayestestR)
options(mc.cores = parallel::detectCores(logical = FALSE))
set.seed(42)
# Generate some data
n <- 1000
z <- rnorm(n)
x <- 0.7 * z + rnorm(n)
y <- 0.3 * z + 0.5 * x + rnorm(n)
# Function to compute Bayesian Spearman correlation
bayesian_spearman_mv <- function(x, y, z) {
# Convert to ranks
dat <- data.frame(
x_rank = rank(x),
y_rank = rank(y),
z_rank = rank(z)
)
# Use the mvbind() trick
mv_formula <- bf(mvbind(x_rank, y_rank) ~ z_rank) + set_rescor(TRUE)
# Model
model <- brm(
mv_formula,
data = dat,
backend = "cmdstanr",
silent = 2
)
return(model)
}
# Run function
mv_model <- bayesian_spearman_mv(x, y, z)
### Correlation estimate and CIs w/ describe_posterior()
post <- as_draws_df(mv_model)
rescor <- post$rescor__xrank__yrank
posterior_df <- data.frame(rescor = rescor)
describe_posterior(rescor, centrality = "mean", ci = 0.95)
Nice! The reason why I said that it is not fully Bayesian is because in a two step analysis, the uncertainty in the first step (the residuals) is unaccounted for.
I think there are differing opinions on whether the residuals approach is or is not fully Bayesian. If the Bayesian hypotheses pertain to the raw, 'un-residualed' data, then yes, some uncertainty is unaccounted for. But if the formulated, Bayesian hypothesis is, for example, "The correlation between the residuals is zero," then I don't think the 'un-residualed' data are a subject of the hypothesis. Thus, uncertainty regarding the actual hypothesis is fully accounted for.
As an analogy, imagine that you have 100 reaction times from each of 50 participants, and your Bayesian null hypothesis is that the mean of those 5,000 values is zero. If you try to test this hypothesis by first calculating a median for each participant and then conducting a Bayesian t test on those medians, then there's some uncertainty (regarding the mean of the 5,000 values) that's uncounted for. However, if your Bayesian null hypothesis is that the mean of the 50 median reaction times is zero, then you're fine: You have a fully Bayesian analysis--as long as you limit your conclusions so that they refer to a population of medians rather than to a population of raw, unaggregated values.
R