# Multicollinearity in Bayesian linear regression?

Hello Team JASP!

I want to do a linear regression analysis along the lines of: age, literacy, and years of education predicting cognitive function. Naturally, literacy and years of education correlate highly with one another, as is to be expected. Does this mean I shouldn't use them together as predictors in one regression?

If I do use them together as predictors, the Bayesian inclusion probability plot suggests keeping only years of education for my first outcome variable. Does this mean that education years and literacy are independent enough in their prediction for their effects to be separated from one another?

Should I now add literacy to the null model, or just remove it since it shouldn't be included? And what should I do about the multicollinearity?

## Comments

And what about autocorrelation?

My approach so far has been to run a frequentist model that parallels the Bayesian one and to check the assumptions there...
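One quick way to quantify the collinearity outside JASP is a variance inflation factor (VIF) check. The sketch below is illustrative, not JASP's machinery; the variable names (`age`, `edu`, `lit`) and the simulated data are assumptions for the example:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (predictors only).

    VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
    column j on the remaining columns (plus an intercept).
    """
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(y)), others])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Simulated example: literacy is strongly collinear with education years
rng = np.random.default_rng(0)
n = 100
age = rng.normal(size=n)
edu = rng.normal(size=n)
lit = edu + 0.2 * rng.normal(size=n)

print(vif(np.column_stack([age, edu, lit])))
```

A common rule of thumb is that VIF values above roughly 5-10 flag problematic collinearity; here `edu` and `lit` should show large VIFs while `age` stays near 1.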

Hi eniseg2,

The problem of multicollinearity is a hard one. Of course the investigation starts with considering the scatterplot and the strength of the relation. Then, by looking at the table of individual models, you can assess whether the high-probability models include one predictor or the other, but not both. If the predictors are highly collinear, and both are important, then the inclusion probabilities should remain near 0.5 (because in the models that matter, only one of the two collinear predictors is included).

If you want to walk the royal road to address this issue, you could think of using a network approach, or an SEM model. But that is a lot of extra work with models that are a lot more complicated.

Cheers,

E.J.