
# Multicollinearity in Bayesian linear regression?

Hello Team JASP!

I want to do a linear regression analysis along the lines of: age, literacy, and years of education predict cognitive functioning. Naturally, literacy and years of education correlate highly with one another. Does that mean I shouldn't use them together as predictors in one regression?

If I do use them together as predictors, the Bayesian inclusion probability plot suggests I keep only years of education for my first outcome variable. So this sounds to me as if years of education and literacy are independent enough in their prediction for their effects to be separated from one another?!

Should I now add literacy to the null model, or just remove it since it shouldn't be included? And what should I do about my multicollinearity?

My approach so far has been to run a frequentist model that parallels the Bayesian one to check the assumptions...

• Hi eniseg2,

The problem of multicollinearity is a hard one. By looking at the individual models you can assess whether the high-probability models include one predictor or the other, but not both. Of course the investigation starts with considering the scatterplot and the strength of the relation between the predictors. If the predictors are highly collinear, and both are important, then the inclusion probabilities should remain near 0.5 (because in the models that matter, only one of the two collinear predictors is included).
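To make this concrete, here is a small simulation with entirely hypothetical data (the variable names and coefficients are made up, and exp(−BIC/2) is used as a rough stand-in for the marginal likelihoods that a full Bayesian model comparison would compute): when two nearly collinear predictors both drive the outcome, the model weights split between the two single-predictor models, so neither predictor's inclusion probability gets close to 1.

```python
import itertools
import numpy as np

# Hypothetical data: two nearly collinear predictors that jointly drive y.
rng = np.random.default_rng(1)
n = 300
edu = rng.normal(0, 1, n)
lit = 0.99 * edu + 0.14 * rng.normal(0, 1, n)   # corr(edu, lit) ~ .99
y = 0.5 * edu + 0.5 * lit + rng.normal(0, 1, n)

preds = {"education": edu, "literacy": lit}
names = list(preds)

def bic(y, cols):
    """BIC of an OLS model with an intercept plus the given columns."""
    A = np.column_stack([np.ones(n)] + cols)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = ((y - A @ beta) ** 2).sum()
    return n * np.log(rss / n) + A.shape[1] * np.log(n)

# Enumerate all 2^p models; exp(-BIC/2) approximates each model's
# marginal likelihood, so the normalized weights mimic posterior
# model probabilities.
subsets = [s for r in range(len(names) + 1)
           for s in itertools.combinations(names, r)]
bics = {s: bic(y, [preds[p] for p in s]) for s in subsets}
b0 = min(bics.values())
w = {s: np.exp(-0.5 * (b - b0)) for s, b in bics.items()}
total = sum(w.values())

for p in names:
    incl = sum(wt for s, wt in w.items() if p in s) / total
    print(f"P(include {p}) ~ {incl:.2f}")
```

The key pattern: the null model gets essentially no weight (something clearly predicts y), yet each predictor's inclusion probability stays far from 1, because the two single-predictor models soak up most of the mass between them.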

If you want to take the royal road to address this issue, you could think of using a network approach or a SEM model. But that is a lot of extra work with models that are a lot more complicated.

Cheers,

E.J.

• Hi there! When dealing with highly correlated predictors like literacy and years of education, it's important to consider the issue of multicollinearity in your linear regression analysis. Including both predictors in one regression may lead to inflated standard errors and difficulties in interpreting the individual effects of each predictor.

Given that your Bayesian inclusion probability plot suggests keeping only years of education, education appears to be the stronger predictor of your outcome variable compared to literacy. In that case, you can remove literacy from the regression model to avoid multicollinearity and focus on the independent effect of years of education.

Alternatively, if you have a strong theoretical basis or previous research indicating the importance of literacy, you can consider adding it to the null model as a separate analysis to explore its individual contribution to the prediction of cognitive functions.

Remember to assess the variance inflation factor (VIF) to quantify the degree of multicollinearity between predictors. If the VIF values are high (typically above 5 or 10), it suggests substantial multicollinearity, and you may need to address it by selecting a single predictor or using alternative techniques such as principal component analysis.
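The VIF can be computed by hand: for each predictor, regress it on the remaining predictors and take VIF = 1 / (1 − R²). A minimal sketch with hypothetical data mirroring the example (the sample size, means, and the 0.9 education–literacy link are assumptions for illustration):

```python
import numpy as np

# Hypothetical data: literacy tracks years of education closely.
rng = np.random.default_rng(0)
n = 500
age = rng.normal(50, 10, n)
education = rng.normal(12, 3, n)
literacy = 0.9 * education + rng.normal(0, 1, n)

X = np.column_stack([age, education, literacy])
names = ["age", "education", "literacy"]

def vif(X, j):
    """VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
    column j on all the other columns (plus an intercept)."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)

for j, name in enumerate(names):
    print(f"VIF({name}) = {vif(X, j):.2f}")
```

With data like this, age (uncorrelated with the others) comes out near 1, while education and literacy both land well above the usual 5 cutoff, flagging the collinearity between them.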

Best of luck with your analysis!
