VIF in GLM
For GLM, which can handle non-linearity, JASP reports VIF, which by my understanding assumes linearity (if this is incorrect, please set me right).
Would GVIF, which doesn't assume linearity, not be more appropriate to report?
So could the VIF values presently reported by GLM be spurious when there is non-linearity, which is the forte of GLM?
Comments
Hi @Michael_Jasper ,
In GLM we provide the GVIF (based on the vif function from the car package). I will update the help files/interface to be more informative about this.
Cheers
Johnny
So you mean that the VIF value that GLM reports, which it calls VIF, is actually GVIF?
Please don't miss my question in the post above.
To add to that, is GVIF used throughout JASP, so that even linear regression is reporting GVIF (but calling it VIF)?
Or is GVIF only used for GLM because of the non-linearity that might be present with its typical use? And so not used with linear regression?
Michael, I think there might be some confusion as to what problem GVIF solves. GVIF is designed to accommodate categorical **predictors**, whereas VIF was originally designed to accommodate numerical predictors only. As far as I know, this has nothing to do with the difference between a GLM with an identity link (i.e. linear regression) and one with another link (e.g. logit for logistic regression). (G)VIF is concerned with the predictors, not the outcome.
A GLM is linear in its parameters regardless of the distribution of the outcome. I think this is what is causing the confusion, as you talk a lot about nonlinearity. The predictor side of a GLM is the same regardless of the outcome distribution; what changes is the link function that connects this linear predictor to the outcome, and that is NOT the focus of (G)VIF.
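To make the VIF/GVIF distinction concrete, here is a minimal sketch of the determinant form of GVIF (Fox and Monette), which is, as far as I know, the formula behind car's vif. Everything in it is an assumption for illustration: the data are simulated, the names (`corr_matrix`, `det`, `gvif`, `x`, `z`, `g1`, `g2`) are made up, and it works from the correlation matrix of the predictors rather than from a fitted model, so it is not what JASP runs internally.

```python
import math
import random

def corr_matrix(cols):
    """Pearson correlation matrix for a list of equal-length columns."""
    n = len(cols[0])
    centered = [[v - sum(c) / n for v in c] for c in cols]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(cols)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]

def det(m):
    """Determinant by Gaussian elimination with partial pivoting.
    The determinant of an empty matrix is taken to be 1."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(a[r][i]))
        if abs(a[p][i]) < 1e-15:
            return 0.0
        if p != i:
            a[i], a[p] = a[p], a[i]
            d = -d
        d *= a[i][i]
        for r in range(i + 1, n):
            f = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= f * a[i][c]
    return d

def submatrix(m, idx):
    return [[m[i][j] for j in idx] for i in idx]

def gvif(R, term):
    """GVIF for the columns in `term`: det(R_term) * det(R_rest) / det(R)."""
    rest = [i for i in range(len(R)) if i not in term]
    return det(submatrix(R, term)) * det(submatrix(R, rest)) / det(R)

# Simulated predictors (illustrative only): two numeric, one 3-level factor.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
z = [0.8 * v + 0.6 * rng.gauss(0, 1) for v in x]
score = [v + rng.gauss(0, 1) for v in x]
level = [0 if s < -0.5 else 1 if s < 0.5 else 2 for s in score]
g1 = [1.0 if lv == 1 else 0.0 for lv in level]  # dummy column for level 1
g2 = [1.0 if lv == 2 else 0.0 for lv in level]  # dummy column for level 2

# With two numeric predictors, GVIF reduces to the classic VIF = 1 / (1 - r^2).
R2 = corr_matrix([x, z])
r = R2[0][1]
gvif_x_two = gvif(R2, [0])
vif_x_two = 1.0 / (1.0 - r * r)

# With the factor included, its two dummy columns are treated as one term (df = 2).
R = corr_matrix([x, z, g1, g2])
gvif_x = gvif(R, [0])
gvif_group = gvif(R, [2, 3])
gvif_group_adj = gvif_group ** (1.0 / (2 * 2))  # GVIF^(1/(2*df)), comparable across terms

print(gvif_x_two, vif_x_two, gvif_x, gvif_group, gvif_group_adj)
```

For a single numeric predictor the two quantities coincide, so GVIF only adds something when a term spans several columns, such as a dummy-coded factor. Note also that, to my understanding, car's vif builds the correlation matrix from the coefficient covariance of the fitted model rather than from the raw predictors; the two routes agree for ordinary linear regression but need not for other GLMs.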
@patc3 @JohnnyB @EJ I'm interested in collinearity between variables x and z. I did a multiple linear regression with them both (with a variable y as the dependent variable) in JASP to simply pull out a VIF value as a measure of this collinearity between them. BUT the problem is that x and z aren't linearly related to one another. And so the VIF measure of their collinearity might be inaccurate, as I explain below in my provisional write up:
"VIF calculation assumes linear relationships among variables, which is violated here. Therefore, their VIF values are shown with the caveat that they might be under- or overestimates, possibly so inaccurate as to be very misleading. Inaccuracy likely scales with the degree of non-linearity."
So, my interest in GVIF is that - by my understanding - it does NOT assume linearity and so can confer a measure of collinearity even for variables that aren't linearly related. Is this understanding correct?
If so, if GLM in JASP reports GVIF (instead of VIF) I can then do a spoof GLM with these variables to pull out the GVIF to use (my being interested in the GVIF values rather than the GLM itself). Or does multiple linear regression actually report GVIF (instead of VIF) itself?
If anything isn't clear, very happy to clarify on request.
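The concern about non-linearly related predictors can be illustrated with a small sketch. With only two predictors, the VIF is 1 / (1 - r²), where r is their Pearson correlation, so it only picks up the linear part of the relationship. The data and names below (`pearson_r`, `vif_two_predictors`, `z_quadratic`, `z_mostly_linear`) are made up for illustration and have nothing to do with the actual dataset discussed here.

```python
import math

def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

def vif_two_predictors(a, b):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r * r)

# x is symmetric around zero.
x = [i / 10 for i in range(-50, 51)]

# z is completely determined by x, but the relationship is purely non-linear.
z_quadratic = [v * v for v in x]

# z is mostly a linear function of x, with a small deterministic wiggle.
z_mostly_linear = [2 * v + 0.5 * math.sin(7 * v) for v in x]

vif_quadratic = vif_two_predictors(x, z_quadratic)
vif_mostly_linear = vif_two_predictors(x, z_mostly_linear)

print(vif_quadratic, vif_mostly_linear)
```

In the quadratic case the VIF stays at essentially 1 even though z is an exact function of x, while the mostly linear case gives a large VIF. Since GVIF coincides with VIF for a single numeric predictor, it would presumably show the same behaviour here.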
Sorry, I'm not sure what the answer to your question is, so I'll let others chime in.
Had a play around:
With the same data in JASP, the VIF reported by GLM is a different value from the VIF reported by multiple linear regression.
Very different values actually.
Is this because multiple linear regression reports VIF and GLM reports GVIF?
And/or is the GLM reporting upon the variables after they have been transformed in some way to create the GLM? This would prohibit me from just using it as a GVIF measure with my variables (which I'm interested in and not the GLM itself per se).