Computational running time in JASP: Bayesian ANCOVA

Dear all,

My aim is to run a Bayesian linear regression in order to quantify the evidence in favour of the null hypothesis. That is, I want to establish whether the null finding observed in the frequentist approach to the linear regression (i.e. I observe a null finding using classical p-value based inference) reflects a true null effect, or an effect that is too small to be detected.

Method: Given that the regression's predictors include categorical variables, I seem unable to use the JASP Bayesian linear regression, so I am forced to use the JASP Bayesian ANCOVA. My dependent variable is a continuous variable, the covariate of interest is a categorical (binary) variable which I've set in the 'fixed group effects' section, and then I have an additional 14 covariates which I add to the 'Covariates' section. The n = 1526. I have set the Bayes factor to BF01 and am comparing to the null model.

Problem: the analysis has been running for three days and still no result. Is this normal?

Thank you,

Francesca

• Three days!! That is not normal. Then again, 14 covariates is a lot. I'll pass this on to the team.

E.J.

• Hi Francesca,

Unfortunately this behavior is not a bug, but simply a consequence of having a model with 14 covariates. With this many variables, there are 2^15 = 32768 possible models to consider.

When using the BayesFactor package in R to enumerate all possible models (so not even to fit them), this already takes 20 minutes for me in the case of 10 covariates, and this time grows exponentially for each added covariate.

We should really look into adding a disclaimer that pops up when trying to use a lot of covariates (I would also not have expected this issue to arise for "only" 14 covariates). I am not sure what the best way forward is in your case, since I have no idea how long this could still take (it could be just a day, or 2 months...). Alternatively, you could look into running subsets of your possible models, by running the analysis with 7 or 8 covariates at a time (as well as assessing collinearity between all 14 covariates and the DV in the Correlation module). Please let me know if you have any further questions or need some more help with this!
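To give a feel for how quickly the model space blows up, here is a plain base-R sketch (no packages, just arithmetic):

```r
# Each of k candidate predictors can be either in or out of a model,
# so an analysis that enumerates all models has to consider 2^k of them.
k <- 10:15
data.frame(predictors = k, models = 2^k)
# Every extra predictor doubles the number of models:
# 15 predictors already means 2^15 = 32768 models.
```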

Kind regards

Johnny

• edited November 2020

Hi Francesca,

By default JASP enumerates all possible models. In your case with 15 predictors there are 2^15 = 32768 models. That simply takes a very long time. We will look at ways we can improve the speed and feedback (e.g., in your scenario we should probably give an indication of how long an analysis might take). If all your categorical variables are binary you could consider dummy coding these (0, 1) and use linear regression instead. Alternatively, we can show you how to run the analysis in R which should be faster depending on how many levels your categorical covariates have.
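For the dummy-coding route, a minimal base-R sketch (the data frame and column names here are made up for illustration):

```r
# Hypothetical data: a binary factor that would normally be treated as categorical
df <- data.frame(sex = factor(c("male", "female", "female", "male")))

# Recode it as a 0/1 numeric column so it can be entered as a
# continuous covariate in a (Bayesian) linear regression
df$sex01 <- as.numeric(df$sex == "female")  # 1 = female, 0 = male
df$sex01
```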

I hope this helps. If you have any questions, please let me know!

Don

• Dear EJ, Johnny and vandenman,

Thank you all very much for your replies. I left JASP running for 10 days or so and it eventually crashed (black boxes appeared on the results page). To clarify, my machine is not the fastest out there, but it can do an okay job with moderately heavy analyses (e.g. neuroimaging via MATLAB), so I am not sure what's going on.

@JohnnyB - yes, I think it would be very useful to get an estimate of the running time, and/or some display of the iterations it's going through.

@vandenman unfortunately, one of the categorical variables is not binary, but I'll try your suggestion and run a Bayesian linear regression excluding this variable. Two questions: 1) Isn't a dummy variable categorical? Do you mean I should relabel the dummy variables as continuous variables? 2) It would be extremely helpful to get some R code to attempt running the Bayesian linear regression there. How do you think is best to proceed?

Thanks very much for your time and support,

Best,

Francesca

• Hello again,

@vandenman - I have tried to run a linear regression with the binary variables coded as dummy variables (0/1) but these are still perceived as categorical by JASP (which indeed they are) and hence JASP still doesn't allow me to add them as covariates. Have I perhaps misunderstood your suggestion re: attempting the Bayes linear regression using dummy variables?

Thank you,

Francesca

• ^ I have tried to run a linear regression = I have tried to run a BAYES linear regression

• An update: I have used the BayesFactor package in R to run a Bayesian linear regression using the regressionBF function. It only takes 10 minutes to run with all 15 of my covariates. However, there are 2 issues:

1) the output from R does not match the output from JASP

I ran a Bayesian ANCOVA in JASP with only 3 of my covariates, and a Bayesian linear regression in R with the same 3 covariates. The results are a bit different for the models with more than one term, e.g. "sex + Age14" gives 0.004 in JASP but 0.00634 in R (see below).

JASP output:

# Models                          P(M)   P(M|data)  BF M   BF 10  error %
# Null model                      0.125  0.555      8.719  1
# p1fsses_reversed                0.125  0.317      3.242  0.571  0.003
# Age14                           0.125  0.038      0.274  0.068  0.002
# Age14 + p1fsses_reversed        0.125  0.035      0.256  0.064  0.012
# sex                             0.125  0.033      0.241  0.06   0.13
# sex + p1fsses_reversed          0.125  0.018      0.13   0.033  1.63
# sex + Age14                     0.125  0.002      0.015  0.004  2.279
# sex + Age14 + p1fsses_reversed  0.125  0.002      0.014  0.004  1.405

R output:

# Bayes factor analysis
# --------------
# [1] sex                            : 0.05993233  ±0%
# [2] Age14                          : 0.06800264  ±0%
# [3] p1fsses_reversed               : 0.5713386   ±0%
# [4] sex + Age14                    : 0.006342882 ±0.01%
# [5] sex + p1fsses_reversed         : 0.0531472   ±0.01%
# [6] Age14 + p1fsses_reversed       : 0.06387132  ±0.01%
# [7] sex + Age14 + p1fsses_reversed : 0.007490356 ±0.01%

2) My interest is to understand the Bayes factor for the variable 'sex' whilst taking into account all the other covariates. Using a frequentist approach, I find that sex is not a significant predictor of my DV. So, my Bayesian question is: what is the probability that the observed null finding is true? As mentioned above, to answer this question I am trying to run a Bayesian ANCOVA in JASP or a Bayesian linear regression in R.

To answer my question, I should be looking at JASP's Effects table, at the BF_incl term for 'sex'. Correct? In other words, the output table printed above cannot answer my question, as the first line 'sex 0.05993233' gives me the BF for 'sex' alone, not whilst considering it with all the rest of the variables? Is this correct? In which case, I am stuck: JASP provides the Effects table but cannot run my 15-covariate analysis, and R can run the 15-covariate analysis but doesn't produce an Effects table. Would you be able to help? Thank you!!

Francesca

• Hi Francesca,

1) Due to the algorithms used to compute these Bayes factors (there is some random sampling involved), there will be some slight fluctuations between each repeated result. The error % column in JASP gives an indication of how heavy these fluctuations are.

2) Here it gets tricky: the Effects table lists each independent variable, where the inclusion Bayes factor weighs all models with, versus all models without, that factor. So to make this table, you need to estimate all possible models with your predictors. Estimating a single model (e.g., your 15-covariate model) is fast, but apparently BayesFactor has some issues when the number of models blows up.

If you are only out to test this effect for "sex", you could do a manual comparison, where you estimate the full model and compare that to the full model without "sex". As a side note, I also think that when you use regressionBF, you are not using your factor variable as a factor, but are converting it to a numeric (otherwise regressionBF would have given you an error; lmBF would have been the appropriate function). The lmBF function also gives you a Bayes factor against the intercept-only model. So you can take the quotient of the two BFs you obtain this way (full model, and full model without "sex"): this will be your Bayes factor for/against "sex" being a meaningful predictor on top of all other variables in the model.
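A sketch of that manual comparison with lmBF (assuming your data live in a data frame `df` with dependent variable `y`; the two covariates here stand in for your full set of 14):

```r
library(BayesFactor)

# Each lmBF call returns a Bayes factor against the intercept-only model
bf_full    <- lmBF(y ~ sex + Age14 + p1fsses_reversed, data = df)
bf_reduced <- lmBF(y ~ Age14 + p1fsses_reversed,       data = df)

# Dividing the two BayesFactor objects gives the BF of the full model
# against the reduced one, i.e. the evidence for including "sex"
bf_full / bf_reduced
```

Taking the reciprocal (bf_reduced / bf_full) gives the evidence against including "sex".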

I hope this clarifies things, if not please let me know =)

Johnny

• Hi Johnny,

I used lmBF twice:

Model 1: sex + 14 other covariates = BF = 9.445001e+53

Model 2: 14 other covariates = BF = 5.411868e+54

The quotient (model 1 / model 2) = 0.174523

The inverse quotient (model 2 / model 1) = 5.7298

Interpretation: the analysis above suggests that the data are about 5.73 times more likely under the model with all the other predictors together (not the null model, right?) than under the model that also includes 'sex', i.e. no 'sex' differences (a null finding).

Questions:

1) Does this interpretation make sense to you too?

2) And what if I wanted to compare to the 'null model' rather than to the 'other predictors'? I'm not sure if I'm confusing the theory here.

Thank you,

Francesca

• Hi Francesca,

That's great to hear!

This is indeed where things are not very clear, because there are basically two approaches you can use here:

1) Compare the full model to the full model without sex (as we discussed above). This gives an indication of how well those two models perform, and since the only difference between them is "sex", if the latter model outperforms the former, you can see this as evidence that adding "sex" is not valuable.

2) Instead of only looking at that single comparison, you can look at all versions of the model that do not contain "sex" (for instance, a model that contains 8 predictors, none of which are "sex"), and compare each to that same model with "sex" added (for instance, that model with 8 predictors + "sex"). This also gives an indication of how much (or little) added predictive value "sex" has, but now in settings where fewer predictors are included. With all those model comparisons in hand, you could then average over all the comparisons; this would typically be your inclusion BF (BF_incl).

Because we cannot do 2), 1) is a good alternative, and it is the comparison that (in my view) paints the most complete picture, because it simply considers all variables. The full model without "sex" would then be the null model (although that is just semantics). The null model is simply the model that has all the components you are not interested in for that particular analysis (i.e., variables you want to control for), and the alternative model is the model that has those components plus the components you are interested in. If you were to take the "hardcore null model", which only has an intercept, and compare that to the full model, you could hardly say anything about "sex" specifically.

The interpretation of the Bayes factor you report is: "The data are 5.73 times more likely under the model that has <all predictors except sex> than under the model that has <all predictors>." This suggests that "sex" is not a meaningful predictor of <dependent variable> (which you could also interpret as no sex differences after controlling for the other variables).

Kind regards,

Johnny

• Dear Johnny,

Thanks very much for the prompt reply. It's all very clear now.

Thanks again!

Best,

Francesca