File size in JASP – What determines the file size of analyses?
Dear JASP community,
I would like to ask what determines the size of JASP files once analyses have been run and saved. I have managed to reach a file size of 1.15 GB with a relatively small dataset (10 rows, 22 variables).
The analyses contain:
2 Bayesian linear regressions (sampling method BAS default), 2 frequentist linear regressions, 2 PCAs, 2 EFAs, 2 Bayesian correlations, 1 Bayesian paired-samples t-test (Student), 2 descriptive statistics, and 1 Bayesian single-test reliability (MCMC sampling, 500 burn-in, 50,000 iterations, thinning 1, 3 chains, R-hat, and traceplots). The last method listed, MCMC sampling, always produces files of several GB under these settings. The traceplot in particular seems to take up a lot of space when I try to display it in other software.
Does anyone know why MCMC samples with 500 burn-in and 500,000 iterations produce a data file of 1 GB or more in JASP, whereas they are easy to handle and save in R, where they take up just a few KB? It is only an assumption that this procedure is the cause, but it is consistent with my other GB-sized files, which contain the same procedure. Loading and opening the file in JASP also takes significantly longer than for all my other files, which are in the range of a few MB or even KB. I would be very happy about an answer!
Best regards,
GoldenRetriever
Comments
Hello GoldenRetriever,
This is more of a question for the JASP GitHub page, I think. However, I will forward it to the team. My guess is that JASP saves all of the MCMC samples -- these are needed to plot the figures. It does seem wasteful to retain this information, so perhaps we could ask the user whether he/she wants to save this information.
Cheers,
E.J.
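A rough back-of-the-envelope check of the guess above, using the settings from the original post (50,000 iterations, 3 chains). How many values JASP keeps per draw and how it serializes them is not stated in this thread, so `VALUES_PER_DRAW` and the text-based encoding below are purely hypothetical assumptions for illustration, not a description of JASP's file format.

```python
import json
import random


def raw_sample_bytes(iterations, chains, values_per_draw, bytes_per_value=8):
    """Size of the draws if stored as packed 64-bit floats."""
    return iterations * chains * values_per_draw * bytes_per_value


def text_sample_bytes(iterations, chains, values_per_draw, seed=1):
    """Approximate size of the same draws if written out as JSON text.

    The per-value cost is measured on a small batch of simulated draws
    and scaled up, so the result is an estimate, not an exact figure.
    """
    rng = random.Random(seed)
    batch = [rng.gauss(0, 1) for _ in range(1000)]
    bytes_per_value = len(json.dumps(batch)) / len(batch)
    return int(bytes_per_value * iterations * chains * values_per_draw)


ITERATIONS = 50_000      # from the original post
CHAINS = 3               # from the original post
VALUES_PER_DRAW = 100    # hypothetical: depends on the model and on what is stored

raw = raw_sample_bytes(ITERATIONS, CHAINS, VALUES_PER_DRAW)
text = text_sample_bytes(ITERATIONS, CHAINS, VALUES_PER_DRAW)
print(f"packed floats: {raw / 1e6:.0f} MB, JSON text: about {text / 1e6:.0f} MB")
```

The point of the sketch is only that the stored size scales with iterations × chains × values per draw, and that a text encoding costs noticeably more per value than packed binary, so keeping every draw can plausibly reach hundreds of MB even for a 10-row dataset.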
Hey GoldenRetriever,
This sounds like something we should look at!
It would be very helpful if you could open an issue here: https://github.com/jasp-stats/jasp-issues/issues
If you could upload the jasp-file somewhere and link it to me (either via a PM here or better yet: through an issue you make at the above link) then I can check out what went wrong (because it should really not be that big!)
Ah it has been brought to my attention that in fact this issue has been known to us for a while at: https://github.com/jasp-stats/jasp-issues/issues/999
This will be fixed in 0.15. I've made sure a message is shown in the 0.14 versions of JASP to notify people where it matters (in the analysis itself).
Ah well, we are definitely aware of the problem and are working on it. Apparently 0.15 could be optimistic, but if it is, we will solve it in 0.15.1.
What would help us out quite a lot is if you could send us this file.
I am glad that the issue is being worked on and will hopefully be solved soon. These are research data and cannot be shared due to other circumstances. The issue is well described and should be easily reproducible with any other dataset, even with randomly generated data. I do not see a need to request the specific dataset because it is a general issue and not specific to a dataset.
I agree with E.J.'s comment above. When exporting or copying the plot file, most text editors struggle to load and display the plot. It seems that each part of the plot is generated separately, which is why the image file also has a size that is almost impossible to work with. I disagree that this is the same issue as the one cited. My dataset has only 10 rows, and even with 10 rows JASP struggles with MCMC modeling as described. The description of issue 999 differs from what I describe here. This is clearly an issue that involves MCMC samples with 500 burn-in and 500,000 iterations, independent of the number of rows in the file. Maybe other MCMC procedures in JASP suffer from it as well.
Well this more detailed description of what you did should also suffice, thank you for getting back to me.
I'll add this info to our issue so that this can be tested and we can check whether or not we have taken the unnecessary data out.
Dear @GoldenRetriever,
I tried to recreate the issue you're describing, and you are right, the .jasp file does get way too big. We have managed to alter the code so that the file size is reduced quite a bit; however, with 50,000 posterior samples the file is still around 200 MB (for the case you described). We are also working on adding a button that would allow a user to choose not to store some of the posterior samples in the file. This will, however, lead to a reduced computation speed. These changes will then be available in 0.15.
Regarding the issue you were describing about the plots: I have tried to recreate what you describe, but have not managed to find the problem. Can you go into a bit more detail? Are you referring to the traceplots? Do you obtain that plot (outside of JASP) by exporting the plot directly into a file, or do you obtain the plot by unzipping the .jasp file and accessing the plot? What kind of file is that plot then? For example, is it a .png file for the omega traceplot?
Thanks for your help,
Julius
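For anyone who wants to check which part of a large file is responsible (samples versus plots), here is a minimal sketch. It assumes, as the unzipping remark above implies, that a .jasp file can be opened as a zip archive. The function name `largest_members` is made up for this example, and nothing is assumed about what the entries inside the archive are called.

```python
import sys
import zipfile


def largest_members(archive_path, top=10):
    """Return (name, uncompressed_bytes, compressed_bytes) for the largest entries."""
    with zipfile.ZipFile(archive_path) as archive:
        entries = [
            (info.filename, info.file_size, info.compress_size)
            for info in archive.infolist()
        ]
    # Sort by uncompressed size, largest first.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)[:top]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python inspect_jasp.py path/to/file.jasp
    for name, size, packed in largest_members(sys.argv[1]):
        print(f"{size / 1e6:10.1f} MB  {packed / 1e6:10.1f} MB  {name}")
```

The first column is the uncompressed size and the second the size inside the archive, so a large gap between the two points to data that compresses well but is still slow to load.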
Quick update: I have found the issue with the size of the plots, so there is no need for you to explain further. However, it seems as though the size of the plots is something we cannot reduce further. Having implemented the button to avoid saving any samples, I managed to reduce the size of the .jasp file from 1.5 GB to 40 MB. This might be the best we can do at the moment.