jatos 3.7.4 with nginx 502 Bad Gateway
We've got a new installation of jatos 3.7.4 on Ubuntu 20.04 with nginx and we keep getting intermittent but recurring 502 Bad Gateway errors.
I can usually get to the Bad Gateway if I log in as admin, go to the users tab, then back to the admin tab, and go back and forth a few times, although the error occurs at other times and for other users as well.
We were getting them a lot, but when I increased some timeouts in both nginx and jatos, and added some memory to the virtual machine, they stopped occurring so regularly but still occur.
We have an older jatos installation - 3.3.3 on ubuntu 16 - which wasn't exhibiting this behaviour.
We're kind of at our wits' end here, at the minute. Any ideas or pointers will be gratefully accepted.
Jeff Berry, MRC Cognition and Brain Sciences Unit
Comments
Hi Jeff,
This '502 Bad Gateway' is nginx telling you that it didn't get a response from JATOS within the timeout configured in nginx.conf. I guess you have a lot of users? Then it can take some time and server resources to get the data for the 'User Manager' page. And if you go back and forth, it gets requested multiple times and your server might get overloaded. This is even worse for the Administration/Studies page. But it should only be short-term. The 3.3.3 version did not have the Administration page and thus had no chance to load these resource-hungry pages.
Does this 502 error occur on other occasions too? If it occurs somewhere else I would be more concerned, but this is for administration and does not affect running studies. And, to be on the safe side, can you check the JATOS logs for errors or exceptions?
But if you want to do something you can (like you did already) increase nginx timeout and give the server more memory. And monitor the resources on the server and find out what the bottleneck is: maybe it's more of a CPU or disk space problem. I assume nginx is on the same machine as JATOS and therefore network should not be the culprit here.
Best,
Kristian
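For the log check suggested above, a minimal shell sketch follows. It only counts lines that mention an error or exception; the function name count_log_errors and the log path in the usage line are assumptions for illustration, not part of JATOS.

```shell
#!/usr/bin/env bash
# Count the lines in a log file that mention "error" or "exception"
# (case-insensitive). The function name is hypothetical.
count_log_errors() {
    local logfile="$1"
    # grep -c prints the number of matching lines. It exits non-zero when
    # nothing matches, so "|| true" keeps the function from reporting failure.
    grep -ciE 'error|exception' "$logfile" || true
}
```

It could be called as count_log_errors /opt/jatos/current/logs/application.log, assuming the logs live under the installation directory as the loader.sh excerpt in this thread suggests.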
Hi Kristian,
At the minute we have one user and no active studies, and the timeouts are set to what I think of as ridiculously high values - 75s on both jatos and nginx. We are running nginx and JATOS on the same machine, and the mysql, too, for that matter.
The machine is still in test, with one user (and the local admin account), a couple of test studies, and that's it. There shouldn't be any issue with CPU or disk, which is what makes this so baffling.
The only error in the application.log is this:
[ERROR] - c.g.Updates - Couldn't request latest JATOS update info.
although it does complain about certificates, since I'm also trying to get ldap set up.
In the loader.log, there's
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.google.inject.internal.cglib.core.$ReflectUtils$1 (file:/opt/jatos/jatos-3.6.1/lib/com.google.inject.guice-4.2.2.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
Thanks for the thoughts,
Jeff
Follow-up -
it looks like the problem was with the loader.sh script's interaction with systemd - my colleague realised that the start function didn't background the jatos process, so it never exited, so systemd kept trying to restart it - hence the intermittent bad gateways as the process was respawning.
Best,
Jeff
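One way to confirm that kind of respawning is to ask systemd how often it has restarted the service. The sketch below assumes a systemd recent enough to expose the NRestarts property (the version shipped with Ubuntu 20.04 does) and a unit named jatos; both the unit name and the helper names are assumptions.

```shell
#!/usr/bin/env bash
# Extract the number from a "NRestarts=<n>" line as printed by
# "systemctl show". Reads from standard input.
parse_restarts() {
    sed -n 's/^NRestarts=\([0-9][0-9]*\)$/\1/p'
}

# Print how many times systemd has automatically restarted a unit.
# The unit name is passed in; "jatos" in the usage note is an assumption.
restart_count() {
    systemctl show "$1" --property=NRestarts | parse_restarts
}
```

Running restart_count jatos a few minutes apart would show whether the counter keeps climbing, which would point at a restart loop rather than a slow page.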
Hi Jeff,
Great that you found the problem. And now I'm curious: what did the systemd config look like to cause this constant restart?
Best,
Kristian
systemd calls loader.sh - and the start function in loader.sh didn't have an ampersand to start it in the background. So loader.sh never terminated, which means systemd didn't think it was running properly and tried to restart it. Adding & after the jatos call in loader.sh resolved the issue.
That is, change this:
# Start JATOS with configuration file, application secret, address, port, and pass on other arguments
"$dir/bin/jatos" "${args[@]}" -J-server 2>>"$dir/logs/loader.log"
to this:
# Start JATOS with configuration file, application secret, address, port, and pass on other arguments
"$dir/bin/jatos" "${args[@]}" -J-server 2>>"$dir/logs/loader.log" &
Best,
Jeff
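To illustrate what the trailing & changes, here is a small sketch in which sleep stands in for the long-running JATOS process. The function names and the child_pid variable are made up for the example; the real script calls "$dir/bin/jatos".

```shell
#!/usr/bin/env bash
# "sleep" is a stand-in for the JATOS process (assumption for illustration).

# Foreground: the function returns only after the child has exited,
# so a caller waiting for the start command to finish keeps waiting.
start_foreground() {
    sleep "$1"
}

# Background: the function returns immediately and the child keeps running.
# $! holds the PID of the most recently started background job.
start_background() {
    sleep "$1" &
    child_pid=$!
}
```

With Type=forking, systemd expects the start command to fork and then exit, which is the behaviour of start_background; a start command that behaves like start_foreground never reports back, which would be consistent with the restarts described above.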
Hi Jeff,
What does your systemd service config look like? If I use the one from the docs, it works for me without the '&'. I'm just concerned that this additional '&' will interfere with potential future JATOS upgrades. But nothing serious: just the auto-restart after an upgrade wouldn't work.
Best,
Kristian
Ours is a little different, but not much.
These are the only differences:
ExecStop=/opt/jatos/current/loader.sh stop
rather
After=network-online.target mysql.service
And we've added:
Type=forking
After=network-online.target mysql.service
Otherwise, the same.
Jeff
Just for my understanding, I'm always curious: why type=forking together with putting it in the background with & instead of type=simple and no &? Is there some advantage?
Best,
Kristian
I'm not a systemd expert, and my colleague is the one who set this up for our older jatos install, but ...
I think the idea behind forking is that it has better error detection - simple, I believe, just spawns the process and moves on, so it may be trying to do something else before the process completes. jatos takes a while to spin up, since the vm has to start, so if something downstream needs jatos running, simple might not guarantee that jatos is up. (My understanding may well be flawed.)
Looking at it - I think we may have had a race condition with mysql. (That's a guess.) Although that should be addressed by the After=network-online.target mysql.service stanza.
Our old version of jatos is 3.3.3 - and it looks like the loader.sh which shipped with that has the & to background the process.
Best,
JB
I understand. Just be aware that the restart when doing a JATOS upgrade might not work, and you would then have to restart manually.
Best,
Kristian