jatos 3.7.4 with nginx 502 Bad Gateway

We've got a new installation of jatos 3.7.4 on Ubuntu 20.04 with nginx and we keep getting intermittent but recurring 502 Bad Gateway errors.

I can usually get to the Bad Gateway if I log in as admin, go to the users tab, then back to the admin tab, and go back and forth a few times, although the error occurs at other times and for other users as well.

We were getting them a lot, but after I increased some timeouts in both nginx and jatos and added some memory to the virtual machine, they became less frequent - but they still occur.
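
For reference, the nginx side currently has timeouts along these lines (illustrative - roughly what we set, not an exact copy of our config):

    location / {
        proxy_pass http://127.0.0.1:9000;  # JATOS's default port
        proxy_connect_timeout 75s;         # time to establish the connection to JATOS
        proxy_send_timeout    75s;         # time to send the request to JATOS
        proxy_read_timeout    75s;         # time to wait for JATOS's response
    }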

We have an older jatos installation - 3.3.3 on Ubuntu 16 - which wasn't exhibiting this behaviour.

We're kind of at our wits' end here, at the minute. Any ideas or pointers will be gratefully accepted.


Jeff Berry, MRC Cognition and Brain Sciences Unit

Comments

  • Hi Jeff,

    This '502 Bad Gateway' comes from nginx telling you that it didn't get a response from JATOS within the time that is configured as the timeout in nginx.conf. I guess you have a lot of users? Then it can take some time and server resources to get the data for the 'User Manager' page. And if you go back and forth it gets requested multiple times and your server might get overloaded. This is even worse for the Administration/Studies page. But it should be only short-term. The 3.3.3 version did not have the Administration page and thus never had to load these resource-hungry pages.

    Does this 502 error occur on other occasions too? If it occurs somewhere else I would be more concerned, but this is the administration area and does not affect running studies. And, to be on the safe side, can you check the JATOS logs for errors or exceptions?

    But if you want to do something, you can (like you did already) increase the nginx timeout and give the server more memory. And monitor the resources on the server to find out where the bottleneck is: maybe it's more of a CPU or disk space problem. I assume nginx is on the same machine as JATOS, and therefore the network should not be the culprit here.
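
    A quick way to watch the resources is just the standard tools, something like:

      top                       # CPU and memory per process
      free -h                   # overall memory and swap usage
      df -h                     # free disk space per filesystem
      journalctl -u jatos -f    # follow the service log while reproducing the 502 (assuming your unit is called 'jatos')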

    Best,

    Kristian

  • Hi Kristian,

    At the minute we have one user and no active studies, and the timeouts are set to what I think of as ridiculously high values - 75s on both jatos and nginx. We are running nginx and JATOS on the same machine, and MySQL too, for that matter.

    The machine is still in test, with one user (and the local admin account), a couple of test studies, and that's it. There shouldn't be any issue with CPU or disk, which is what makes this so baffling.

    The only error in the application.log is this:

    [ERROR] - c.g.Updates - Couldn't request latest JATOS update info.

    although it does complain about certificates, since I'm also trying to get ldap set up.


    In the loader.log, there's

    WARNING: An illegal reflective access operation has occurred

    WARNING: Illegal reflective access by com.google.inject.internal.cglib.core.$ReflectUtils$1 (file:/opt/jatos/jatos-3.6.1/lib/com.google.inject.guice-4.2.2.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)

    Thanks for the thoughts,


    Jeff

  • Follow-up -


    it looks like the problem was with the loader.sh script's interaction with systemd - my colleague realised that the start function didn't background the jatos process, so the script never exited, and systemd kept trying to restart it - hence the intermittent bad gateways while the process was respawning.
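
    For anyone hitting the same thing, a restart loop like this should show up with something like (assuming the systemd unit is called 'jatos'):

      systemctl status jatos                      # status and restart counter for the service
      journalctl -u jatos --since "1 hour ago"    # start/stop messages around the times of the 502s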


    Best,

    Jeff

  • Hi Jeff,

    Great that you found the problem. And now I'm curious: what did the systemd config look like to cause this constant restart?

    Best,

    Kristian

  • systemd calls loader.sh - and the start function in loader.sh didn't have an ampersand to start jatos in the background. So loader.sh never terminated, which meant systemd didn't think it had started properly and kept trying to restart it. Adding & after the jatos call in loader.sh resolved the issue.

    That is, change this:

     # Start JATOS with configuration file, application secret, address, port, and pass on other arguments
     "$dir/bin/jatos" "${args[@]}" -J-server 2>>"$dir/logs/loader.log"

    to this:

     # Start JATOS with configuration file, application secret, address, port, and pass on other arguments
     "$dir/bin/jatos" "${args[@]}" -J-server 2>>"$dir/logs/loader.log" &


    Best,

    Jeff

  • Hi Jeff,

    What does your systemd service config look like? If I'm using the one from the docs it works for me without the '&'. I'm just concerned that this additional '&' will interfere with some potential future JATOS upgrades. But nothing serious: just the auto-restart after an upgrade wouldn't work.
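
    For comparison, a minimal Type=simple unit along these lines works without the '&', because systemd simply treats the foreground loader.sh as the service's main process (the unit name and paths here are only examples, not necessarily the exact file from the docs):

      [Unit]
      Description=JATOS
      After=network-online.target

      [Service]
      Type=simple
      ExecStart=/opt/jatos/loader.sh start
      Restart=on-failure

      [Install]
      WantedBy=multi-user.target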

    Best,

    Kristian

  • Ours is a little different, but not much.

    These are the only differences:

    ExecStop=/opt/jatos/current/loader.sh stop

    rather than the one in the docs. And we've added:

    Type=forking
    After=network-online.target mysql.service


    Otherwise, the same.
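
    Put together, the whole unit is roughly this (a sketch from memory; the paths are ours and the rest may differ slightly from the actual file):

      [Unit]
      Description=JATOS
      After=network-online.target mysql.service

      [Service]
      Type=forking
      ExecStart=/opt/jatos/current/loader.sh start
      ExecStop=/opt/jatos/current/loader.sh stop

      [Install]
      WantedBy=multi-user.target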

    Jeff

  • Just for my understanding, I'm always curious: why Type=forking together with putting it in the background with '&', instead of Type=simple and no '&'? Is there some advantage?

    Best,

    Kristian

  • I'm not a systemd expert, and my colleague is the one who set this up for our older jatos install, but ...

    I think the idea behind forking is that it has better error detection - simple, I believe, just spawns the process and moves on, so it may be trying to do something else before the process has finished starting. jatos takes a while to spin up, since the VM has to start, so if something downstream needs jatos running, simple might not guarantee that jatos is up. (My understanding may well be flawed.)

    Looking at it - I think we may have had a race condition with mysql. (That's a guess.) Although that should be addressed by the After=network-online.target mysql.service stanza.
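
    If the mysql race really is the issue, a stronger dependency than After= alone might be worth trying - something like this (untested on our side; After= only orders startup, while Requires= actually pulls mysql in):

      Requires=mysql.service
      After=network-online.target mysql.service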


    Our old version of jatos is 3.3.3 - and it looks like the loader.sh which shipped with that has the & to background the process.


    Best,

    JB

  • I understand. Just be aware that the restart when doing a JATOS upgrade might not work and you may have to restart manually.

    Best,

    Kristian
