nomad 1.3.0-rc.1 jobs hang/wont restart. cgroups v2? #12863
Comments
Thanks for testing this out and reporting @badalex! Indeed I can reproduce this given your job file, and should be able to figure out what's going on from here.
Sweet, I can confirm that PR fixes the restart issue.
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Nomad version
Nomad v1.3.0-rc.1 (31b0a18)
Operating system and Environment details
Ubuntu 22.04 Jammy Jellyfish 5.15.0-27-generic #28-Ubuntu SMP Thu Apr 14 04:55:28 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Issue
Something seems to go wrong with cgroups v2 support. If I create a job that exits immediately, it stops being restarted. Nomad 1.2.3 (the last version I can use because of the plugin breakage in #12071) seems to work fine, although I plan on downgrading Nomad again to double-check.
Given the job file included:
nomad ui for the allocation shows:
It is currently 18:26, and no other restart attempts have been made. The logmon process for the alloc is still running, with no processes underneath it and nothing using the allocation dir according to lsof -n +D.
If I change the constraint to an Ubuntu 20.04 host, the job restarts roughly every second, as expected.
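Since the behavior differs between the 22.04 and 20.04 hosts, it may help to confirm which cgroup mode each host is actually running. The following is a minimal sketch, assuming the conventional systemd mount points (`/sys/fs/cgroup` for the unified hierarchy, `/sys/fs/cgroup/unified` in hybrid mode). `cgroup_mode` is a hypothetical helper written for illustration and is not part of Nomad.

```python
import os


def cgroup_mode(root="/sys/fs/cgroup"):
    """Guess the cgroup mode of a host from its cgroup mount point.

    Assumption: the host uses the conventional systemd layout. The
    return values are labels chosen for this sketch, not Nomad terms.
    """
    # A cgroup v2 (unified) root exposes a cgroup.controllers file.
    if os.path.isfile(os.path.join(root, "cgroup.controllers")):
        return "v2"
    # In hybrid mode systemd mounts the v2 hierarchy under "unified".
    if os.path.isfile(os.path.join(root, "unified", "cgroup.controllers")):
        return "hybrid"
    # The mount point exists but has no v2 marker: most likely v1.
    if os.path.isdir(root):
        return "v1"
    return "unknown"


if __name__ == "__main__":
    print(cgroup_mode())
```

Running this on both hosts would show whether the host where restarts stall is on the unified hierarchy while the host where they work is not.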
Other issues I have not been able to reproduce with any success:
Also, this might be a bug in the job, but /dev/null seems to disappear. Edit: sometimes, for some jobs, but not all the time; this is how I noticed restarts were not, err, restarting. I'm still working to nail this down while debugging the restart issue, and it feels like it might be related. These are raw_exec jobs that make their own restricted mount namespace, which includes /dev/null, and it is writable. This seems to work fine on Nomad 1.2.3 on the same host.
Job file (if appropriate)