Backport of cgroups: make sure cgroup still exists after task restart into release/1.3.x #12893

hc-github-team-nomad-core
Contributor

Backport

This PR is auto-generated from #12875 to be assessed for backporting due to the inclusion of the label backport/1.3.x.

The below text is copied from the body of the original PR.


This PR modifies raw_exec and exec to ensure the cgroup for a task
they are driving still exists during a task restart. These drivers
have the same bug but with different root causes.

For raw_exec, we were removing the cgroup in 2 places - the cpuset
manager, and in the unix containment implementation (the thing that
uses freezer cgroup to clean house). During a task restart, the
containment would remove the cgroup, and when the task runner hooks
went to start the task again, they would block waiting for the cgroup
to exist, which would never happen, because it gets created by the
cpuset manager, which only runs as an alloc pre-start hook. The fix
here is to simply not delete the cgroup in the containment
implementation; killing the PIDs is enough. The removal happens in the
cpuset manager later anyway.

For exec, it's the same idea, except DestroyTask is called on task
failure, which in turn calls into libcontainer, which in turn deletes
the cgroup. In this case we do not have control over the deletion of
the cgroup, so instead we hack the cgroup back into life after the
call to DestroyTask.

All of this only applies to cgroups v2.

Fixes #12863

No changelog entry because cgroups v2 support hasn't shipped yet.

hc-github-team-nomad-core force-pushed the backport/b-cgroupsv2-task-restarts/equally-prompt-mammal branch 2 times, most recently from 4f22971 to 2308da4, May 5, 2022 16:02
hc-github-team-nomad-core merged commit 76f90ec into release/1.3.x, May 5, 2022
hc-github-team-nomad-core deleted the backport/b-cgroupsv2-task-restarts/equally-prompt-mammal branch, May 5, 2022 16:02
@github-actions

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 13, 2022