
cgroups: make sure cgroup still exists after task restart #12875

Merged · 1 commit merged into main on May 5, 2022

Conversation

@shoenig (Member) commented May 4, 2022

This PR modifies the raw_exec and exec drivers to ensure the cgroup for
a task they are driving still exists across a task restart. Both drivers
exhibit the same bug, but with different root causes.

For raw_exec, we were removing the cgroup in two places: the cpuset
manager, and the unix containment implementation (the component that
uses the freezer cgroup to clean house). During a task restart, the
containment would remove the cgroup, and when the task runner hooks went
to start again they would block waiting for the cgroup to exist, which
never happens, because the cgroup is created by the cpuset manager,
which only runs as an alloc pre-start hook. The fix here is to simply
not delete the cgroup in the containment implementation; killing the
PIDs is enough, and the cpuset manager removes the cgroup later anyway.
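
For illustration, a minimal Go sketch of that containment-side change; the
package, function, and parameter names (Cleanup, cgroupDir) are assumptions
for this sketch, not Nomad's actual identifiers:

package containment

import (
	"os"
	"path/filepath"
	"strconv"
	"strings"
	"syscall"
)

// Cleanup kills every process in the task's cgroup but deliberately does
// NOT remove the cgroup directory itself. The restarting task runner
// blocks until this directory exists, and only the cpuset manager (an
// alloc pre-start hook) would recreate it.
func Cleanup(cgroupDir string) error {
	// cgroup.procs lists the PIDs currently in the cgroup (cgroups v2).
	// The real implementation freezes the cgroup first so processes
	// cannot fork during cleanup; that step is omitted here for brevity.
	data, err := os.ReadFile(filepath.Join(cgroupDir, "cgroup.procs"))
	if err != nil {
		return err
	}
	for _, field := range strings.Fields(string(data)) {
		pid, err := strconv.Atoi(field)
		if err != nil {
			continue
		}
		// Killing the PIDs is enough; no os.RemoveAll(cgroupDir) here.
		_ = syscall.Kill(pid, syscall.SIGKILL)
	}
	return nil
}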

For exec, it's the same idea, except that DestroyTask is called on task
failure, which in turn calls into libcontainer, which in turn deletes
the cgroup. In this case we do not have control over the deletion of the
cgroup, so instead we hack the cgroup back into life after the call to
DestroyTask.

All of this only applies to cgroups v2.
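
A sketch of the exec-side workaround, with the same caveat: restoreCgroup
and the call site below are illustrative, not Nomad's actual code. On
cgroups v2, recreating the directory is sufficient, because the kernel
materializes the controller interface files on mkdir.

package exec

import "os"

// restoreCgroup recreates the task's cgroup directory after libcontainer
// has deleted it inside DestroyTask. MkdirAll is a no-op when the
// directory already exists, so it is safe to call unconditionally on the
// restart path.
func restoreCgroup(cgroupDir string) error {
	return os.MkdirAll(cgroupDir, 0o755)
}

// Hypothetical call site, immediately after the task is destroyed:
//
//	if err := h.driver.DestroyTask(taskID, true); err != nil {
//		return err
//	}
//	// libcontainer removed the cgroup; bring it back for the restart.
//	if err := restoreCgroup(cgroupDir); err != nil {
//		return err
//	}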

Fixes #12863

No CL (changelog entry) because cgroups v2 support hasn't shipped yet.

@shoenig (Member, Author) commented May 4, 2022

Repro example:

job "restarts" {
  datacenters = ["dc1"]
  type = "service"

  group "g1" {
    restart {
      attempts = 5
      mode = "delay"
      delay = "1s"
      interval = "5s"
    }

    task "t1-raw_exec" {
      driver = "raw_exec"
      config {
	command = "/usr/bin/bash"
	args = ["-c", "sleep 10"]
      }
    }

    task "t2-exec" {
      driver = "exec"
      config {
	command = "/usr/bin/bash"
	args = ["-c", "sleep 10"]
      }
    }    
  }
}

with fix:

Task "t1-raw_exec" is "running"
Task Resources
CPU        Memory          Disk     Addresses
0/100 MHz  42 MiB/300 MiB  300 MiB  

Task Events:
Started At     = 2022-05-04T19:01:57Z
Finished At    = N/A
Total Restarts = 5
Last Restart   = 2022-05-04T14:01:57-05:00

Recent Events:
Time                       Type        Description
2022-05-04T14:01:57-05:00  Started     Task started by client
2022-05-04T14:01:57-05:00  Restarting  Task restarting in 1.144459209s
2022-05-04T14:01:57-05:00  Terminated  Exit Code: 0
2022-05-04T14:01:47-05:00  Started     Task started by client
2022-05-04T14:01:47-05:00  Restarting  Task restarting in 1.08660356s
2022-05-04T14:01:47-05:00  Terminated  Exit Code: 0
2022-05-04T14:01:37-05:00  Started     Task started by client
2022-05-04T14:01:37-05:00  Restarting  Task restarting in 1.23187761s
2022-05-04T14:01:37-05:00  Terminated  Exit Code: 0
2022-05-04T14:01:27-05:00  Started     Task started by client

Task "t2-exec" is "running"
Task Resources
CPU        Memory           Disk     Addresses
0/100 MHz  224 KiB/300 MiB  300 MiB  

Task Events:
Started At     = 2022-05-04T19:01:50Z
Finished At    = N/A
Total Restarts = 4
Last Restart   = 2022-05-04T14:01:49-05:00

Recent Events:
Time                       Type        Description
2022-05-04T14:01:50-05:00  Started     Task started by client
2022-05-04T14:01:49-05:00  Restarting  Task restarting in 1.08660356s
2022-05-04T14:01:49-05:00  Terminated  Exit Code: 0
2022-05-04T14:01:39-05:00  Started     Task started by client
2022-05-04T14:01:39-05:00  Restarting  Task restarting in 1.23187761s
2022-05-04T14:01:39-05:00  Terminated  Exit Code: 0
2022-05-04T14:01:29-05:00  Started     Task started by client
2022-05-04T14:01:29-05:00  Restarting  Task restarting in 1.042526215s
2022-05-04T14:01:29-05:00  Terminated  Exit Code: 0
2022-05-04T14:01:19-05:00  Started     Task started by client

@shoenig (Member, Author) commented May 5, 2022

Running the relevant tests locally on a cgroups v2 machine:

➜ mount -l | grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)

test-cgutil.log
test-exec.log
test-raw_exec.log
test-taskrunner.log

@shoenig force-pushed the b-cgroupsv2-task-restarts branch from e5e3e99 to 37ffd2f on May 5, 2022 14:51
@shoenig (Member, Author) commented May 5, 2022

Will follow up with CI changes to run on ubuntu-22.04 in GitHub Actions
(actions/runner-images#5490).

@tgross (Member) left a comment


LGTM

Does the most recent commit cover #12877 as well, do you think, or is
that going to need a separate investigation?

@shoenig (Member, Author) commented May 5, 2022

#12877

Let's keep it open; so far I haven't been able to reproduce it, but my laptop is on an older kernel.

@shoenig merged commit 7c91ac0 into main on May 5, 2022
@shoenig deleted the b-cgroupsv2-task-restarts branch on May 5, 2022 15:54
@shoenig added the backport/1.3.x (backport to 1.3.x release line) label on May 5, 2022
@github-actions (bot) commented Oct 13, 2022
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions bot locked as resolved and limited conversation to collaborators on Oct 13, 2022
Labels: backport/1.3.x (backport to 1.3.x release line)

Linked issue closed by this merge: nomad 1.3.0-rc.1 jobs hang/won't restart. cgroups v2? (#12863)