Support activeDeadlineSeconds for Tekton pods 🦌 #4217

vdemeester · 2021-09-06T14:58:20Z

Changes

Kubernetes (and OpenShift) classify each Pod as either Terminating
(it has a "relatively" short life and will terminate at some point)
or NotTerminating (it is supposed to run forever). Kubernetes
distinguishes the two using the activeDeadlineSeconds field: a Pod
with activeDeadlineSeconds set is considered Terminating. For
example, a Job's Pods have this field set and are considered
Terminating.
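
To make the distinction concrete, here is a minimal sketch of how a Pod could be classified into a quota scope based on whether activeDeadlineSeconds is set. The `podSpec` type and the `quotaScope` function are hypothetical stand-ins for illustration, not Kubernetes or Tekton code; the scope names follow the Kubernetes ResourceQuota scopes `Terminating` and `NotTerminating`.

```go
package main

import "fmt"

// podSpec is a hypothetical stand-in for the one field of the real
// Kubernetes PodSpec that matters here.
type podSpec struct {
	// ActiveDeadlineSeconds is nil when the field is not set on the Pod.
	ActiveDeadlineSeconds *int64
}

// quotaScope returns the quota scope a Pod would fall under in this sketch:
// "Terminating" when a deadline is set, "NotTerminating" otherwise.
func quotaScope(spec podSpec) string {
	if spec.ActiveDeadlineSeconds != nil {
		return "Terminating"
	}
	return "NotTerminating"
}

func main() {
	deadline := int64(5400)
	fmt.Println(quotaScope(podSpec{ActiveDeadlineSeconds: &deadline}))
	fmt.Println(quotaScope(podSpec{}))
}
```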

Currently the pods created by Tekton fall under the NotTerminating
quota limits of Kubernetes and OpenShift. This can create issues, as
builds should generally fall under the separate Terminating quota
limits.

This sets the activeDeadlineSeconds field on the TaskRun's Pod so that
it is considered Terminating. The field is set to a value higher than
the specified (or default) Timeout on the TaskRun so that Kubernetes
won't try to terminate the Pod before Pipeline does.

Signed-off-by: Vincent Demeester [email protected]

/kind feature
/cc @sbwsg @bobcatfish @mattmoor @pritidesai @jerop

Tentatively adding the 0.28 milestone 🙃

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

Docs included if any changes are user facing
Tests included if any functionality added or changed
Follows the commit message standard
Meets the Tekton contributor standards (including
functionality, content, code)
Release notes block below has been filled in or deleted (only if no user facing changes)

Release Notes

Set `activeDeadlineSeconds` on Tekton's Pods so that Kubernetes considers them Terminating.
This improves ResourceQuota support: Tekton Pipelines' Pods are now considered terminating and can therefore be covered by a ResourceQuota scoped specifically to them.

tekton-robot · 2021-09-06T15:00:51Z

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/pod/pod.go	86.9%	87.0%	0.1


tekton-robot · 2021-09-06T15:24:12Z

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/pod/pod.go	86.9%	87.0%	0.1

imjasonh · 2021-09-06T16:03:56Z

Nice. This also has the effect that pods won't run forever if the controller happens to be unavailable or late to process timeouts.

vdemeester · 2021-09-06T16:19:01Z

> Nice. This also has the effect that pods won't run forever if the controller happens to be unavailable or late to process timeouts.

Indeed 😛

dlorenc · 2021-09-07T14:06:25Z

/lgtm

vdemeester · 2021-09-08T08:08:51Z

/test pull-tekton-integration-tests

afrittoli · 2021-09-13T09:41:45Z

I imagine the active deadline seconds are counted from when the pod is scheduled?

It might be nice to add a note in the docs about this, but it's ok as a follow-up pr if you prefer.

/approve

tekton-robot · 2021-09-13T09:41:51Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: afrittoli

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [afrittoli]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

PR tektoncd#4217 introduced better handling of the resource quota by adding support for activeDeadlineSeconds. activeDeadlineSeconds is calculated with this formula: int64(taskRun.GetTimeout(ctx).Seconds() * 1.5). When the timeout on a task is set to 0s (i.e. no timeout), the TaskRun fails with the ambiguous message "Invalid value: 0: must be between 1 and 2147483647, inclusive." This happens because a 0s timeout makes activeDeadlineSeconds 0, which is outside the permitted range. This commit changes the way activeDeadlineSeconds is set so that it is not set at all for a task with a 0s timeout.

PR tektoncd#4217 introduced better handling of the resource quota by adding support for activeDeadlineSeconds. activeDeadlineSeconds is calculated with this formula: int64(taskRun.GetTimeout(ctx).Seconds() * 1.5). When the timeout on a task is set to 0s (i.e. no timeout), the TaskRun fails with the ambiguous message "Invalid value: 0: must be between 1 and 2147483647, inclusive." This happens because a 0s timeout makes activeDeadlineSeconds 0, which is outside the permitted range (1 to MaxInt32). This commit changes the way activeDeadlineSeconds is set so that it is set to MaxInt32 for a task with a 0s timeout.


tekton-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. labels Sep 6, 2021

tekton-robot requested review from bobcatfish, jerop, mattmoor, pritidesai and a user September 6, 2021 14:58

vdemeester added this to the Pipelines v0.28 milestone Sep 6, 2021

tekton-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Sep 6, 2021

vdemeester force-pushed the activeDeadlineSeconds branch from ffa410d to 8c2a6ac Compare September 6, 2021 15:21

tekton-robot assigned dlorenc Sep 7, 2021

tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 7, 2021

tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 13, 2021

tekton-robot merged commit 39e50e0 into tektoncd:main Sep 13, 2021

vdemeester deleted the activeDeadlineSeconds branch September 13, 2021 10:17

pritidesai mentioned this pull request Jan 4, 2022

set activeDeadlineSeconds to max for tasks with notimeouts #4450

Merged

5 tasks

tlawrie mentioned this pull request May 9, 2022

Confirm timeout still works for Task Time Out boomerang-io/community#347

Open
