Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support activeDeadlineSeconds for Tekton pods 🦌 #4217

Merged
merged 1 commit into from
Sep 13, 2021

Conversation

vdemeester
Copy link
Member

Changes

Kubernetes (and OpenShift) mark Pod as either Terminating β€” has a
"relatively" short life and will terminate at some point β€” and
NonTerminating β€” is supposed to run for ever. Kubernetes does the
difference between the two using the ActiveDeadlineSeconds field. A
Pod with activeDeadlineSeconds set will be considered as
Terminating. For example, Job's Pod have this field set and are
considered as Terminated.

Currently the pods created by tekton fall under the NonTerminating
quota limits of Kubernetes and OpenShift. This can create issues as
generally builds should fall under the separate terminating quota
limits.

This sets the activeDeadlineSeconds field or TaskRun's Pod so that
they are considered Terminating. It also sets the value of this field
to a higher value than the specified (or default) Timeout set on the
TaskRun so that Kubernetes won't try to terminate the Pod before
Pipeline does.

Signed-off-by: Vincent Demeester [email protected]

/kind feature
/cc @sbwsg @bobcatfish @mattmoor @pritidesai @jerop

Tentatively adding the 0.28 milestone πŸ™ƒ

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

  • Docs included if any changes are user facing
  • Tests included if any functionality added or changed
  • Follows the commit message standard
  • Meets the Tekton contributor standards (including
    functionality, content, code)
  • Release notes block below has been filled in or deleted (only if no user facing changes)

Release Notes

Set `activeDeadlineSeconds` on Tekton's Pod so that they are considered Terminating for Kubernetes. 
This helps supporting ResourceQuota a bit better as now Tekton Pipeline's pod are considered terminating and thus can be using a specific scoped ResourceQuota for those.

@tekton-robot tekton-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. labels Sep 6, 2021
@vdemeester vdemeester added this to the Pipelines v0.28 milestone Sep 6, 2021
@tekton-robot tekton-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Sep 6, 2021
@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/pod/pod.go 86.9% 87.0% 0.1

Kubernetes (and OpenShift) mark `Pod` as either Terminating β€” has a
"relatively" short life and will terminate at some point β€” and
NonTerminating β€” is supposed to run for ever. Kubernetes does the
difference between the two using the `ActiveDeadlineSeconds` field. A
`Pod` with `activeDeadlineSeconds` set will be considered as
Terminating. For example, `Job`'s `Pod` have this field set and are
considered as Terminated.

Currently the pods created by tekton fall under the NonTerminating
quota limits of Kubernetes and OpenShift. This can create issues as
generally builds should fall under the separate terminating quota
limits.

This sets the `activeDeadlineSeconds` field or TaskRun's `Pod` so that
they are considered Terminating. It also sets the value of this field
to a higher value than the specified (or default) Timeout set on the
TaskRun so that Kubernetes won't try to terminate the `Pod` before
Pipeline does.

Signed-off-by: Vincent Demeester <[email protected]>
@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/pod/pod.go 86.9% 87.0% 0.1

@imjasonh
Copy link
Member

imjasonh commented Sep 6, 2021

Nice. This also has the effect that pods won't run forever if the controller happens to be unavailable or late to process timeouts.

@vdemeester
Copy link
Member Author

Nice. This also has the effect that pods won't run forever if the controller happens to be unavailable or late to process timeouts.

Indeed πŸ˜›

@dlorenc
Copy link
Contributor

dlorenc commented Sep 7, 2021

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 7, 2021
@vdemeester
Copy link
Member Author

/test pull-tekton-integration-tests

@afrittoli
Copy link
Member

I imagine the active deadline seconds are counted from when the pod is scheduled?

It might be nice to add a note in the docs about this, but it's ok as a follow-up pr if you prefer.

/approve

@tekton-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: afrittoli

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 13, 2021
@tekton-robot tekton-robot merged commit 39e50e0 into tektoncd:main Sep 13, 2021
@vdemeester vdemeester deleted the activeDeadlineSeconds branch September 13, 2021 10:17
pritidesai added a commit to pritidesai/pipeline that referenced this pull request Jan 4, 2022
PR tektoncd#4217 introduced better handline of the resource quota by adding
support for activeDeadlineSeconds. activeDeadlineSeconds is calculated
based on this formula:

int64(taskRun.GetTimeout(ctx).Seconds() * 1.5)

In case when a timeout on a task is set to 0s i.e. no timeout, the taskrun
fails with ambiguous message "Invalid value: 0: must be between 1 and
2147483647, inclusive." This is happening because activeDeadlineSeconds is
getting set to 0 in case of a 0s timeout but in this case activeDeadlineSeconds
is getting set to a value out of the permitted range.

This commit is changing the way activeDeadlineSeconds is set such that its not
set at all for a task with 0s timeout.
pritidesai added a commit to pritidesai/pipeline that referenced this pull request Jan 5, 2022
PR tektoncd#4217 introduced better handling of the resource quota by adding
support for activeDeadlineSeconds. activeDeadlineSeconds is calculated
based on this formula:

int64(taskRun.GetTimeout(ctx).Seconds() * 1.5)

In case when a timeout on a task is set to 0s i.e. no timeout, the taskrun
fails with ambiguous message "Invalid value: 0: must be between 1 and
2147483647, inclusive." This is happening because activeDeadlineSeconds is
set to 0 in case of a 0s timeout but in this case activeDeadlineSeconds
is assigned a value out of the permitted range (1 to maxint32).

This commit is changing the way activeDeadlineSeconds is set such that it is
set to MaxInt32 for a task with 0s timeout.
tekton-robot pushed a commit that referenced this pull request Jan 13, 2022
PR #4217 introduced better handling of the resource quota by adding
support for activeDeadlineSeconds. activeDeadlineSeconds is calculated
based on this formula:

int64(taskRun.GetTimeout(ctx).Seconds() * 1.5)

In case when a timeout on a task is set to 0s i.e. no timeout, the taskrun
fails with ambiguous message "Invalid value: 0: must be between 1 and
2147483647, inclusive." This is happening because activeDeadlineSeconds is
set to 0 in case of a 0s timeout but in this case activeDeadlineSeconds
is assigned a value out of the permitted range (1 to maxint32).

This commit is changing the way activeDeadlineSeconds is set such that it is
set to MaxInt32 for a task with 0s timeout.
khrm pushed a commit to openshift/tektoncd-pipeline that referenced this pull request May 19, 2022
PR tektoncd#4217 introduced better handling of the resource quota by adding
support for activeDeadlineSeconds. activeDeadlineSeconds is calculated
based on this formula:

int64(taskRun.GetTimeout(ctx).Seconds() * 1.5)

In case when a timeout on a task is set to 0s i.e. no timeout, the taskrun
fails with ambiguous message "Invalid value: 0: must be between 1 and
2147483647, inclusive." This is happening because activeDeadlineSeconds is
set to 0 in case of a 0s timeout but in this case activeDeadlineSeconds
is assigned a value out of the permitted range (1 to maxint32).

This commit is changing the way activeDeadlineSeconds is set such that it is
set to MaxInt32 for a task with 0s timeout.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants