set activeDeadlineSeconds to max for tasks with notimeouts #4450

Merged
merged 1 commit into from
Jan 13, 2022

Conversation

pritidesai
Member

@pritidesai pritidesai commented Jan 4, 2022

Changes

PR #4217 introduced better handling of the resource quota by adding support for activeDeadlineSeconds. activeDeadlineSeconds is calculated based on this formula:

int64(taskRun.GetTimeout(ctx).Seconds() * 1.5)

When the timeout on a task is set to 0s (i.e. no timeout), the taskrun fails with the ambiguous message "Invalid value: 0: must be between 1 and 2147483647, inclusive." This happens because a 0s timeout makes activeDeadlineSeconds evaluate to 0, which is outside the permitted range (1 to MaxInt32).

This commit changes how activeDeadlineSeconds is set: for a task with a 0s timeout it is now set to MaxInt32.

Closes #4435

/kind bug

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

  • Docs included if any changes are user facing
  • Tests included if any functionality added or changed
  • Follows the commit message standard
  • Meets the Tekton contributor standards (including
    functionality, content, code)
  • Release notes block below has been filled in or deleted (only if no user facing changes)

Release Notes

Set activeDeadlineSeconds to the maximum permitted value (MaxInt32) for a task with a 0s timeout (no timeout).
This commit fixes the bug where a task with a 0s timeout failed with an out-of-range error.

@tekton-robot tekton-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 4, 2022
@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/pod/pod.go 89.3% 89.6% 0.3

@pritidesai
Member Author

/test pull-tekton-pipeline-alpha-integration-tests

@vdemeester
Member

In case when a timeout on a task is set to 0s i.e. no timeout, the taskrun fails with ambiguous message "Invalid value: 0: must be between 1 and 2147483647, inclusive." This is happening because activeDeadlineSeconds is getting set to 0 in case of a 0s timeout but it is outside of the permitted range.

This commit is changing the way activeDeadlineSeconds is set such that it's not set at all for a task with 0s timeout.

I would rather set it to the max instead of not setting it. Not setting it "in some cases" means some pods from Tekton will appear as Terminating and some will appear as NotTerminating, making quota "handling" very confusing. This also means we "may" want to disallow a timeout of 0?
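To illustrate the quota concern: Kubernetes ResourceQuota scopes match pods with activeDeadlineSeconds set as Terminating, and pods where it is nil as NotTerminating. The sketch below assumes a hypothetical helper `quotaScope` that mimics that classification; it is not Kubernetes or Tekton code.

```go
package main

import "fmt"

// quotaScope is a hypothetical helper that mimics how ResourceQuota scopes
// classify a pod based on its spec.activeDeadlineSeconds field.
func quotaScope(activeDeadlineSeconds *int64) string {
	if activeDeadlineSeconds == nil {
		// Field left unset: the pod counts against NotTerminating quotas.
		return "NotTerminating"
	}
	// Field set to any valid value: the pod counts against Terminating quotas.
	return "Terminating"
}

func main() {
	maxDeadline := int64(2147483647) // MaxInt32, used for no-timeout tasks

	fmt.Println(quotaScope(&maxDeadline)) // Terminating
	fmt.Println(quotaScope(nil))          // NotTerminating
}
```

Always setting the field, even to the maximum, keeps every Tekton pod in the same scope.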

PR tektoncd#4217 introduced better handling of the resource quota by adding
support for activeDeadlineSeconds. activeDeadlineSeconds is calculated
based on this formula:

int64(taskRun.GetTimeout(ctx).Seconds() * 1.5)

In case when a timeout on a task is set to 0s i.e. no timeout, the taskrun
fails with ambiguous message "Invalid value: 0: must be between 1 and
2147483647, inclusive." This is happening because activeDeadlineSeconds is
set to 0 in case of a 0s timeout but in this case activeDeadlineSeconds
is assigned a value out of the permitted range (1 to maxint32).

This commit is changing the way activeDeadlineSeconds is set such that it is
set to MaxInt32 for a task with 0s timeout.
@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/pod/pod.go 89.3% 89.5% 0.2

@pritidesai
Member Author

Thank you @vdemeester 👍 Setting it to the max for no-timeout tasks sounds reasonable. The max is high enough (MaxInt32) for such tasks. I have changed it accordingly.

Yup, I agree, it could be confusing to have some of the pods as terminating vs. non-terminating. I am not sure about completely disallowing a timeout of 0s, though. There might be use cases for such no-timeout tasks 🤔. It's up to the user to select no timeout if it applies to their pipeline. As long as no timeout is explicit, I think it's reasonable to allow specifying it.

@pritidesai pritidesai added this to the Pipelines v0.32 milestone Jan 6, 2022
@ghost

ghost commented Jan 11, 2022

Thanks @pritidesai !

/lgtm

@tekton-robot tekton-robot assigned ghost Jan 11, 2022
@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 11, 2022
@vdemeester
Member

@pritidesai lgtm, we just need to change the name of the PR (for "history")

@pritidesai pritidesai changed the title avoid setting activeDeadlineSeconds for notimeouts set activeDeadlineSeconds to max for tasks with notimeouts Jan 12, 2022
@pritidesai
Member Author

@pritidesai lgtm, we just need to change the name of the PR (for "history")

Done, sorry for the delay 🙏

Member

@jerop jerop left a comment

thanks @pritidesai! 😸

@tekton-robot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jerop

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 13, 2022
@tekton-robot tekton-robot merged commit f4c939c into tektoncd:main Jan 13, 2022
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Timeout can set to 0s but failed during taskrun
5 participants