Some timeout refactoring #3011

Merged
merged 2 commits into from
Jul 25, 2020

Conversation

bobcatfish
Collaborator

Changes

Add more details about how the timeout handling works 🕒

While investigating #2905, I struggled to understand how the timeout
handling works, especially since TimeoutSet has very few comments,
so I've added some. I didn't add anything for backoffs yet because I'm
hoping we can split that out into its own structure, since it has a
very specific purpose that doesn't generalize to all timeouts.

Also changed the name "finished" to consistently use "done" so the
reader doesn't have to wonder about the difference between "finished"
and "done" (there isn't one)

Move timeout handler into its own package 📦

I'd like to move the "backoff" logic into its own file, separate from
the other timeout logic, so it's clear which parts apply to what (i.e.
the timeout handler is being used for 2 purposes: timing out Runs which
take too long, and backing off when pod creation is failing - this is
totally fine but it's hard to understand when reading the code)

As a first step, I've moved the timeout handler into a separate package,
so we can have a file and tests dedicated to the backoff logic separate
from the other handling.

Submitter Checklist

These are the criteria that every PR should meet, please check them off as you
review them:

  • Includes tests (if functionality changed/added)
  • [n/a] Includes docs (if user facing)
  • Commit messages follow commit message best practices
  • Release notes block has been filled in or deleted (only if no user facing changes)

See the contribution guide for more details.

Double check this list of stuff that's easy to miss:

Reviewer Notes

If API changes are included, additive changes must be approved by at least two OWNERS and backwards incompatible changes must be approved by more than 50% of the OWNERS, and they must first be added in a backwards compatible way.

Release Notes

NONE

@tekton-robot tekton-robot added the release-note-none Denotes a PR that doesnt merit a release note. label Jul 24, 2020
@tekton-robot tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 24, 2020
@bobcatfish bobcatfish added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Jul 24, 2020
@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File | Old Coverage | New Coverage | Delta
pkg/timeout/handler.go | Do not exist | 77.4% |


@@ -124,7 +124,7 @@ var (
_ pipelinerunreconciler.Interface = (*Reconciler)(nil)
)

-// Reconcile compares the actual state with the desired, and attempts to
+// ReconcileKind compares the actual state with the desired, and attempts to
Contributor

Nice catch :)

@tekton-robot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dlorenc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 24, 2020
// This is usually set to the function that enqueues the taskRun for reconciling.
taskRunCallbackFunc func(interface{})
// pipelineRunCallbackFunc is the function to call when a TaskRun has timed out
// This is usually set to the function that enqueues the taskRun for reconciling.
Member

NIT:
s/when a TaskRun/when a PipelineRun/
s/enqueues the taskRun/enqueues the pipelineRun/

Collaborator Author

whoops, thanks @pritidesai !

@@ -274,7 +286,7 @@ func (t *TimeoutSet) waitRun(runObj StatusKey, timeout time.Duration, startTime
// the lifetime of the TaskRun no resources are released after the timer
// fires. It is the caller's responsibility to Release() the TaskRun when
// work with it has completed.
-func (t *TimeoutSet) SetTaskRunTimer(tr *v1beta1.TaskRun, d time.Duration) {
+func (t *Handler) SetTaskRunTimer(tr *v1beta1.TaskRun, d time.Duration) {
Member

Random thought, SetTaskRunTimer but no SetPipelineRunTimer 🤔

@@ -186,7 +198,7 @@ func (t *TimeoutSet) checkPipelineRunTimeouts(namespace string, pipelineclientse

// CheckTimeouts function iterates through a given namespace or all namespaces
// (if empty string) and calls corresponding taskrun/pipelinerun timeout functions
-func (t *TimeoutSet) CheckTimeouts(namespace string, kubeclientset kubernetes.Interface, pipelineclientset clientset.Interface) {
+func (t *Handler) CheckTimeouts(namespace string, kubeclientset kubernetes.Interface, pipelineclientset clientset.Interface) {
Member

It's interesting that we are scraping through all possible namespaces (if none is specified), or at least the one specified namespace, and checking every TaskRun and PipelineRun in them 😲

Member

And the same CheckTimeouts check is done in both the TaskRun controller and the PipelineRun controller 🤔

Collaborator Author

yeah!! the second means we're probably doing this twice as much as we need to 😅

and it sounds like we probably don't need to be doing it at all!! #2905 (comment)

@pritidesai
Member

@bobcatfish I learned a bit about the timeout handler from this PR 😜 and I'm excited to see more changes ...

One minor NIT, which can be addressed with the next set of changes.

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 25, 2020
@tekton-robot tekton-robot merged commit 172cd19 into tektoncd:master Jul 25, 2020
4 participants