Design: Partial Pipeline execution #50

bobcatfish · 2018-09-16T21:50:30Z

The work for this task is to design this feature and present one or more proposals (before implementing).

Expected Behavior

If a pipeline has many tasks and takes a long time to run (e.g. tens of minutes, or even hours), and one Task fails, it might be desirable to be able to pick up execution where the Task failed, with different PipelineParams (e.g. from a different git commit), so you can resume the Pipeline without having to rerun the whole thing.

Some ideas for how to implement this:

Fields in a PipelineRun which override which Tasks to run from / refer to a previous PipelineRun from which results should be taken
A tool which makes it easy to create a new Pipeline from an existing one which only runs a subset of the Tasks
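
To make the second idea concrete, here is a minimal sketch of such a tool, not a proposal for the actual API. It assumes a Pipeline can be represented as a plain dict whose tasks carry a `name` and an optional `runAfter` list (modelled loosely on the Pipeline spec). The helper `subset_pipeline` and the example pipeline are hypothetical.

```python
from collections import deque


def subset_pipeline(pipeline, start_task):
    """Return a new pipeline holding `start_task` and every task downstream of it."""
    tasks = {t["name"]: t for t in pipeline["tasks"]}
    if start_task not in tasks:
        raise KeyError(f"unknown task: {start_task}")

    # Reverse edges: task name -> names of the tasks that run after it.
    dependents = {name: [] for name in tasks}
    for task in tasks.values():
        for upstream in task.get("runAfter", []):
            dependents[upstream].append(task["name"])

    # Breadth-first walk from the starting task over the reverse edges.
    keep = {start_task}
    queue = deque([start_task])
    while queue:
        for name in dependents[queue.popleft()]:
            if name not in keep:
                keep.add(name)
                queue.append(name)

    # Preserve the original order; drop runAfter entries for tasks that are not rerun.
    subset = []
    for task in pipeline["tasks"]:
        if task["name"] in keep:
            pruned = dict(task)
            pruned["runAfter"] = [u for u in task.get("runAfter", []) if u in keep]
            subset.append(pruned)
    return {"name": f"{pipeline['name']}-from-{start_task}", "tasks": subset}


# Hypothetical example pipeline.
pipeline = {
    "name": "build-and-deploy",
    "tasks": [
        {"name": "build"},
        {"name": "unit-test", "runAfter": ["build"]},
        {"name": "lint"},
        {"name": "deploy", "runAfter": ["unit-test", "lint"]},
    ],
}
partial = subset_pipeline(pipeline, "unit-test")
print([t["name"] for t in partial["tasks"]])
```

Note that the sketch silently drops the dependency of `deploy` on `lint`, because `lint` is not rerun. A real design would still need to decide where the outputs of such skipped Tasks come from, for example from the previous PipelineRun.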

It is also worth considering what this could be like via a UI: if one is viewing a Pipeline in a UI, and wants to re-run only a portion of the Pipeline, they probably want the user experience to be as if they were still running the same Pipeline, even if underneath a new Pipeline is created.

Actual Behavior

At the moment, if any Task in a Pipeline fails, your options to rerun the rest of the Pipeline would be:

Run the entire Pipeline again
Create a new Pipeline from the previous one which contains only the Tasks you wish to run

Additional Info

This originally came up in discussion about #39, in the context of whether we'd always want to use the same git commit from a source for all Tasks in a Pipeline, or whether we'd sometimes want a Task to always use HEAD. The latter would allow a user to change a repo, by updating HEAD, between Task executions.

The feature of partial pipeline execution could be an alternative to this.

bobcatfish · 2018-09-19T20:58:49Z

@BenTheElder, @cjwagner and some other Prow folks indicated that this would be a very desirable feature for them. In particular, when your pipeline has 2 phases, one that builds a bunch of stuff and then subsequent phases that use that built stuff, it'd be handy to be able to resume after the point where the stuff is built.

gsaslis · 2020-04-15T20:02:53Z

Just to add (or rather try to help clarify) a use case here.

This is a very useful feature for long-running pipelines that probably fall outside the strict CI scope. Most pipelines I have in mind are essentially workflow automation pipelines and have external dependencies such as 3rd party systems that need to be up / reachable.

When such a pipeline fails at step 7/11, you really don't want to rerun the whole thing. The Jenkins "Restart from stage" feature is ideal here, as it lets the pipeline essentially pick up where it left off.

The problem with most Jenkins pipelines is that they are not written in such a way that restarting from any particular stage would be possible, as inputs / outputs of each stage (task) are not always well defined.

Coming to Tekton and finding inputs/outputs so explicitly declared, I almost see an opportunity whereby, once this feature is implemented, it will work on "all" Tekton pipelines, significantly widening the scope of problems Tekton pipelines can be used to solve. (Plus, anyone who relies on this in Jenkins will find it easier to migrate to Tekton.)

As a final point, I would like to clarify that in my use case, support for restarting with "different PipelineParams" (as mentioned in the description) is not a necessary feature. I am sure people have use cases for that too, but I personally like the approach Jenkins takes here: you can either restart the whole pipeline with different params (new pipeline run), or restart from stage, when it failed, always with the same params (retry failed pipeline run, starting from failed task).
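
A rough sketch of the "retry from failed task, same params" behaviour described above, under stated assumptions: the run record layout, the status strings (`Succeeded`, `Failed`, `NotStarted`) and the helper `plan_retry` are all hypothetical and only illustrate the idea. They are not Tekton's or Jenkins's API.

```python
def plan_retry(previous_run):
    """Split a finished run into tasks whose results are reused and tasks to run again."""
    reuse, rerun = [], []
    for name, status in previous_run["task_status"].items():
        # Anything that did not succeed (failed or never started) is run again.
        (reuse if status == "Succeeded" else rerun).append(name)
    return {
        "params": dict(previous_run["params"]),  # same params as the failed run
        "reuse": reuse,
        "rerun": rerun,
    }


# Hypothetical record of a run that failed at its third task.
previous_run = {
    "params": {"revision": "abc123"},
    "task_status": {
        "fetch": "Succeeded",
        "build": "Succeeded",
        "publish": "Failed",
        "notify": "NotStarted",
    },
}
plan = plan_retry(previous_run)
print(plan["rerun"])
```

Restarting with different params would then simply be a new run, with nothing reused.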

Hope this helps.

tekton-robot · 2020-08-14T00:34:17Z

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

Send feedback to tektoncd/plumbing.

tekton-robot · 2020-08-14T00:34:17Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

tekton-robot · 2020-08-14T00:34:18Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.

/lifecycle stale

Send feedback to tektoncd/plumbing.

tekton-robot · 2020-08-14T00:34:19Z

@tekton-robot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

Send feedback to tektoncd/plumbing.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

vdemeester · 2020-08-17T09:41:58Z

/remove-lifecycle rotten
/remove-lifecycle stale
/reopen

tekton-robot · 2020-08-17T09:42:00Z

@vdemeester: Reopened this issue.

In response to this:

/remove-lifecycle rotten
/remove-lifecycle stale
/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

bobcatfish · 2020-08-17T15:13:06Z

This one is on our roadmap: https:/tektoncd/pipeline/blob/master/roadmap.md

/lifecycle frozen

coryrc · 2020-09-01T18:27:47Z

> The problem with most Jenkins pipelines is that they are not written in such a way that restarting from any particular stage would be possible, as inputs / outputs of each stage (task) are not always well defined.
>
> Coming to Tekton and finding inputs/outputs so explicitly declared, I almost see an opportunity whereby, once this feature is implemented, it will work on "all" Tekton pipelines

I think where this falls apart is the lack of copy-on-write workspaces. Tasks can easily introduce changes to workspaces which makes execution non-idempotent. If instead workspaces were always inputs xor outputs, the input to any given stage would still exist when the retry is attempted.

coryrc · 2020-09-01T18:50:59Z

Input/output workspace layering cannot be done efficiently without copy-on-write workspaces. It could be done with an NFS server and overlay filesystems, or there appear to be some COW volumes, but I do not know enough about k8s to say if they're usable for this.
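
As an illustration of the "inputs xor outputs" idea, here is a minimal sketch that emulates it with a full directory copy rather than real copy-on-write (which is exactly the inefficiency mentioned above). The helpers `run_on_copy` and `flaky_task` are hypothetical, and a local directory stands in for a workspace.

```python
import shutil
import tempfile
from pathlib import Path


def run_on_copy(input_dir, task):
    """Run `task` against a private copy of `input_dir`; return the output directory."""
    output_dir = Path(tempfile.mkdtemp(prefix="workspace-out-"))
    shutil.copytree(input_dir, output_dir, dirs_exist_ok=True)  # Python 3.8+
    try:
        task(output_dir)
    except Exception:
        shutil.rmtree(output_dir)  # discard partial output; the input is untouched
        raise
    return output_dir


def flaky_task(workspace, fail):
    # Non-idempotent: appends to a file, then may fail part-way through.
    with open(workspace / "log.txt", "a") as f:
        f.write("step ran\n")
    if fail:
        raise RuntimeError("external system unreachable")


input_dir = Path(tempfile.mkdtemp(prefix="workspace-in-"))
(input_dir / "log.txt").write_text("start\n")

try:
    run_on_copy(input_dir, lambda ws: flaky_task(ws, fail=True))
except RuntimeError:
    pass  # first attempt fails after it has already written to its copy

# The retry starts from the same, unmodified input.
output_dir = run_on_copy(input_dir, lambda ws: flaky_task(ws, fail=False))
print((output_dir / "log.txt").read_text())
```

Had the task written directly into a shared workspace, the retry would have appended a second "step ran" line on top of the leftovers from the failed attempt.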

bobcatfish · 2020-09-02T21:35:15Z

> Tasks can easily introduce changes to workspaces which makes execution non-idempotent

I think there will be use cases where folks want to make these kinds of non-idempotent changes to workspaces, so even with COW I'm not sure we could fully solve this problem? If I'm wrong it would probably help if you could explain with an example.

Also: it sounds like COW workspaces would be an interesting feature in general - if you feel motivated it'd be great to have a separate issue to dive into this in detail

bobcatfish · 2021-01-06T16:36:48Z

Quick update here: @jerop created a design for #1797 which has some interesting ideas that could be applied to a design for partial execution (design doc).

pritidesai · 2022-09-20T18:13:58Z

TEP-0123 (Specifying on-demand-retry in a pipelineTask) does not offer a solution for this feature, but it does propose a way to specify on-demand-retry at authoring time.

jwx0925 · 2023-12-22T09:25:09Z

Can Tekton Pipelines provide the functionality to rerun failed tasks in a pipeline? This would be very useful for our scenario, where a complex pipeline fails on the last task and someone has to step in and rerun the failed task manually. GitHub Actions has a similar feature.

bobcatfish added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Sep 16, 2018

bobcatfish mentioned this issue Sep 16, 2018

Initial design ideas for Source #39

Merged

bobcatfish added the design This task is about creating and discussing a design label Oct 12, 2018

imjasonh mentioned this issue Oct 20, 2018

Using service for build knative/build#436

Open

dibyom mentioned this issue Feb 7, 2020

Conditional build of subproject within a monorepo depending on modified files/directories #1922

Closed

pierretasci mentioned this issue Apr 17, 2020

Feature: Pipeline Checkpointing #2433

Closed

tekton-robot added the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 14, 2020

tekton-robot closed this as completed Aug 14, 2020

tekton-robot added lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Aug 14, 2020

tekton-robot reopened this Aug 17, 2020

tekton-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 17, 2020

tekton-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Aug 17, 2020

bobcatfish added the area/roadmap Issues that are part of the project (or organization) roadmap (usually an epic) label Aug 24, 2020

jerop added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. area/api Indicates an issue or PR that deals with the API. labels Apr 20, 2021

bobcatfish mentioned this issue May 21, 2021

TEP-0065: Retry failed tasks on demand in a pipeline tektoncd/community#422

Closed

afrittoli added the kind/feature Categorizes issue or PR as related to a new feature. label Jun 17, 2021

bobcatfish mentioned this issue Aug 16, 2021

TEP-0077: Partial pipeline execute. tektoncd/community#484

Closed

AlanGreene mentioned this issue Mar 28, 2023

Support for re-try option in the pipeline dashboard tektoncd/dashboard#2826

Closed

