Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design: Partial Pipeline execution #50

Open
bobcatfish opened this issue Sep 16, 2018 · 15 comments
Open

Design: Partial Pipeline execution #50

bobcatfish opened this issue Sep 16, 2018 · 15 comments
Labels
area/api Indicates an issue or PR that deals with the API. area/roadmap Issues that are part of the project (or organization) roadmap (usually an epic) design This task is about creating and discussing a design help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

@bobcatfish
Copy link
Collaborator

The work for this task is to design this feature and present one or more proposals (before implementing).

Expected Behavior

If a pipeline has many tasks and takes a long time to run (e.g. tens of minutes, or even hours), and one Task fails, it might be desirable to be able to pick up execution where the Task failed, with different PipelineParams (e.g. from a different git commit), so you can resume the Pipeline without having to rerun the whole thing.

Some ideas for how to implement this:

  1. Fields in a PipelineRun which override which Tasks to run from / refer to a previous PipelineRun from which results should be taken
  2. A tool which makes it easy to create a new Pipeline from an existing one which only runs a subset of the Tasks

It is also worth considering what this could be like via a UI: if one is viewing a Pipeline in a UI, and wants to re-run only a portion of the Pipeline, they probably want the user experience to be as if they were still running the same Pipeline, even if underneath a new Pipeline is created.

Actual Behavior

At the moment, if any Task in a Pipeline fails, your options to rerun the rest of the Pipeline would be:

  1. Run the entire Pipeline again
  2. Create a new Pipeline from the previous one which contains only the Tasks you wish to run

Additional Info

This originally came up in discussion about #39, in the context of whether or not we'd want to always use the same git commit from a source for all Tasks in a Pipeline, or if we wanted sometimes for a Task to always use HEAD. This would allow a user to change a repo, by updating HEAD, between Task executions.

The feature of partial pipeline execution could be an alternative to this.

@bobcatfish bobcatfish added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Sep 16, 2018
@bobcatfish
Copy link
Collaborator Author

@BenTheElder, @cjwagner and some other Prow folks indicated that this would be a very desirable feature for them - particularly in a case where your pipeline has 2 phases, one that builds a bunch of stuff and then subsequent phases that use that built stuff, it'd be handy to be able to resume after the point where the stuff is built

@gsaslis
Copy link

gsaslis commented Apr 15, 2020

Just to add (or rather try to help clarify) a use case here.

This is a very useful feature for long-running pipelines that probably fall outside the strict CI scope. Most pipelines I have in mind are essentially workflow automation pipelines and have external dependencies such as 3rd party systems that need to be up / reachable.

When such a pipeline fails at step 7/11, you really don't want to rerun the whole thing. The Jenkins Restart from stage feature is ideal for the pipeline to essentially pick up where it left off.

The problem with most Jenkins pipelines is that they are not written in such a way that restarting from any particular stage would be possible, as inputs / outputs of each stage (task) are not always well defined.

Coming to Tekton and finding inputs/outputs so explicitly declared, I almost see an opportunity whereby, once this feature is implemented, it will work on "all" Tekton pipelines, significantly widening the scope of problems tekton pipelines can be used to solve. (Plus anyone who relies on this on Jenkins will find it easier to migrate to Tekton).

As a final point, I would like to clarify that in my use case, support for restarting with "different PipelineParams" (as mentioned in the description) is not a necessary feature. I am sure people have use cases for that too, but I personally like the approach Jenkins takes here: you can either restart the whole pipeline with different params (new pipeline run), or restart from stage, when it failed, always with the same params (retry failed pipeline run, starting from failed task).

Hope this helps.

@tekton-robot
Copy link
Collaborator

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

Send feedback to tektoncd/plumbing.

@tekton-robot
Copy link
Collaborator

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot
Copy link
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 14, 2020
@tekton-robot
Copy link
Collaborator

@tekton-robot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

Send feedback to tektoncd/plumbing.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tekton-robot tekton-robot added lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Aug 14, 2020
@vdemeester
Copy link
Member

/remove-lifecycle rotten
/remove-lifecycle stale
/reopen

@tekton-robot
Copy link
Collaborator

@vdemeester: Reopened this issue.

In response to this:

/remove-lifecycle rotten
/remove-lifecycle stale
/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tekton-robot tekton-robot reopened this Aug 17, 2020
@tekton-robot tekton-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 17, 2020
@bobcatfish
Copy link
Collaborator Author

This one is on our roadmap: https:/tektoncd/pipeline/blob/master/roadmap.md

/lifecycle frozen

@tekton-robot tekton-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Aug 17, 2020
@bobcatfish bobcatfish added the area/roadmap Issues that are part of the project (or organization) roadmap (usually an epic) label Aug 24, 2020
@coryrc
Copy link

coryrc commented Sep 1, 2020

The problem with most Jenkins pipelines is that they are not written in such a way that restarting from any particular stage would be possible, as inputs / outputs of each stage (task) are not always well defined.

Coming to Tekton and finding inputs/outputs so explicitly declared, I almost see an opportunity whereby, once this feature is implemented, it will work on "all" Tekton pipelines

I think where this falls apart is lacking copy-on-write workspaces. Tasks can easily introduce changes to workspaces which makes execution non-idempotent. If instead workspaces were always inputs xor outputs the input to any given stage would still exist when the retry is attempted.

@coryrc
Copy link

coryrc commented Sep 1, 2020

input/output workspace layering cannot be done efficiently without copy-on-write workspaces. It could be done with an NFS server and overlay filesystems, or there appear to be some COW volumes but I do not know enough about k8s to say if it's usable for this.

@bobcatfish
Copy link
Collaborator Author

Tasks can easily introduce changes to workspaces which makes execution non-idempotent

I think there will be use cases where folks want to make these kinds of non-idempotent changes to workspaces, so even with COW I'm not sure we could fully solve this problem? If I'm wrong it would probably help if you could explain with an example.

Also: it sounds like COW workspaces would be an interesting feature in general - if you feel motivated it'd be great to have a separate issue to dive into this in detail

@bobcatfish
Copy link
Collaborator Author

Quick update here: @jerop created a design for #1797 which has some interesting ideas that could be applied to a design for partial execution (design doc).

@jerop jerop added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. area/api Indicates an issue or PR that deals with the API. labels Apr 20, 2021
@afrittoli afrittoli added the kind/feature Categorizes issue or PR as related to a new feature. label Jun 17, 2021
@pritidesai
Copy link
Member

TEP-0123 Specifying on-demand-retry in a pipelineTask does not offer solution for this feature. But proposes a feature to allow specifying on-demand-retry at the authoring time.

@jwx0925
Copy link

jwx0925 commented Dec 22, 2023

Can Tekton Pipeline provide the functionality to rerun failed tasks in a pipeline? This would be very useful for our scenario, where a complex pipeline fails on the last task, requiring manual intervention to manually rerun the failed task. GitHub Actions has a similar feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/api Indicates an issue or PR that deals with the API. area/roadmap Issues that are part of the project (or organization) roadmap (usually an epic) design This task is about creating and discussing a design help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.
Projects
Status: Todo
Status: Hold
Development

No branches or pull requests

9 participants