Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resource upload containers should be able to run even if steps in the pod have failed #1149

Closed
abayer opened this issue Aug 2, 2019 · 7 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. triage/duplicate Indicates an issue is a duplicate of other open issue.

Comments

@abayer
Copy link
Contributor

abayer commented Aug 2, 2019

Currently, the step(s) added by GetUploadContainerSpec() won't run if an earlier step fails, since the pod won't run any steps after a failure. This makes things like reporting a build failure via a PullRequestResource or uploading the workspace to a storage resource even if the build fails impossible. This should be configurable, ideally both as something that can be inherent to the resource type (i.e., you're always going to want to run the PullRequestResource upload) and something that can be configured on an individual resource (i.e., you're not always going to want to upload to a storage resource after a failure, but sometimes you might!).

@abayer abayer added kind/feature Categorizes issue or PR as related to a new feature. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Aug 2, 2019
@abayer
Copy link
Contributor Author

abayer commented Aug 2, 2019

At first glance, it looks like we could do this by adding a flag to the entrypoint binary to ignore the presence of a [wait file].err file and run the command anyway. Also seems like this kind of logic is how we'd want to do a more general conditional execution of steps within a Task? (cc @dibyom)

@dibyom
Copy link
Member

dibyom commented Aug 2, 2019

This seems like a finally type clause for a Task. How do we add this to the API? Should something like a continueOnFailure be part of the Task? (Or for something like a pull request resource, does it make sense to add it as part of the resource spec or maybe even the resource interface?)

@abayer
Copy link
Contributor Author

abayer commented Aug 2, 2019

I think for these particular purposes it'd be step-specific, so it would probably make sense to add it to Step in some form. Then you could specify it in either a resource spec or the resource type, or just specify it on an arbitrary Step as well.

@afrittoli
Copy link
Member

Currently, the step(s) added by GetUploadContainerSpec() won't run if an earlier step fails, since the pod won't run any steps after a failure. This makes things like reporting a build failure via a PullRequestResource or uploading the workspace to a storage resource even if the build fails impossible.

Right now we have an implicit assumption that a failed step will stop the rest of the steps in a Task, and that a failed Task will stop the execution of the pipeline (except for tasks that are already running). I think the assumption is correct for sequential items - like steps - but we should have some finally type of clause, like proposed in the design for conditionals.
For Tasks it's a bit different, the part of the DAG that depends on the failed Task should not be executed - but parts that are independent might still be executed - so this could be a configuration option in the Pipeline.

This should be configurable, ideally both as something that can be inherent to the resource type (i.e., you're always going to want to run the PullRequestResource upload) and something that can be configured on an individual resource (i.e., you're not always going to want to upload to a storage resource after a failure, but sometimes you might!).

The way I solved this in the past was by using async Tasks - i.e. use a cloud event (which is sent regardless of the exit condition of the Task) to trigger an external TaskRun that is responsible to update GitHub. While this increases complexity, it allows decoupling the notification behaviour as well as the status from the initial pipeline. For instance I can add extra postprocessing (like PR update) without changing the original pipeline and also the status of the original pipeline won't be affected by the outcome of the postprocessing itself.

@bobcatfish
Copy link
Collaborator

My personal feeling is that Notifications (#49) could be a nice answer here. Notifications could use PipelineResources and their job would be to:

  1. execute after Tasks have completed
  2. have access to metadata about the execution of previous Tasks / execution of a Pipeline

@dibyom
Copy link
Member

dibyom commented Apr 23, 2020

Closing this in favor of #2448
/triage duplicate
/close

@tekton-robot
Copy link
Collaborator

@dibyom: Closing this issue.

In response to this:

Closing this in favor of #2448
/triage duplicate
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tekton-robot tekton-robot added the triage/duplicate Indicates an issue is a duplicate of other open issue. label Apr 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. triage/duplicate Indicates an issue is a duplicate of other open issue.
Projects
None yet
Development

No branches or pull requests

5 participants