Improve BuildRun Failure State Transitions #558
Comments
Keeping this one open for a while. The general idea so far is to follow a
where a BuildRun cannot change its state after the CompletionTime is set. The above is coming from #551 |
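To make the "no state change after CompletionTime" idea concrete, here is a minimal sketch of such a guard. The `BuildRunStatus` type, its fields, and the reason string are simplified, hypothetical stand-ins and not the actual Shipwright API.

```go
package main

import (
	"fmt"
	"time"
)

// BuildRunStatus is a simplified, hypothetical stand-in for the real status type.
type BuildRunStatus struct {
	Succeeded      string     // "Unknown", "True" or "False"
	Reason         string     // illustrative reason string
	CompletionTime *time.Time // nil while the BuildRun is still in flight
}

// IsDone reports whether the BuildRun has reached a terminal state.
func (s *BuildRunStatus) IsDone() bool { return s.CompletionTime != nil }

// Transition applies a new result unless the BuildRun has already completed.
// It returns false when the update is rejected because CompletionTime is set.
func (s *BuildRunStatus) Transition(succeeded, reason string, terminal bool) bool {
	if s.IsDone() {
		return false // terminal state: later reconciles must not overwrite it
	}
	s.Succeeded = succeeded
	s.Reason = reason
	if terminal {
		now := time.Now()
		s.CompletionTime = &now
	}
	return true
}

func main() {
	status := &BuildRunStatus{Succeeded: "Unknown"}
	fmt.Println(status.Transition("False", "BuildNotFound", true)) // true
	fmt.Println(status.Transition("True", "Succeeded", true))      // false
	fmt.Println(status.Succeeded, status.Reason)                   // False BuildNotFound
}
```

With a guard like this, a late reconcile that observes a finished pod would be rejected rather than flipping an already completed BuildRun.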
I would like to broaden the scope of this issue a little bit: In the BuildRun controller we have the problematic
While for user-originated errors, I propose to make the BuildRun a one-time shot: we set the BuildRun error details, we also set its completionTime, and then the controller MUST NOT return an error so that no further retries happen. What we should consider in this context is to prohibit end users from updating a BuildRun. We want them to create a new one, so I do not see any use case for allowing an update operation on a BuildRun at all. EDIT: once we want to do something like cancel, we will likely still need to allow updating a BuildRun. For system-originated errors where we do intend to retry, IMO we should either stop updating the BuildRun status at all (and only return the error to trigger the retry), or set a temporary failure without a completionTime that we remove once the retry succeeds. -- Also a little bit related to my comment in Fix missing BuildRun Conditions updates on errors #548. FYI @xiujuan95 |
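A rough sketch of the user vs. system error split proposed above, under stated assumptions: `UserError`, `RunStatus`, `handleError`, and `errTransient` are hypothetical names invented for illustration, not existing Shipwright or controller-runtime identifiers. The only behaviour relied on is that a reconciler returning a nil error is not retried for that failure, while a non-nil error is requeued.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// UserError marks a failure caused by user input (hypothetical type).
type UserError struct {
	Reason  string
	Message string
}

func (e *UserError) Error() string { return e.Reason + ": " + e.Message }

// errTransient stands in for any system-originated error (hypothetical).
var errTransient = errors.New("api server unavailable")

// RunStatus is a simplified stand-in for the BuildRun status.
type RunStatus struct {
	Reason         string
	Message        string
	CompletionTime *time.Time
}

// handleError decides what a reconcile call should return for err.
func handleError(status *RunStatus, err error) error {
	if err == nil {
		return nil
	}
	var userErr *UserError
	if errors.As(err, &userErr) {
		// User-originated: record details, mark completed, and do not retry.
		now := time.Now()
		status.Reason = userErr.Reason
		status.Message = userErr.Message
		status.CompletionTime = &now
		return nil
	}
	// System-originated: leave the status untouched and let the retry happen.
	return err
}

func main() {
	status := &RunStatus{}
	fmt.Println(handleError(status, errTransient), status.CompletionTime == nil)

	wrapped := fmt.Errorf("validating build: %w", &UserError{Reason: "BuildNotFound", Message: "no such Build"})
	fmt.Println(handleError(status, wrapped), status.Reason)
}
```

Using `errors.As` means the classification still works when the user error has been wrapped with additional context further up the call chain.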
FYI whatever traction that might exist for retry in Tekton is centered around this TEP: tektoncd/community#239 Evidently there is some retry support at the PipelineTask level that I at least was previously unaware of. A quote from that TEP: And found this: https:/tektoncd/pipeline/blob/master/pkg/apis/pipeline/v1beta1/pipeline_types.go#L137 Granted, it seems orthogonal since Shipwright does not leverage PipelineTask. But perhaps another "monitor upstream Tekton in this space" element for this one. |
I agree with the user vs. system error distinction and that we should allow the controller's behavior to vary based on that @SaschaSchwarze0 |
Agree - I envision cancel being implemented by setting the
My preference is to just not update the status and retry the reconciliation. The caveat being that if we reach a reconciliation retry limit, then we should update the status with an error message. |
Good comments @adambkaplan and @gabemontero. Had a call with @qu1queee just an hour ago on this as well. I think we are totally on the same page here. Just one question on the following @adambkaplan
More than half a year ago we also discussed a scenario in a community meeting or issue (I think it was related to the build-secret relationship at that time which has been handled with a different implementation in the meantime). How would one implement a way to detect that for the same reconciliation request, one is in a retry loop that is already ongoing for n1 retries or n2 minutes? |
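One way to picture the bookkeeping this question asks about is a small in-memory tracker keyed by the reconcile request. This is only a sketch: `RetryTracker` and its methods are hypothetical helpers, not part of controller-runtime, and the state is lost whenever the controller restarts.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// attempt records how often and since when a key has been failing.
type attempt struct {
	count int
	first time.Time
}

// RetryTracker is a hypothetical helper that counts failed reconciles per key.
type RetryTracker struct {
	mu         sync.Mutex
	maxRetries int           // the "n1 retries" limit
	maxAge     time.Duration // the "n2 minutes" limit; 0 disables it
	seen       map[string]attempt
}

func NewRetryTracker(maxRetries int, maxAge time.Duration) *RetryTracker {
	return &RetryTracker{maxRetries: maxRetries, maxAge: maxAge, seen: map[string]attempt{}}
}

// RecordFailure notes one more failed reconcile for key and reports
// whether either limit has now been exceeded.
func (t *RetryTracker) RecordFailure(key string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	a, ok := t.seen[key]
	if !ok {
		a.first = time.Now()
	}
	a.count++
	t.seen[key] = a
	tooMany := a.count > t.maxRetries
	tooOld := t.maxAge > 0 && time.Since(a.first) > t.maxAge
	return tooMany || tooOld
}

// Reset forgets a key, for example after a successful reconcile.
func (t *RetryTracker) Reset(key string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.seen, key)
}

func main() {
	tracker := NewRetryTracker(2, 10*time.Minute)
	for i := 1; i <= 3; i++ {
		fmt.Println(i, tracker.RecordFailure("default/my-buildrun"))
	}
}
```

When `RecordFailure` returns true, the reconciler could write the error into the BuildRun status and stop returning the error, which matches the "update the status once the retry limit is reached" caveat mentioned earlier in the thread.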
Hrm, it seems there is no direct way to say "stop reconciling after X failed attempts". However, in reviewing other issues reported on controller-runtime it seems that if the reconciler returns |
Okay, we are on the same page then. Thanks for clarifying. |
I will provide a PR to address this issue and also the enhancement in #548 (comment). I'm assigning this issue to myself in the meantime. |
I think this should not yet have been closed. |
FYI, I'm already working on this one. |
This was tackled via #641 |
Idea:
This is coming from an internal bug we found in our continuous tests. At the moment a BuildRun can be marked as completed in specific scenarios, but it can happen that at some point later the BuildRun controller reconciles again and the pod runs to completion.
In the above scenario the BuildRun status will be in a failed state, while the container image was successfully built.
In order to address the above, we have been discussing on how we should treat a BuildRun in general. Our idea is that a BuildRun is:
which makes sense because we are aiming to run something till Completion. If the above is the case, we will need to ensure this expectation matches the code implementation, in order to avoid the above scenario.