Git source validation will be stuck in some cases which causes a pile up of unregistered build objects #659
Comments
So far we have discussed this internally, if However, we found there is a pull request that modifies the AdvertisedReferences func so that it can use But this PR doesn't care about I had some discussion with him. He would prefer a new PR to solve our own issue. However, his PR is still not merged. Because our change overlaps with his to some extent, we need to wait until his PR is merged so that we can build on it. But, to be honest, this is a priority 1 issue internally. So I am not sure if we can rewrite Do you agree with it? |
I agree this is something we need to address. Ideally it would be possible to disable git validation in the controller entirely, using feature flags in a ConfigMap, so users can disable this behavior themselves if it's causing problems (e.g., during a GitHub outage). Either way, we should enforce a timeout. As a stopgap I don't think we even need to wait for go-git to support it:

    // Needs the "fmt" and "time" imports; validateWithoutTimeout is the existing check.
    func validateWithTimeout(url string) error {
        d := time.Second
        // Buffered so the goroutine can still deliver its result and exit after a timeout.
        ch := make(chan error, 1)
        // Run the check in its own goroutine so the select below can give up on it.
        go func() { ch <- validateWithoutTimeout(url) }()
        select {
        case err := <-ch:
            return err
        case <-time.After(d):
            return fmt.Errorf("timeout after %v", d)
        }
    }

(this is just pseudocode to demonstrate the idea, play with this in the Go Playground) But even if we do enforce a timeout, a service outage could make every reconcile take the maximum amount of time, which would be pretty bad. Users should have some way to disable this behavior entirely, and just have affected BuildRuns fail at run-time. |
I agree with Jason. We have been thinking about this problem recently, and there are three solutions:
The first two solutions do not seem able to solve the problem soon, so I also suggest disabling git validation by default first and enabling it again once the problem is solved. That protects us from a controller outage. But I would also like to hear other comments, or we can discuss it together next Monday. |
Note that
is by design like that. Unless users specify this in their Builds. |
Oh, I wasn't aware of that setting, thanks for pointing that out. Found it here:
This means that validation is on by default, and can be disabled on a per-build basis. I think it could be useful to have an installation-wide option for disabling this in a ConfigMap, which would allow Shipwright operators to disable validation even if it's requested. This could be either temporarily ("GitHub is down, let's disable all validations until it comes back") or permanently ("we don't ever want to validate"). Personally as an operator I'd like to disable controller-side validation entirely, since it can cause reconciliation to take a long time, and slow down processing other unrelated Builds. |
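The precedence described in this comment (an installation-wide switch that wins over whatever a single Build requests) could look roughly like the sketch below. The names `Settings`, `DisableGitValidation`, and `shouldValidateGit` are hypothetical and only illustrate the idea; they are not the project's actual configuration.

```go
package main

import "fmt"

// Settings is a hypothetical installation-wide configuration, for example
// loaded from a ConfigMap or environment variables at controller startup.
type Settings struct {
	// DisableGitValidation turns controller-side git validation off for
	// every Build, even if an individual Build asks for it.
	DisableGitValidation bool
}

// shouldValidateGit decides whether the controller validates the source URL.
// buildChoice is the per-Build setting; nil means the Build did not specify
// one, in which case defaultOn applies.
func shouldValidateGit(s Settings, buildChoice *bool, defaultOn bool) bool {
	if s.DisableGitValidation {
		return false // the operator override always wins
	}
	if buildChoice != nil {
		return *buildChoice
	}
	return defaultOn
}

func main() {
	yes := true
	// Operator disabled validation, the Build requested it: no validation.
	fmt.Println(shouldValidateGit(Settings{DisableGitValidation: true}, &yes, true))
	// No override and no per-Build choice: the default decides.
	fmt.Println(shouldValidateGit(Settings{}, nil, true))
}
```

With this shape, "GitHub is down, disable everything" is a single setting change, and the per-Build option keeps working whenever the override is off.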
yes, my bad, this is on by default! @xiujuan95 corrected me today. @imjasonh i think you are looking for #651, is this the same? |
Yeah, a ConfigMap to configure the controller sounds best to me. |
I think we can discuss and plan #651 in the next milestone, because we have more and more configuration options, like: https:/shipwright-io/build/blob/master/docs/configuration.md We can decide whether we want to keep using environment properties or switch to a ConfigMap. |
Having gone through the discussion above, my understanding is that most of us agree with disabling git validation by default. Currently it is enabled by default. Yes, maybe we can configure it via a ConfigMap, but as @zhangtbj said, we have more and more configuration options now, so that probably needs more time to discuss. However, for our internal situation, this issue is urgent. And although @SaschaSchwarze0 makes I also think we need to solve the root cause rather than work around it. However, about So I prefer to disable git validation by default in our code. Namely, this part should be:
At the same time, we can monitor progress on the community side. Once this PR is merged, we can open a PR with what we want, so that we can solve this issue at the root. Do you agree? |
I agree on this. I think we have other issues for providing more flexibility on how to globally disable a validation, e.g. ConfigMaps. I'm wondering if we should have a separate issue for a timeout for the Build reconcile space. At the moment, during a Build reconciliation, all validations are in-cluster; the only one that goes outside is the git one. If we have git validation disabled by default and also a ConfigMap that allows us to disable it globally, I'm wondering if the timeout is something we still want? |
In general I would say yes. Any network communication should imo be done with connect and read timeouts in place. If we decide to put a timeout around this (= for the overall Build reconciliation), I would be okay with that as long as it is guaranteed that the hanging network connection gets closed when the timeout happens. |
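One way to bound a network call so that the caller does not keep waiting is to tie the request to a context with a deadline. The sketch below uses only the Go standard library; `probe` and `probeHangingServer` are hypothetical helpers, the 200ms timeout is an arbitrary value, and a local test server that never answers stands in for an unresponsive git host. The net/http documentation states that the context of an outgoing request controls its whole lifetime (connecting, sending, reading the response), so an expired deadline aborts the request instead of leaving it hanging. Whether go-git exposes a matching hook is not shown here.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// probe issues a GET that is bound to ctx. When ctx expires, net/http
// abandons the request and returns an error wrapping the context error.
func probe(ctx context.Context, url string) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

// probeHangingServer runs probe against a local server that never answers.
func probeHangingServer() error {
	release := make(chan struct{})
	srv := httptest.NewServer(http.HandlerFunc(func(http.ResponseWriter, *http.Request) {
		<-release // simulate a stuck remote
	}))
	defer srv.Close()
	defer close(release) // runs before srv.Close so the handler can return

	// Arbitrary deadline chosen for the demo.
	ctx, cancel := context.WithTimeout(context.Background(), 200*time.Millisecond)
	defer cancel()
	return probe(ctx, srv.URL)
}

// timedOut reports whether err was caused by the context deadline.
func timedOut(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	err := probeHangingServer()
	fmt.Println("timed out:", timedOut(err))
}
```

Because the deadline lives in the context, the same context could also be shared with the other calls in a reconcile, which gives one overall budget rather than a separate timer per call.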
An additional thing on timeouts: all in-cluster validations in the Build reconcile do a client call with a context, and this context already has a timeout, see https:/shipwright-io/build/blob/master/pkg/reconciler/build/build.go#L50 . The only call we have without a timeout is the one that goes out to the internet (git). |
fwiw, when the git validation first dropped, I believe I was the one who originally insisted on it minimally having an on/off switch. At the time, I chose to be "un-opinionated" on what the default should be. But perhaps I should have been more opinionated :-) (my personal preference would have been off by default), and I am certainly +1 on switching the default to off now. |
@gabemontero this is true, that was certainly great feedback. |
I submitted PR #672 to disable sourceURL validation by default, please take a look, thanks! |
@xiujuan95 can we close this issue? |
use case
During reconcile, our build controller validates the remote git source. We use git ls-remote to check whether the remote URL exists. The List call used for this has no timeout parameter that can be set. A newUploadPackSession func initializes a new session, and that session is created by a DefaultClient. The default client is nil. That means the timeout in this client is zero, which means no timeout. The AdvertisedReferences func then sends an HTTP request based on this new session. Because the client has no timeout, the HTTP request gets stuck in some cases, such as a GitHub or GitLab outage.
In that situation our build controller waits indefinitely. As the number of builds grows, many builds queue up waiting to be reconciled. If the remote URL stays unavailable, newly created builds are never registered. This has already happened internally.
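The effect of a zero client timeout can be seen with the Go standard library alone. In net/http, a `Client.Timeout` of zero means no timeout, and `http.DefaultClient` is such a client. The sketch below is not go-git code: `fetchFromHangingServer`, `isTimeout`, and `boundedCheckTimesOut` are hypothetical helpers, and a local test server that never answers stands in for a git host during an outage. It shows that a client with a non-zero `Timeout` returns a timeout error instead of blocking.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetchFromHangingServer sends a GET with the given client to a local test
// server that never answers.
func fetchFromHangingServer(c *http.Client) error {
	release := make(chan struct{})
	srv := httptest.NewServer(http.HandlerFunc(func(http.ResponseWriter, *http.Request) {
		<-release // never respond while the client is waiting
	}))
	defer srv.Close()
	defer close(release) // runs before srv.Close so the handler can return

	resp, err := c.Get(srv.URL)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

// isTimeout reports whether err is a network timeout.
func isTimeout(err error) bool {
	var ne net.Error
	return errors.As(err, &ne) && ne.Timeout()
}

// boundedCheckTimesOut uses a client with an arbitrary 200ms timeout and
// reports whether the request ended with a timeout error.
func boundedCheckTimesOut() bool {
	c := &http.Client{Timeout: 200 * time.Millisecond}
	return isTimeout(fetchFromHangingServer(c))
}

func main() {
	// A zero Timeout on http.Client means "no timeout".
	fmt.Println("default client timeout:", http.DefaultClient.Timeout)
	fmt.Println("bounded client timed out:", boundedCheckTimesOut())
}
```

Passing `http.DefaultClient` to `fetchFromHangingServer` instead would block until the server is released, which mirrors the stuck reconcile described above.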
Reproduce issue
Create multiple builds in parallel with this source, which is provided by @HeavyWombat. Then check their status. Then create a new build. The new build will not be registered.
Expected behavior
Git validation should have a timeout. If the sourceURL check does not finish within this time, it should time out, and the build controller should return a meaningful message, such as checkSourceURLTimeout, and update the build status so that the controller can handle other build objects.
cc/ @zhangtbj
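The expected behavior could be sketched as a small mapping from the validation error to a status. Everything below is illustrative: `BuildStatus`, `statusFor`, and the `checkSourceURLFailed` reason are hypothetical stand-ins, and only the `checkSourceURLTimeout` name comes from this issue. The sketch assumes the timeout surfaces as an error wrapping `context.DeadlineExceeded`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

const (
	reasonTimeout = "checkSourceURLTimeout" // name suggested in this issue
	reasonFailed  = "checkSourceURLFailed"  // hypothetical placeholder
)

// BuildStatus is a simplified stand-in for the real Build status type.
type BuildStatus struct {
	Registered bool
	Reason     string
	Message    string
}

// Sample errors used by the demo below.
var (
	errTimeout  = fmt.Errorf("git ls-remote: %w", context.DeadlineExceeded)
	errNotFound = errors.New("remote repository not found")
)

// statusFor converts the result of the source URL check into a status, so a
// timed-out check is recorded and the controller can move on to other Builds.
func statusFor(err error) BuildStatus {
	switch {
	case err == nil:
		return BuildStatus{Registered: true}
	case errors.Is(err, context.DeadlineExceeded):
		return BuildStatus{Reason: reasonTimeout, Message: "source URL check did not finish in time"}
	default:
		return BuildStatus{Reason: reasonFailed, Message: err.Error()}
	}
}

func main() {
	for _, err := range []error{nil, errTimeout, errNotFound} {
		fmt.Printf("%+v\n", statusFor(err))
	}
}
```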