feat: use DynamicCRDController with Kong controllers #4619
Codecov Report
Patch coverage:
Additional details and impacted files
@@ Coverage Diff @@
## main #4619 +/- ##
=======================================
+ Coverage 68.0% 68.1% +0.1%
=======================================
Files 163 163
Lines 19091 19067 -24
=======================================
+ Hits 12992 12996 +4
+ Misses 5331 5298 -33
- Partials 768 773 +5
This will still let routes go missing during the problem period. Disabling a controller doesn't disable the corresponding section of the parser. In DB mode the instance will come up, disable some resource controller, see 0 such resources listed when the parser runs, and delete any previously added configuration associated with them.

DB-less mode has no way to persist configuration across container starts, so there is nothing previously persisted to delete, but it still ends up with inconsistent configuration across the cluster: newer replicas will enter service without the associated configuration until the problem condition resolves itself.

Legacy DB-less is the only variant that should start without API server access, which is what's happening in the Istio case. For a somewhat contrived demonstration, you can follow the API server kill instructions with a DB-less legacy instance running, use

I haven't tried to replicate the situation with the API server running but unreachable (which could presumably allow Pods to enter service), but there's enough general weirdness and problem behavior from the imperfect parser+store/controller relationship that I think we need to just fail early.
I have not tested every possible scenario for this, but my gut tells me that, instead of dropping this approach, we should pay close attention to the errors returned from the API server, e.g. when looking up CRDs. https:/Kong/gateway-operator/pull/1059 tries to do just that.

An explicit missing CRD (reported via a status code 404 with
A network error is another (tested with blocking traffic through
We could combine that with a retry backoff mechanism which would

I realized I can do this in
then you can remove it via
This works when the API server runs on the default port, 6443.
To simulate the network issues in code I have added an envtest

After thinking more about it based on Travis' points, I think the most robust way to prevent issues like #4603 and #4207 would indeed be to verify Kubernetes API connectivity on startup, before running any components (controllers, parser, etc.). Having that +

Yeah, if we agree that all KIC CRDs are required then we could not use
Closing this in favor of #4641.
What this PR does / why we need it:
Use DynamicCRDController instead of a single ShouldEnableCRDController call on startup to decide whether the Kong CRDs' controllers should be run. This should fix the issue of these controllers not being started when there is a temporary API server connection problem (e.g. while waiting for the Istio sidecar to start up).
Which issue this PR fixes:
Should fix #4618.
Special notes for your reviewer:
PR Readiness Checklist:
Complete these before marking the PR as ready to review:
CHANGELOG.md release notes have been updated to reflect any significant (and particularly user-facing) changes introduced by this PR