Race condition with Istio sidecar prevents KIC to startup correctly #4603

Closed
1 task done
gallolp opened this issue Sep 4, 2023 · 2 comments · Fixed by #4641
Labels
bug Something isn't working

Comments


gallolp commented Sep 4, 2023

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

This is similar to what is observed in #4207 .

Due to the new controller startup logic described here and in this PR, if the network is not available when the ingress-controller container starts and it can't connect to the k8s control plane, then all the controllers are disabled.

When the ingress-controller container starts and attempts to get k8s resources before the Envoy sidecar is ready, the requests fail and no routes are added to the Kong proxy instances.

Expected Behavior

The controller should retry or fail (restart) when the control plane is unavailable at boot.

Steps To Reproduce

- Deploy Kong with an Istio sidecar. For example, deploy using the Helm chart into a namespace that has Istio sidecar injection enabled.
- Wait for the race condition to happen.

Kong Ingress Controller version

Tested positive on 2.10.x.
Unable to reproduce with 2.7.x.
Should happen on 2.8.x and up.

Kubernetes version

Tested on 1.23 and 1.26.

Anything else?

Sample logs. Check timestamps.

Istio sidecar (extract):

2023-09-04T17:49:40.380745Z    info    Envoy proxy is ready

Ingress controller container logs (extract):

{"level":"info","logger":"controllers.crdCondition","msg":"Disabling controller for Group=configuration.konghq.com/v1beta1, Resource=udpingresses due to missing CRD","time":"2023-09-04T17:49:37Z"}
{"level":"info","logger":"controllers.crdCondition","msg":"Disabling controller for Group=configuration.konghq.com/v1beta1, Resource=tcpingresses due to missing CRD","time":"2023-09-04T17:49:37Z"}
{"level":"info","logger":"controllers.crdCondition","msg":"Disabling controller for Group=configuration.konghq.com/v1, Resource=kongingresses due to missing CRD","time":"2023-09-04T17:49:37Z"}

This issue seems to be tracked here in Istio. One of the proposed solutions here is the use of postStart lifecycle hooks.

Maybe KIC can implement either:

  • retry logic when the control plane is unavailable
  • fail logic (distinguish a missing CRD from a failed API call)

Or, if that is not possible, maybe the Helm chart can add support for lifecycle hooks for the ingress-controller container, like it does for the proxy container.

gallolp added the bug (Something isn't working) label Sep 4, 2023

Member

pmalek commented Sep 6, 2023

#4618 might be a better solution than trying to implement it in the chart.

Author

gallolp commented Sep 6, 2023

Having the retry logic in the controller would be ideal. It would help in this Istio case and in any other case of network outage or delay at pod start.

The container lifecycle hook is just a workaround, and it has proven to be ineffective in some cases.

Thank you for looking into this.
