
Race condition between secret reconciler and object reference index #5175

Closed
1 task done
backjo opened this issue Nov 16, 2023 · 6 comments · Fixed by #5238
Assignees: randmonkey
Labels: bug (Something isn't working), pending author feedback

Comments

@backjo (Contributor) commented Nov 16, 2023

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

We recently noticed errors similar to #4672, where our secrets were getting "Not Found" errors despite existing and being referenced by KongConsumer credentials. After checking our CRDs as mentioned in that issue, we added some custom logging to understand why our secrets were not getting populated into the cache. The logging revealed that the Secrets controller was evaluating reference checks before the KongConsumer controller had reconciled the consumers and populated the references, resulting in 0 secrets being reconciled.
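To make the ordering problem concrete, here is a conceptual sketch (not the actual controller code; the index and names are invented) of a reference check that runs before the index has been populated:

```go
package main

import "fmt"

// refIndex maps a Secret's "namespace/name" to the KongConsumers that
// reference it via credentials. In the real controller an index like this is
// populated while KongConsumers are reconciled.
type refIndex map[string][]string

// shouldCacheSecret mimics the reference check: a Secret is only kept when
// something references it.
func shouldCacheSecret(idx refIndex, secretKey string) bool {
	return len(idx[secretKey]) > 0
}

func main() {
	idx := refIndex{}

	// The Secret reconciler runs first: the index is still empty, so the
	// Secret backing a KongConsumer credential looks unreferenced and is
	// skipped, i.e. 0 secrets end up in the cache.
	fmt.Println(shouldCacheSecret(idx, "default/consumer-cred")) // false

	// Only afterwards does the KongConsumer reconciler record the reference.
	idx["default/consumer-cred"] = append(idx["default/consumer-cred"], "default/my-consumer")
	fmt.Println(shouldCacheSecret(idx, "default/consumer-cred")) // true
}
```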

Expected Behavior

All initial resources should be loaded before calculating references.

Steps To Reproduce

No response

Kong Ingress Controller version

3.0.0, but appears to be happening in 2.11.* as well

Kubernetes version

Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.9", GitCommit:"a1a87a0a2bcd605820920c6b0e618a8ab7d117d4", GitTreeState:"clean", BuildDate:"2023-04-12T12:16:51Z", GoVersion:"go1.19.8", Compiler:"gc", Platform:"darwin/arm64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"26+", GitVersion:"v1.26.10-eks-4f4795d", GitCommit:"164dfb62db432c0b28a1fced3956256af68533b6", GitTreeState:"clean", BuildDate:"2023-10-20T23:21:27Z", GoVersion:"go1.20.10", Compiler:"gc", Platform:"linux/amd64"}

Anything else?

Similar to #4672

backjo added the bug (Something isn't working) label Nov 16, 2023
randmonkey self-assigned this Nov 17, 2023
@randmonkey (Contributor) commented Nov 17, 2023

I checked the code. When a KongConsumer gets reconciled, the controller retrieves the secrets referred to by its credentials, fetches them from the cluster, and then adds them to the cache.
This satisfies eventual consistency: the KongConsumer and the Secrets used in its credentials are all in the cache after the KongConsumer has been reconciled.
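A rough sketch of that flow with controller-runtime (not the actual KIC implementation; the helper signature and the store callback are invented for illustration):

```go
package consumerref

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// reconcileConsumerSecrets sketches the behavior described above: for each
// credential Secret name listed on a KongConsumer, fetch the Secret from the
// cluster and hand it to store (a stand-in for adding it to the cache). If a
// Secret is not visible yet, request a requeue so the reconciler converges.
func reconcileConsumerSecrets(
	ctx context.Context,
	cl client.Client,
	namespace string,
	credentialSecretNames []string,
	store func(*corev1.Secret),
) (ctrl.Result, error) {
	for _, name := range credentialSecretNames {
		var secret corev1.Secret
		key := types.NamespacedName{Namespace: namespace, Name: name}
		if err := cl.Get(ctx, key, &secret); err != nil {
			if apierrors.IsNotFound(err) {
				// Secret not in the cluster (or not visible yet): try again later.
				return ctrl.Result{Requeue: true}, nil
			}
			return ctrl.Result{}, err
		}
		store(&secret)
	}
	return ctrl.Result{}, nil
}
```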
What unexpected behaviors did you find other than the "Not Found" logs?

@backjo (Contributor, Author) commented Nov 18, 2023

Hey @randmonkey, the main unexpected behavior we observe is consistent with the credentials not being loaded, namely that our ACL plugin blocks requests for ~5-10 minutes after startup because the presented credential is not found.

@backjo (Contributor, Author) commented Nov 18, 2023

Going to close this until I can provide more debug info

backjo closed this as completed Nov 18, 2023
@backjo (Contributor, Author) commented Nov 27, 2023

Hi @randmonkey, I did some further digging here, and the behavior I can generally reproduce is:

  1. Controller starts up and reconciles KongConsumer objects. Secrets have not yet been loaded, so it requeues the reconcile operation.
  2. Controller writes initial configuration to the Kong Admin API - without information from Secret objects. For us, this means that it doesn't write the relevant "groups" information from credentials for the ACL plugin to function.
  3. KongConsumer objects are reconciled on the requeue, which loads the credentials successfully.
  4. Controller writes a second configuration to the Kong Admin API with updated groups.

From the logs we put in place, the second update to the Admin API happens a few seconds after the first, but for the few seconds between the first and second writes we have an effectively 'broken' configuration.
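A toy illustration of why that window is broken (not KIC code; the cache shape and group name are invented):

```go
package main

import "fmt"

func main() {
	// consumer -> ACL groups, normally filled in by the KongConsumer
	// reconciler from credential Secrets.
	groupsByConsumer := map[string][]string{}

	push := func(n int) {
		fmt.Printf("config push #%d: %d consumers carry ACL groups\n", n, len(groupsByConsumer))
	}

	// Step 2 above: the first push to the Admin API fires before the requeued
	// reconcile has loaded the credential Secrets, so no groups are present
	// and the ACL plugin rejects otherwise-valid requests.
	push(1)

	// Steps 3-4: the requeued reconcile loads the Secret a few seconds later
	// and the next push repairs the configuration.
	groupsByConsumer["default/my-consumer"] = []string{"admins"}
	push(2)
}
```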

backjo reopened this Nov 27, 2023
@backjo (Contributor, Author) commented Nov 27, 2023

This seems fixable by #2249; maybe the best intermediate solution here is to make InitCacheSyncDuration configurable instead of hard-coding it to 5s.
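For example, the wait could be exposed as a duration flag rather than a constant; the flag name and wiring below are hypothetical, just to show the shape of the change:

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// Config carries the synchronizer settings; only the relevant field is shown.
type Config struct {
	// InitCacheSyncDuration is how long the synchronizer waits after startup
	// before pushing the first configuration to the Kong Admin API.
	InitCacheSyncDuration time.Duration
}

func main() {
	var cfg Config
	// Hypothetical flag: defaults to the previously hard-coded 5 seconds but
	// lets operators widen the window on slow-to-sync clusters.
	flag.DurationVar(&cfg.InitCacheSyncDuration, "init-cache-sync-duration",
		5*time.Second, "time to wait for caches to sync before the first config push")
	flag.Parse()

	fmt.Println("initial cache sync wait:", cfg.InitCacheSyncDuration)
}
```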

@backjo (Contributor, Author) commented Nov 27, 2023

Ah, I think there was a regression of the #2249 fix here. In #4101, InitCacheSyncDuration started being passed in when the synchronizer is created, but InitCacheSyncDuration is never initialized to anything, whereas previously DefaultCacheSyncWaitDuration in synchronizer.go was initialized to 5 seconds.
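A sketch of that regression pattern (names follow this discussion, the wiring is simplified): once the wait comes from a config field instead of the in-package default, leaving the field unset means the Go zero value, i.e. no wait at all.

```go
package main

import (
	"fmt"
	"time"
)

// DefaultCacheSyncWaitDuration is the old in-package default added by #2249.
const DefaultCacheSyncWaitDuration = 5 * time.Second

// Config mirrors the post-#4101 shape: the wait is passed in from outside.
type Config struct {
	InitCacheSyncDuration time.Duration // never set anywhere -> zero value
}

type Synchronizer struct {
	initWait time.Duration
}

// NewSynchronizer takes the wait from Config. If the caller never sets
// InitCacheSyncDuration, initWait is 0 and the first config push happens
// before the cache has had a chance to sync.
func NewSynchronizer(cfg Config) *Synchronizer {
	return &Synchronizer{initWait: cfg.InitCacheSyncDuration}
}

func main() {
	var cfg Config // InitCacheSyncDuration left unset
	s := NewSynchronizer(cfg)
	fmt.Println("wait before first push:", s.initWait) // 0s instead of the old 5s default
}
```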
