Throttle reconciliation in case of error #203

Closed
raffis opened this issue Dec 14, 2020 · 6 comments · Fixed by #205

Comments

@raffis

raffis commented Dec 14, 2020

When a reconciliation fails, the configured interval does not seem to be respected.

I have a 10m interval configured and one error in a manifest, and I see a reconciliation every ~10s.
This also triggers a Slack alert for each attempt. (The Slack alerts are also useless because of #190: I don't see the actual error in the notification, nor in the log because of #202.)

{"level":"error","ts":"2020-12-14T10:23:53.606Z","logger":"controllers.Kustomization","msg":"unable to update status after reconciliation","controller":"kustomization","request":"flux-system/devops-k8s","error":"Kustomization.kustomize.toolkit.fluxcd.io \"devops-k8s\" is invalid: status.conditions.message: Invalid value: \"\": status.conditions.message in body should be at most 32768 chars long"}
{"level":"error","ts":"2020-12-14T10:23:53.606Z","logger":"controller","msg":"Reconciler error","reconcilerGroup":"kustomize.toolkit.fluxcd.io","reconcilerKind":"Kustomization","controller":"kustomization","name":"devops-k8s","namespace":"flux-system","error":"Kustomization.kustomize.toolkit.fluxcd.io \"devops-k8s\" is invalid: status.conditions.message: Invalid value: \"\": status.conditions.message in body should be at most 32768 chars long"}
{"level":"error","ts":"2020-12-14T10:24:21.658Z","logger":"controllers.Kustomization","msg":"unable to update status after reconciliation","controller":"kustomization","request":"flux-system/devops-k8s","error":"Kustomization.kustomize.toolkit.fluxcd.io \"devops-k8s\" is invalid: status.conditions.message: Invalid value: \"\": status.conditions.message in body should be at most 32768 chars long"}
{"level":"error","ts":"2020-12-14T10:24:21.658Z","logger":"controller","msg":"Reconciler error","reconcilerGroup":"kustomize.toolkit.fluxcd.io","reconcilerKind":"Kustomization","controller":"kustomization","name":"devops-k8s","namespace":"flux-system","error":"Kustomization.kustomize.toolkit.fluxcd.io \"devops-k8s\" is invalid: status.conditions.message: Invalid value: \"\": status.conditions.message in body should be at most 32768 chars long"}
{"level":"error","ts":"2020-12-14T10:24:48.822Z","logger":"controllers.Kustomization","msg":"unable to update status after reconciliation","controller":"kustomization","request":"flux-system/devops-k8s","error":"Kustomization.kustomize.toolkit.fluxcd.io \"devops-k8s\" is invalid: status.conditions.message: Invalid value: \"\": status.conditions.message in body should be at most 32768 chars long"}
{"level":"error","ts":"2020-12-14T10:24:48.822Z","logger":"controller","msg":"Reconciler error","reconcilerGroup":"kustomize.toolkit.fluxcd.io","reconcilerKind":"Kustomization","controller":"kustomization","name":"devops-k8s","namespace":"flux-system","error":"Kustomization.kustomize.toolkit.fluxcd.io \"devops-k8s\" is invalid: status.conditions.message: Invalid value: \"\": status.conditions.message in body should be at most 32768 chars long"}
{"level":"error","ts":"2020-12-14T10:25:16.280Z","logger":"controllers.Kustomization","msg":"unable to update status after reconciliation","controller":"kustomization","request":"flux-system/devops-k8s","error":"Kustomization.kustomize.toolkit.fluxcd.io \"devops-k8s\" is invalid: status.conditions.message: Invalid value: \"\": status.conditions.message in body should be at most 32768 chars long"}
{"level":"error","ts":"2020-12-14T10:25:16.280Z","logger":"controller","msg":"Reconciler error","reconcilerGroup":"kustomize.toolkit.fluxcd.io","reconcilerKind":"Kustomization","controller":"kustomization","name":"devops-k8s","namespace":"flux-system","error":"Kustomization.kustomize.toolkit.fluxcd.io \"devops-k8s\" is invalid: status.conditions.message: Invalid value: \"\": status.conditions.message in body should be at most 32768 chars long"}
{"level":"error","ts":"2020-12-14T10:25:41.754Z","logger":"controllers.Kustomization","msg":"unable to update status after reconciliation","controller":"kustomization","request":"flux-system/devops-k8s","error":"Kustomization.kustomize.toolkit.fluxcd.io \"devops-k8s\" is invalid: status.conditions.message: Invalid value: \"\": status.conditions.message in body should be at most 32768 chars long"}
{"level":"error","ts":"2020-12-14T10:25:41.754Z","logger":"controller","msg":"Reconciler error","reconcilerGroup":"kustomize.toolkit.fluxcd.io","reconcilerKind":"Kustomization","controller":"kustomization","name":"devops-k8s","namespace":"flux-system","error":"Kustomization.kustomize.toolkit.fluxcd.io \"devops-k8s\" is invalid: status.conditions.message: Invalid value: \"\": status.conditions.message in body should be at most 32768 chars long"}

@stefanprodan
Member

We use the controller-runtime exponential backoff; the retry interval is gradually increased after each failure.

@raffis
Author

raffis commented Dec 14, 2020

We use the controller-runtime exponential backoff, the retries interval is slowly increased.

In this case I see no point in that: these kinds of errors don't go away until the next source sync. Backoff only helps with temporary issues.

@stefanprodan
Member

stefanprodan commented Dec 14, 2020

Well, a kubectl apply will fail if your master nodes are restarting or if there are temporary connection errors between pods and the Kubernetes API service.

@raffis
Author

raffis commented Dec 14, 2020

Well, reconciliation with exponential backoff is not really the problem, but in combination with Slack alerts it is quite unusable: you end up with way too many alerts. Is there a workaround for the notification controller?

If not, maybe the solution is better placed there. I've only seen suspend, but from what I've seen so far it does not get resumed after source changes. A check that does not send the same event again within a certain time window might do it.

@stefanprodan
Member

I think this can be a feature request for notification-controller. If we make it into a StatefulSet, it can store events in a database and prevent spurious events from being sent.

@stefanprodan
Member

stefanprodan commented Dec 14, 2020

Well the reconciliation with exponential backoff is not really problem but in combination with slack alerts quite unusable, you end up with way too many alerts

@raffis with this bug fixed, there will be no more Slack spam on kubectl apply errors. Once the reconciliation status can be persisted in etcd, the controller will retry at the configured interval.
