
Error notifications despite the resource being successfully reconciled #258

Closed
Diaoul opened this issue May 5, 2021 · 5 comments · Fixed by #660

Comments

@Diaoul

Diaoul commented May 5, 2021

This issue was first opened at fluxcd/flux#3480

Describe the bug

Flux sends out error-level notifications despite the resource being successfully reconciled.
These are the Discord notifications I received, in this order:

[info] helmrelease/jellyfin.media
Helm upgrade has started
revision
7.3.2
[info] helmrelease/jellyfin.media
Helm upgrade succeeded
revision
7.3.2
[error] helmrelease/jellyfin.media
reconciliation failed: Operation cannot be fulfilled on helmreleases.helm.toolkit.fluxcd.io "jellyfin": the object has been modified; please apply your changes to the latest version and try again
revision
7.3.2

And when I checked later:

$ flux get helmrelease -n media jellyfin
NAME    	READY	MESSAGE                         	REVISION	SUSPENDED
jellyfin	True 	Release reconciliation succeeded	7.3.2   	False 

All of this happened within a two-minute window between the start of the reconciliation and the error notification.
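
For context, these notifications come from a notification-controller Alert, configured roughly like this (the provider and alert names below are hypothetical and the webhook URL is a placeholder):

$ flux create alert-provider discord --type discord --address https://discord.com/api/webhooks/<id>/<token> --username flux
$ flux create alert discord-alerts --provider-ref discord --event-severity info --event-source 'HelmRelease/*'

With --event-severity info both info and error events are forwarded, which is why the failed-reconciliation event shows up alongside the upgrade messages.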

To Reproduce

Hard to tell. No manual intervention was made besides updating the Docker image in the chart's values in the GitOps repository; all of these resources are managed by Flux. The last time jellyfin was reconciled it worked fine. A week ago the grafana reconciliation hit the same error, but not afterwards, so it does not seem to be related to a particular Helm chart.
My guess is that there is a conflict because Flux tries to run two reconciliations of the same resource at the same time.
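
The error message itself is the standard Kubernetes optimistic-concurrency conflict: a client tried to write the HelmRelease with a stale resourceVersion because something else updated the object in between. The same message can be reproduced manually, for example (the annotation key here is just an example):

$ kubectl -n media get helmrelease jellyfin -o yaml > hr.yaml                # snapshot with the current resourceVersion
$ kubectl -n media annotate helmrelease jellyfin example.com/touched=true    # any write bumps the resourceVersion
$ kubectl replace -f hr.yaml                                                 # fails: "the object has been modified; please apply your changes to the latest version and try again"

So the error notification probably just reflects one controller losing such a write race, not the release itself failing.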

Expected behavior

Error notifications should only be sent when reconciliation actually fails, perhaps only after it has been failing for some time. At the very least, the first occurrence could be reported at warning level.
I am not sure what the right behaviour is, but reporting this as an error seems wrong.

@stefanprodan stefanprodan transferred this issue from fluxcd/notification-controller May 6, 2021
@stefanprodan
Member

Please post the output of flux check here.

@Diaoul
Author

Diaoul commented May 6, 2021

Sure

► checking prerequisites
✔ kubectl 1.21.0 >=1.18.0-0
✔ Kubernetes 1.20.6+k3s1 >=1.16.0-0
► checking controllers
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v0.12.0
✔ helm-controller: deployment ready
► ghcr.io/fluxcd/helm-controller:v0.10.0
✔ notification-controller: deployment ready
► ghcr.io/fluxcd/notification-controller:v0.13.0
✔ source-controller: deployment ready
► ghcr.io/fluxcd/source-controller:v0.12.1
✔ all checks passed

@mikalai-t

Any update on this? The same happens here. In chronological order:
(screenshot of the notifications)

prerequisites:

  • Developer pushes a piece of code
  • The CI system tests/builds and pushes a new tagged image to ECR
  • According to the configured ImageRepository and ImagePolicy, a new tag is detected
  1. The Image Update Automation controller commits the new tag to the Git repo
  2. The Source Controller updates the repo somewhere inside the Flux pod
  3. Events 3.2 and 3.1 arrived at the same moment, so probably the Kustomize Controller was the first to update the tag value in the HelmRelease and send the event to the Slack channel, then the Helm Controller started the upgrade process
  4. The new image was successfully deployed.
  5. Failed... why? What went wrong?
     kubectl get -o yaml helmrelease ...
     (screenshot of the kubectl output)

I'd appreciate any help or ideas on how to debug this issue.
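
Some commands that might help narrow it down (the release name and namespace below are placeholders):

$ kubectl -n flux-system logs deploy/helm-controller --since=1h | grep -i "cannot be fulfilled"
$ kubectl -n flux-system logs deploy/kustomize-controller --since=1h | grep -i my-release
$ kubectl -n my-namespace get helmrelease my-release -o yaml    # inspect status.conditions and metadata.resourceVersion
$ kubectl -n my-namespace get events --field-selector involvedObject.kind=HelmRelease

If both controllers log a write to the same HelmRelease at nearly the same time, that would support the concurrent-reconciliation theory.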

@tbondarchuk

I suspect it might be caused by two reconciliations happening at the same time: one set off by the HelmRelease's interval, the second triggered by a source update. I've seen the same issue, but only from time to time; usually for me it's a bunch of releases updating properly and then one or two that first report success and then the "object has been modified" error. I haven't seen such errors from a Kustomization, though, so maybe helm-controller treats an already running reconciliation somewhat differently than kustomize-controller?

Something similar is described in fluxcd/flux2#1882; could they be connected?
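
If that is the cause, it might be possible to provoke the race by triggering both paths at once, for example (names are hypothetical; I have not verified that this reproduces it):

$ flux reconcile source git flux-system &
$ flux reconcile helmrelease my-release -n my-namespace &
$ wait

This exercises a source-driven and a directly triggered HelmRelease reconciliation concurrently.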

@jprecuch

It seems to still be happening; at least, it happened in 1/2 of our clusters. The upgrade went through fine, as described in this issue:

helmrelease/sde.sde
Helm upgrade has started
revision
2022.2.0-external2

helmrelease/sde.sde
Helm upgrade succeeded
revision
2022.2.0-external2

helmrelease/sde.sde
reconciliation failed: Operation cannot be fulfilled on helmreleases.helm.toolkit.fluxcd.io "sde": the object has been modified; please apply your changes to the latest version and try again
revision
2022.2.0-external2

helmrelease/sde.sde
reconciliation failed: Operation cannot be fulfilled on helmreleases.helm.toolkit.fluxcd.io "sde": the object has been modified; please apply your changes to the latest version and try again
revision
2022.2.0-external2

kustomization/helmcharts.flux-system
Health check passed in 20.121041632s
revision
main/7701d7768535a34ca4b53df88d822f65beecb4ed
