Make "Cross Namespace Reference" error more difficult to trigger or easier to diagnose #201

Closed
kingdonb opened this issue Jul 16, 2021 · 1 comment · Fixed by #299

kingdonb commented Jul 16, 2021

People often want to put the automation resources in the same namespace that the Kustomization deploys into. Unfortunately this cannot work when the GitRepository lives in another namespace, because ImageUpdateAutomation defines SourceRef as a kind and name only, with no namespace field.

The reasons are well understood: secrets cannot be read across namespace boundaries, and that cannot change, at least in the current incarnation of Flux's security model, which relies on Service Accounts that are confined to a namespace.

But the docs still do not explain this well. At a minimum, the API docs should clarify it in the documentation of SourceRef, which I think is addressed in #200.

It still seems likely that by the time someone finds this note in the API docs, they will have already struggled for some time. I think someone needs to look at this issue holistically and figure out whether there is more we could do at the point of failure to help the user work out what they've done wrong. (This will be tricky in some cases, especially where a failure mode never invokes a reconciler. For example, when a kyaml tag refers to a policy that does not exist, no automation scanner will currently find the error, and thus no events will be emitted.)

This error is especially common for Helm Controller users, since HelmRelease resources also define a SourceRef field, but with a different type, CrossNamespaceObjectReference, which actually can reach across namespaces. That one has no need to read directly from any secrets, so it works. I think this difference is very surprising and still quite hard to discover.
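To make the difference concrete, here is a minimal Go sketch. The two structs are simplified stand-ins that I made up for illustration (they are not the actual controller API types), and `ResolveSame` and `ResolveCross` are hypothetical helpers showing which namespace each style of reference ends up pointing at.

```go
package main

import "fmt"

// SameNamespaceRef is a simplified, hypothetical stand-in for the automation's
// sourceRef as described above: kind and name only, no namespace.
type SameNamespaceRef struct {
	Kind string
	Name string
}

// CrossNamespaceRef is a simplified, hypothetical stand-in for a
// HelmRelease-style reference that may carry its own namespace.
type CrossNamespaceRef struct {
	Kind      string
	Name      string
	Namespace string
}

// ResolveSame returns the "namespace/name" key that would be looked up.
// The namespace is always that of the object holding the reference.
func ResolveSame(ownerNamespace string, ref SameNamespaceRef) string {
	return ownerNamespace + "/" + ref.Name
}

// ResolveCross uses the reference's namespace when set, and falls back to
// the owner's namespace otherwise.
func ResolveCross(ownerNamespace string, ref CrossNamespaceRef) string {
	ns := ref.Namespace
	if ns == "" {
		ns = ownerNamespace
	}
	return ns + "/" + ref.Name
}

func main() {
	// An automation in "apps" can only ever resolve to something in "apps".
	fmt.Println(ResolveSame("apps", SameNamespaceRef{Kind: "GitRepository", Name: "flux-system"}))
	// A cross-namespace reference in "apps" can point at "flux-system".
	fmt.Println(ResolveCross("apps", CrossNamespaceRef{Kind: "HelmRepository", Name: "charts", Namespace: "flux-system"}))
}
```

A user who is used to the second behaviour has no obvious signal that the first kind of reference behaves differently.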

We seem to get this question from a different user in #flux Slack at least twice a week, and most but not all of them seem to be using Helm Controller. There are a few different ways this failure happens. I think some of them can be stopped by emitting an error, while others might already emit errors:

  • the ImageUpdateAutomation.spec.sourceRef refers to a resource in a different namespace, with a namespace field
  • the sourceRef refers to a resource which is in a different namespace, without including a reference to the namespace
  • the kyaml setter tag references an ImagePolicy by name and namespace, but the namespace is not the correct one

The first error should be stopped by existing validation, although I'm not 100% sure about that. There is no namespace field, and if you provide one there, Kubernetes should emit a validation error preventing the resource from being created.

The second seems possible to catch, by emitting an error event from ImageUpdateAutomation signaling that the GitRepository referenced by sourceRef could not be found. I think this is something we don't currently emit errors about.
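As a rough sketch of what such an error could look like, assuming an in-memory map as a stand-in for the cluster: `cluster` and `diagnoseSourceRef` are hypothetical names, and none of this is the controller's actual code. It also assumes the controller is allowed to look at other namespaces in order to produce the hint, which may not hold in every setup.

```go
package main

import (
	"fmt"
	"sort"
)

// cluster maps namespace -> set of GitRepository names.
// It is a hypothetical in-memory stand-in for querying the API server.
type cluster map[string]map[string]bool

// diagnoseSourceRef looks for the GitRepository only in the automation's own
// namespace, as sourceRef requires. If it is missing there but a repository
// with the same name exists elsewhere, the error says so.
func diagnoseSourceRef(c cluster, automationNamespace, name string) error {
	if c[automationNamespace][name] {
		return nil
	}
	var elsewhere []string
	for ns, repos := range c {
		if ns != automationNamespace && repos[name] {
			elsewhere = append(elsewhere, ns)
		}
	}
	sort.Strings(elsewhere) // deterministic message regardless of map order
	if len(elsewhere) > 0 {
		return fmt.Errorf("GitRepository %q not found in namespace %q (found in %v; sourceRef cannot cross namespaces)",
			name, automationNamespace, elsewhere)
	}
	return fmt.Errorf("GitRepository %q not found in namespace %q", name, automationNamespace)
}

func main() {
	c := cluster{"flux-system": {"flux-system": true}}
	// The automation was created in "apps", next to the deployment target.
	if err := diagnoseSourceRef(c, "apps", "flux-system"); err != nil {
		fmt.Println(err)
	}
}
```

An event carrying a message like this would point the user straight at the namespace mismatch.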

The third mode is also very common and quite tricky to diagnose, especially if you don't yet know this fact about our API. For most users there are effectively no errors emitted to indicate what has gone wrong; everything just fails silently. Many users think the policy (and automation) should be in the same namespace as the deployment target, rather than wherever the GitRepository resource is located.
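A check for this could be sketched roughly as below. It assumes the setter marker has the form `# {"$imagepolicy": "<namespace>:<name>"}`; `PolicyRef`, `parseMarker`, and `checkMarker` are hypothetical names, and the set of known policies is a made-up stand-in for whatever the automation can actually see.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// PolicyRef is the namespace and name pulled out of a setter marker.
type PolicyRef struct {
	Namespace string
	Name      string
}

// parseMarker extracts the policy reference from a manifest line such as:
//   image: example/app:1.0.0 # {"$imagepolicy": "flux-system:app"}
// The marker format is an assumption of this sketch.
func parseMarker(line string) (PolicyRef, error) {
	i := strings.Index(line, "# {")
	if i < 0 {
		return PolicyRef{}, fmt.Errorf("no setter marker on line")
	}
	var marker map[string]string
	// Skip "# " so that decoding starts at the opening brace.
	if err := json.Unmarshal([]byte(line[i+2:]), &marker); err != nil {
		return PolicyRef{}, fmt.Errorf("malformed marker: %w", err)
	}
	value, ok := marker["$imagepolicy"]
	if !ok {
		return PolicyRef{}, fmt.Errorf("marker has no $imagepolicy key")
	}
	parts := strings.Split(value, ":")
	if len(parts) < 2 || parts[0] == "" || parts[1] == "" {
		return PolicyRef{}, fmt.Errorf("marker value %q is not namespace:name", value)
	}
	return PolicyRef{Namespace: parts[0], Name: parts[1]}, nil
}

// checkMarker reports an error when the marker points at a policy that is
// not in the known set, instead of failing silently.
func checkMarker(line string, known map[PolicyRef]bool) error {
	ref, err := parseMarker(line)
	if err != nil {
		return err
	}
	if !known[ref] {
		return fmt.Errorf("ImagePolicy %s/%s referenced by setter marker was not found", ref.Namespace, ref.Name)
	}
	return nil
}

func main() {
	known := map[PolicyRef]bool{{Namespace: "flux-system", Name: "app"}: true}
	// The marker names the deployment namespace rather than the policy's.
	line := `image: example/app:1.0.0 # {"$imagepolicy": "apps:app"}`
	if err := checkMarker(line, known); err != nil {
		fmt.Println(err)
	}
}
```

Even a warning event of this kind would turn the silent failure into something a user can search for.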

Under some conditions, it seems possible that the definition for SourceRef would permit a namespace field to be populated and then silently dropped. I don't have a fresh report handy, so don't read too much into this; if it isn't possible, that's my mistake.
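For what "silently dropped" can look like, here is a small Go sketch using plain `encoding/json`. The `SourceRef` struct is a simplified stand-in, and this only shows generic decoding behaviour; whether the API server or any controller handles the field this way is exactly the part I'm unsure about.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// SourceRef is a simplified, hypothetical stand-in with no namespace field.
type SourceRef struct {
	Kind string `json:"kind"`
	Name string `json:"name"`
}

// decodeLenient uses the default behaviour: unknown fields are ignored.
func decodeLenient(data []byte) (SourceRef, error) {
	var ref SourceRef
	err := json.Unmarshal(data, &ref)
	return ref, err
}

// decodeStrict rejects input that contains fields the struct does not define.
func decodeStrict(data []byte) (SourceRef, error) {
	var ref SourceRef
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	err := dec.Decode(&ref)
	return ref, err
}

func main() {
	input := []byte(`{"kind":"GitRepository","name":"flux-system","namespace":"flux-system"}`)

	ref, err := decodeLenient(input)
	fmt.Println(ref, err) // the namespace is gone and no error is reported

	_, err = decodeStrict(input)
	fmt.Println(err) // strict decoding surfaces the unexpected field
}
```

If anything in the path behaves like the lenient decoder, the user gets no feedback that their namespace was ignored.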

This general mode of failure around image automation remains an extremely common support request as of Flux 0.16.1.

kingdonb commented

Some of this is already mentioned in #85 which is effectively about the same issue – this might be a duplicate.
