
flux controllers should be reinstalled when running terraform apply with newly created k8s cluster #500

Closed
networkhermit opened this issue Jun 15, 2023 · 8 comments · Fixed by #661
Labels
bug Something isn't working

Comments


networkhermit commented Jun 15, 2023

Provider version: v1.0.0-rc.5

Environment: k3s

Steps to reproduce:

  1. use the terraform helm and flux providers to bootstrap the cluster
  2. reinstall the k3s cluster
  3. run terraform apply again
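
For context, a minimal setup along these lines might look like the sketch below. All names, the chart, and the path are illustrative placeholders, not the reporter's actual configuration.

```hcl
# Hypothetical sketch only: provider configuration (git repository,
# kubernetes credentials) is omitted, and the release/chart is a placeholder.
resource "helm_release" "example" {
  name       = "podinfo"
  repository = "https://stefanprodan.github.io/podinfo"
  chart      = "podinfo"
}

resource "flux_bootstrap_git" "default" {
  path = "clusters/my-cluster"
}
```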

Expected: both the helm_release and flux_bootstrap_git resources are recreated

Actual: only helm_release is recreated; flux_bootstrap_git produces no change in the terraform plan

I also tried terraform taint flux_bootstrap_git.default, but it failed with the following error.

╷
│ Error: Unable to remove namespace
│
│ namespaces "flux-system" not found

Running terraform state rm flux_bootstrap_git.default before terraform apply works, though.
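
Spelled out as commands, the workaround would be as follows (a sketch only: it assumes the resource address flux_bootstrap_git.default from above and a real terraform working directory and state).

```sh
# Forget the stale resource so the next apply plans a fresh bootstrap.
terraform state rm flux_bootstrap_git.default
terraform apply
```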


networkhermit commented Jun 15, 2023

Related to #479

@swade1987 (Member)

Hello @networkhermit ,

I hope you're doing well! I'm the newest contributor to this repository, and I'm currently in the process of issue grooming to ensure that all concerns are addressed promptly and efficiently.

I noticed this issue you reported and wanted to check in with you to see if it's still affecting your work. Your feedback is invaluable to us, and any additional insights or updates you can share would be greatly appreciated to help us understand and solve the problem more effectively.

If this issue has been resolved, could you please share how it was fixed? This information could be incredibly helpful to others in the community facing similar problems. It would also allow us to close this issue with a clear resolution.

In case the issue is still open and troubling you, let's work together to find a solution. Your satisfaction and the smooth functioning of our project are our top priorities.

Thank you for your time and contributions to our community. Looking forward to your response!

Best regards,

Steve


Note: Any resources managed by Terraform should remain managed by Terraform. If you want to re-bootstrap a cluster, you need to run terraform destroy and then terraform apply. The terraform destroy command will remove the manifests from git and Terraform's state.
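
Under that model, re-bootstrapping a reinstalled cluster would be (again a sketch, assuming real cluster and state access):

```sh
terraform destroy   # removes the Flux manifests from git and from Terraform's state
terraform apply     # bootstraps Flux again on the fresh cluster
```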

@networkhermit (Author)

The workaround I had settled with was running terraform state rm flux_bootstrap_git.default before terraform apply.

I don't think running terraform destroy and then terraform apply is a better idea. Users might have patched the flux-system components, for example as described in "Using HTTP/S proxy".

Actually, I am a little surprised by the 12 thumbs up on the issue description. Those users might, like me, regard this behavior as an unexpected pitfall.

One thing k8s, terraform, and flux have in common is the "reconcile loop", so ideally users/controllers could just run terraform apply to reconcile the flux controllers, without needing to know whether the cluster is in a clean state for flux to bootstrap.

Personally, I don't know of any other terraform resource with this "reconcile one time only" behavior. And don't forget that if the whole flux-system namespace is removed, even terraform destroy fails. See issue #479 for more corner cases.

It would be ironic if we could use flux to reconcile clusters but failed to reconcile flux_bootstrap_git.

@swade1987 (Member)

@networkhermit not sure if you saw #650 but would this help you?

@networkhermit (Author)

> @networkhermit not sure if you saw #650 but would this help you?

No. I don't need more workarounds for this issue; I already described my preferred workaround in the issue description at the top.

Have you tried bootstrapping a demo cluster with flux_bootstrap_git and then deleting the whole flux-system namespace? As noted in #479 (comment), terraform destroy fails in this situation.

Anyway, thanks for your contribution. I'm not demanding a perfect solution, but I think this area of behavior is open to discussion. Even if this issue is marked as resolved, new users will still face confusion.

I have not used the other flux bootstrap methods; do they all behave as a no-op when the bootstrap is rerun? If there are difficulties around implementing a truly reconcilable flux_bootstrap_git, it's worth documenting the limitation in more detail.

@swade1987 (Member)

You need to manage the lifecycle of bootstrapping flux using terraform, and terraform needs to keep its state in sync, so I remove flux using terraform destroy.

@networkhermit (Author)

> You need to manage the lifecycle of bootstrapping flux using terraform, and terraform needs to keep its state in sync, so I remove flux using terraform destroy.

That's just a workaround. When users need to resort to manual intervention to get things fixed, that is a limitation of flux_bootstrap_git, not of terraform resources in general. Take the terraform helm provider for example: if another party accidentally or deliberately removes a helm release, running terraform apply (perhaps in an automated CI environment) will detect the missing release and reinstall it. Users simply don't need to manually inform terraform that a specific helm release is missing.
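
The helm provider behavior described here amounts to plain drift detection: compare the desired set of releases against what is actually installed, and reinstall whatever is missing. A toy illustration (not the provider's actual code; release names are made up):

```shell
# Desired releases vs. what survived on the cluster (made-up names).
desired="source-controller kustomize-controller helm-controller"
installed="helm-controller"   # the others were removed out-of-band

missing=""
for r in $desired; do
  case " $installed " in
    *" $r "*) ;;                      # still present, nothing to do
    *) missing="$missing$r " ;;       # drift detected: would be reinstalled
  esac
done

echo "would reinstall: $missing"
```

An idempotent flux_bootstrap_git would behave the same way: a rerun against an empty cluster simply finds everything missing and bootstraps again.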


stefanprodan commented Apr 11, 2024

Running flux bootstrap with the CLI on an empty cluster will deploy Flux and update the deploy key in Git with the newly generated SSH key. If TF doesn't do it, then it's a major bug IMO.
