Implement drift detection and correction for cluster state #661

Merged: 4 commits into main from cluster-drift-correction, Apr 16, 2024

@stefanprodan (Member) commented Apr 14, 2024

This PR implements Flux readiness checks and drift detection for the cluster state. The provider applies changes to the Flux components and to the GitRepository/Kustomization manifests on the cluster, so changes to the Git URL and branch take effect. It also adds a check that verifies the kubeconfig during the planning phase.

Description

Changes:

  • Detect whether Flux is running in the cluster and whether it is ready during the planning phase.
  • Restore Flux on a cluster if the readiness checks fail.
  • Apply changes in configuration directly on the cluster on updates.
  • Disallow changes to the bootstrap path field in the same way the CLI does it (breaking change).
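
To illustrate the breaking change in the last item, here is a minimal sketch of a bootstrap resource, assuming the provider's flux_bootstrap_git resource. The resource label "this" and the path value are placeholders, not taken from this PR. Per the list above, the path should be treated as fixed once bootstrap has run: editing it on an existing resource is no longer accepted, in line with how the CLI behaves.

```hcl
# Hypothetical example: the label "this" and the path value are placeholders.
resource "flux_bootstrap_git" "this" {
  # Bootstrap path inside the Git repository. Choose it before the first
  # apply; per this PR, later changes to this field are disallowed.
  path = "clusters/my-cluster"
}
```

The Git URL and branch, by contrast, are set in the provider block and can still be changed; the provider then applies the new values on the cluster.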

Motivation and Context

Make it possible to update Flux in the cluster by detecting drift in the cluster state during plan and apply.

Fix: #500
Fix: #656
Fix: #653
Fix: #564
Fix: #176
Fix: #499

How has this been tested?

  • Have you added an acceptance test for the functionality being added?
  • Have you run the acceptance tests on this branch?

Manual testing for:

  • change Git branch
  • change Git URL
  • cluster with the Flux controllers deleted
  • cluster with the Flux GitRepository object deleted
  • cluster recreated outside of Terraform, without Flux

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Documentation

  • I have updated the documentation (if required) with make docs

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I've read the CONTRIBUTION guide
  • I have signed-off my commits with git commit -s

@stefanprodan
Member Author

@swade1987 it would be great if you could test this PR on your clusters, especially the kubeconfig validation during planning.

@stefanprodan stefanprodan force-pushed the cluster-drift-correction branch 8 times, most recently from 4238103 to 7f01784 Compare April 14, 2024 10:13
@swade1987
Member

@stefanprodan I'll take this for a spin next week and keep you posted.

@stefanprodan stefanprodan force-pushed the cluster-drift-correction branch 4 times, most recently from 497c47a to 736db02 Compare April 14, 2024 14:35
@stefanprodan changed the title from "Detect and correct drift in cluster state" to "Apply config changes in cluster" Apr 14, 2024
Signed-off-by: Stefan Prodan <[email protected]>
@stefanprodan stefanprodan force-pushed the cluster-drift-correction branch 7 times, most recently from be76123 to bb2a674 Compare April 14, 2024 23:08
@stefanprodan changed the title from "Apply config changes in cluster" to "Implement drift detection and correction for cluster state" Apr 14, 2024
@stefanprodan added the enhancement and area/kubernetes labels Apr 14, 2024
@stefanprodan stefanprodan force-pushed the cluster-drift-correction branch 2 times, most recently from 1761d4a to b88263d Compare April 15, 2024 11:55
@swade1987
Member

> @swade1987 it would be great if you could test this PR on your clusters, especially the kubeconfig validation during planning.

Looking good @stefanprodan

Planning failed. Terraform encountered an error while generating this plan.

╷
│ Error: Get "https://xxxx/api/v1/namespaces/flux-system": dial tcp: lookup XXXX: no such host
│
│   with kubernetes_namespace.flux_system,
│   on main.tf line 67, in resource "kubernetes_namespace" "flux_system":
│   67: resource "kubernetes_namespace" "flux_system" {
│
╵

@swade1987
Member

Changing the Git branch works as well, as long as the branch exists in GitHub (see below).

provider "flux" {
  kubernetes = {
    host                   = kind_cluster.this.endpoint
    client_certificate     = kind_cluster.this.client_certificate
    client_key             = kind_cluster.this.client_key
    cluster_ca_certificate = kind_cluster.this.cluster_ca_certificate
  }
  git = {
    url    = "ssh://git@github.com/${var.github_org}/${var.github_repository}.git"
    branch = "test-branch"
    ssh = {
      username    = "git"
      private_key = tls_private_key.flux.private_key_pem
    }
  }
}

Ran terraform apply, then checked the branch on the cluster:

flux export source git flux-system | grep branch
    branch: test-branch

@swade1987
Member

LGTM @stefanprodan, I ran a number of tests locally and things look good.

Signed-off-by: Stefan Prodan <[email protected]>
Signed-off-by: Stefan Prodan <[email protected]>
@stefanprodan stefanprodan merged commit 24e21c2 into main Apr 16, 2024
11 checks passed
@stefanprodan stefanprodan deleted the cluster-drift-correction branch April 16, 2024 06:25