
Reduce the amount of stored ReplicaSet data #5580

Closed
wants to merge 2 commits

Conversation

swiatekm
Contributor

@swiatekm swiatekm commented Sep 20, 2024

What does this PR do?

Use a transform function to drop all ReplicaSet data in the kubernetes provider, except for the owner references, which we need to find the Deployment name.

Unfortunately, the autodiscovery library doesn't let us pass the transform function in, so I had to copy their implementation and make the change myself. The plan is to upstream this change later.
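To illustrate the idea, here is a minimal sketch of such a transform function. The types below are simplified stand-ins, not the real `k8s.io/api` or `kubernetes` package types, and `stripReplicaSet` is a hypothetical name; the real transform operates on the actual ReplicaSet objects before they enter the informer cache.

```go
package main

import "fmt"

// Simplified stand-ins for the client-go types involved.
type OwnerReference struct {
	Kind string
	Name string
}

type ObjectMeta struct {
	Name            string
	Namespace       string
	Labels          map[string]string
	OwnerReferences []OwnerReference
}

type ReplicaSet struct {
	ObjectMeta
	SpecData string // stands in for the bulky Spec/Status payload
}

// stripReplicaSet keeps only the fields needed to resolve the owning
// Deployment and drops everything else before the object is cached.
func stripReplicaSet(obj any) (any, error) {
	rs, ok := obj.(*ReplicaSet)
	if !ok {
		return obj, nil // pass through anything unexpected
	}
	return &ReplicaSet{
		ObjectMeta: ObjectMeta{
			Name:            rs.Name,
			Namespace:       rs.Namespace,
			OwnerReferences: rs.OwnerReferences,
		},
	}, nil
}

func main() {
	rs := &ReplicaSet{
		ObjectMeta: ObjectMeta{
			Name:      "web-5d89c9d4f",
			Namespace: "default",
			Labels:    map[string]string{"app": "web"},
			OwnerReferences: []OwnerReference{
				{Kind: "Deployment", Name: "web"},
			},
		},
		SpecData: "large spec and status payload",
	}
	out, _ := stripReplicaSet(rs)
	stripped := out.(*ReplicaSet)
	fmt.Println(stripped.OwnerReferences[0].Name) // the Deployment name survives
	fmt.Println(stripped.SpecData == "")          // the payload is dropped
}
```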

Why is it important?

In clusters with a lot of Deployments, we end up storing a lot of ReplicaSets in the local cache of each agent, resulting in significant unnecessary memory consumption.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Disruptive User Impact

How to test this PR locally

There isn't an easy way. You can start a local K8s cluster, create a lot of Deployments, deploy Agent using its Helm chart, and check the memory consumption.

Related issues

@swiatekm swiatekm added the enhancement New feature or request label Sep 20, 2024
Contributor

mergify bot commented Sep 20, 2024

This pull request does not have a backport label. Could you fix it @swiatekm? 🙏
To fix this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch, where /d is the digit

Contributor

mergify bot commented Sep 20, 2024

backport-v8.x has been added to help with the transition to the new branch 8.x.
If you don't need it, please add the backport-skip label and remove the backport-8.x label.

@mergify mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Sep 20, 2024
@swiatekm swiatekm force-pushed the k8sprovider-replicaset-storage branch from a56dfca to 1e43602 Compare September 20, 2024 14:58
@swiatekm
Contributor Author

Note: I'd like to verify that this actually improves the situation in a real cluster. I'm working with our SRE team to do this. Until then, I'm keeping this PR as Draft and not adding unit tests or changelog entries.

transformed.ObjectMeta = kubernetes.ObjectMeta{
    Name:            old.GetName(),
    Namespace:       old.GetNamespace(),
    OwnerReferences: old.GetOwnerReferences(),
}
Contributor

@pkoutsovasilis pkoutsovasilis Sep 23, 2024


I think that if you don't include ResourceVersion: old.GetResourceVersion() here, you effectively make the opts.IsUpdated check inside the informer's UpdateFunc always return false
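A minimal sketch of why this matters, using stand-in types rather than the real informer code: update detection of this kind typically compares the cached and incoming objects' ResourceVersion, so a transform that zeroes it makes every update look like a no-op. The function names here (`isUpdated`, the two transforms) are illustrative, not the library's API.

```go
package main

import "fmt"

// Stand-in for the slice of object state an informer caches.
type Object struct {
	Name            string
	ResourceVersion string
}

// isUpdated mimics an opts.IsUpdated-style check: an object counts as
// updated only if its ResourceVersion changed.
func isUpdated(old, cur *Object) bool {
	return old.ResourceVersion != cur.ResourceVersion
}

// transformKeepingRV preserves ResourceVersion, so update detection works.
func transformKeepingRV(o *Object) *Object {
	return &Object{Name: o.Name, ResourceVersion: o.ResourceVersion}
}

// transformDroppingRV loses ResourceVersion; after it, two transformed
// snapshots of the same object always compare as "not updated".
func transformDroppingRV(o *Object) *Object {
	return &Object{Name: o.Name}
}

func main() {
	v1 := &Object{Name: "rs", ResourceVersion: "100"}
	v2 := &Object{Name: "rs", ResourceVersion: "101"}
	fmt.Println(isUpdated(transformKeepingRV(v1), transformKeepingRV(v2)))   // true
	fmt.Println(isUpdated(transformDroppingRV(v1), transformDroppingRV(v2))) // false
}
```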

@pkoutsovasilis
Contributor

pkoutsovasilis commented Sep 23, 2024

Observations:

  1. I like the introduction of the transform func, and I left a comment on it.
  2. I'm wondering the following: since it seems we only want the metadata, why aren't we using something like a metadata shared informer? E.g., a snippet for ReplicaSets:

    metadataClient, err := metadata.NewForConfig(config) // "k8s.io/client-go/metadata"
    if err != nil {
        panic(err.Error())
    }

    // Define the resource to watch metadata for
    gvr := schema.GroupVersionResource{
        Group:    "apps",
        Version:  "v1",
        Resource: "replicasets",
    }

    resyncPeriod := 10 * time.Second

    indexers := cache.Indexers{cache.NamespaceIndex: cache.MetaNamespaceIndexFunc}

    // Create the SharedIndexInformer over metadata-only objects
    informer := cache.NewSharedIndexInformer(
        &cache.ListWatch{
            ListFunc: func(options metav1.ListOptions) (runtime.Object, error) {
                return metadataClient.Resource(gvr).List(ctx, options)
            },
            WatchFunc: func(options metav1.ListOptions) (watch.Interface, error) {
                return metadataClient.Resource(gvr).Watch(ctx, options)
            },
        },
        &metav1.PartialObjectMetadata{},
        resyncPeriod,
        indexers,
    )

    After some quick local testing: because the PartialObjectMetadata struct is much smaller than the ReplicaSet struct used in this snippet, the allocations needed to unmarshal the former (which happens before the transform func is called) are far fewer than those required for the latter, and this is evident in the informer's List and Stream funcs.

Use a transform function to drop all data except for the owner
references, which we need to find the Deployment name.

Quality Gate failed

Failed conditions
9.8% Coverage on New Code (required ≥ 40%)

See analysis details on SonarQube

@swiatekm
Contributor Author

swiatekm commented Oct 1, 2024

Note: This code is going to move to the autodiscovery library, because we need to make a similar change in beats as well. I'm going to keep it here for now, and rebase once that's done. See #5580 for more details.

@swiatekm
Contributor Author

swiatekm commented Oct 2, 2024

Closing in favor of elastic/elastic-agent-autodiscover#111. I'm going to recreate this PR after releasing a new version of autodiscover and updating go.mod in agent.

@swiatekm swiatekm closed this Oct 2, 2024
swiatekm added a commit to elastic/elastic-agent-autodiscover that referenced this pull request Oct 3, 2024
…cts (#109)

We only use metadata from Jobs and ReplicaSets, but require that full
resources are supplied. This change relaxes this requirement, allowing
PartialObjectMetadata resources to be used. This allows callers to use
metadata informers and avoid having to receive and deserialize
non-metadata updates from the API Server.

See elastic/elastic-agent#5580 for an example of
how this could be used. I'm planning to add the metadata informer from
that PR to this library as well. Together, these will allow us to
greatly reduce memory used for processing and storing ReplicaSets and
Jobs in beats and elastic-agent.

This will help elastic/elastic-agent#5580 and
elastic/elastic-agent#4729 specifically, and
elastic/elastic-agent#3801 in general.
swiatekm added a commit to elastic/elastic-agent-autodiscover that referenced this pull request Oct 3, 2024
Introduce metadata-only watchers to the kubernetes package. These are
useful if we only need to track metadata for a resource - a good example
are ReplicaSets, for which we usually only care about the
OwnerReferences. As a result, we only store the metadata, reducing
steady-state memory consumption, but also only get updates involving
metadata, reducing churn greatly in larger clusters.

The implementation introduces new constructors for the Watcher, allowing
an informer to be passed in. Existing constructors are implemented using
the new constructor, though none of the code actually changes. As a
result, it is now possible to unit test the watcher, and I've added some
basic unit tests for it.

We also add two helper functions:

- `GetKubernetesMetadataClient` creates a metadata-only kubernetes
client, and is very similar to the existing `GetKubernetesClient`
- `RemoveUnnecessaryReplicaSetData` is a transform function that can be
passed into an informer so it only stores the metadata we actually use

I tested these new functions in both beats and agent, in a kind cluster
as well as one of our staging clusters.

This is part of the solution to
elastic/elastic-agent#5580.

---------

Co-authored-by: Mauri de Souza Meneguzzo <[email protected]>
Labels
backport-8.x Automated backport to the 8.x branch with mergify enhancement New feature or request
3 participants