
Scope watch on "pods" to only pods associated with the local node #716

Merged
mogren merged 1 commit into aws:master from jacksontj:discover_pods on Nov 13, 2019

Conversation

jacksontj (Contributor)

Prior to this patch the aws-cni plugin would create a watch for *all*
pods in the cluster and then filter them client-side to only the ones
assigned to the node. This is not only wasteful of resources (we throw
away the majority of the data as soon as we receive it), it also causes
significant performance/capacity issues on sizeable clusters. In some
cases we've seen the initial `watch` call return ~500MB! Scoping the
watch to the local node is a significant savings of space and resources,
both client- and server-side.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

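The core idea of the fix is to let the API server do the filtering by adding a field selector on spec.nodeName to the pod watch, instead of streaming every pod and discarding most of them client-side. The following is a minimal client-go sketch of that pattern, not the plugin's actual code; the NODE_NAME environment variable (assumed to be injected via the downward API), the resync interval, and the event handler are illustrative.

package main

import (
	"fmt"
	"os"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	// In-cluster client; the real plugin sets this up elsewhere.
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// NODE_NAME is assumed to be injected via the downward API (spec.nodeName).
	nodeName := os.Getenv("NODE_NAME")

	// Scope the informer's list/watch to pods scheduled on this node, so the
	// API server does the filtering instead of the client.
	factory := informers.NewSharedInformerFactoryWithOptions(
		clientset,
		30*time.Second,
		informers.WithTweakListOptions(func(opts *metav1.ListOptions) {
			opts.FieldSelector = fields.OneTermEqualSelector("spec.nodeName", nodeName).String()
		}),
	)

	podInformer := factory.Core().V1().Pods().Informer()
	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			pod := obj.(*corev1.Pod)
			fmt.Printf("local pod added: %s/%s\n", pod.Namespace, pod.Name)
		},
	})

	stop := make(chan struct{})
	factory.Start(stop)
	factory.WaitForCacheSync(stop)

	// Block forever; a real agent would tie this to its own lifecycle.
	<-stop
}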
mogren self-requested a review on November 12, 2019
jaypipes (Contributor) left a comment

I'm actually having trouble understanding why this was ever like this... :/

Big +1 from me.

jacksontj (Contributor, Author)

After reading through it again, the only place that uses this is cni-metrics-helper, which uses the list of all CNI pods to aggregate the metrics. If that is still necessary, it seems it should have its own separate watch scoped to the pods in that DaemonSet (instead of requiring a watch on every pod in the cluster).

mogren (Contributor) commented Nov 12, 2019

@jacksontj You are absolutely correct, thanks a lot for this PR.

I was just about to comment saying that we should definitely do this change for workerPods in the CNI, and then handle cniPods differently in the cni-metrics-helper. For that, I guess we can just filter on the DaemonSet/aws-node (or the kube-system namespace?) instead. @jaypipes?

jacksontj (Contributor, Author) commented Nov 12, 2019

Glad to be of help :)

IMO you should be watching as little as necessary. So for the cni-metrics-helper I think you want to filter on both kube-system and DaemonSet/aws-node to avoid pulling more data than necessary.
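For illustration, a scoped query of that kind with client-go might look like the sketch below; the k8s-app=aws-node label is an assumption about how the aws-node DaemonSet labels its pods, not something stated in this thread, so verify it against the actual manifest.

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// Restrict the query to the CNI pods themselves: the kube-system namespace
	// plus the DaemonSet's pod label (assumed here to be k8s-app=aws-node).
	cniPods, err := clientset.CoreV1().Pods("kube-system").List(context.TODO(), metav1.ListOptions{
		LabelSelector: "k8s-app=aws-node",
	})
	if err != nil {
		panic(err)
	}
	for _, pod := range cniPods.Items {
		fmt.Println(pod.Name)
	}
}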

jacksontj (Contributor, Author)

@mogren are you guys okay to merge this as-is and clean up the cni-metrics-helper in another PR? Or do we need to address that in this PR? If it needs to be addressed here, any suggestions? (I don't use that helper, so I'm not sure how to test it easily.)

mogren (Contributor) commented Nov 13, 2019

@jacksontj I'm fine with merging this first, then fixing the metrics helper. The current v1.5.5 build will work with the v1.6.x CNI as well, since we have not changed or added any Prometheus metrics. I'll add a follow-up issue to fix it.

mogren merged commit d73d118 into aws:master on Nov 13, 2019
jacksontj (Contributor, Author) commented Nov 13, 2019

Sounds good to me, thanks for the quick responses!

@mogren any plans to kick this out in a release soon? Checking in as this is a pretty large liability to close up :)

jacksontj deleted the discover_pods branch on November 13, 2019
mogren (Contributor) commented Nov 13, 2019

@jacksontj New releases take quite a while to verify. If you need this urgently, I would check out the release-1.5.5 branch, apply this fix, then build your own image.

To build:

# Check out the release branch and apply this fix on top of it
git checkout release-1.5.5
<apply patch>

# Tag the patched tree and build the image
git tag v1.5.5-watcher-fix
export VERSION=v1.5.5-watcher-fix
make docker

# Push the image to your own repository
docker tag amazon/amazon-k8s-cni:v1.5.5-watcher-fix <your repository>:v1.5.5-watcher-fix
docker push <your repository>:v1.5.5-watcher-fix

# Point the aws-node DaemonSet at the new image
kubectl patch daemonset aws-node -n kube-system \
  -p '{"spec": {"template": {"spec": {"containers": [{"image": "<your repository>:v1.5.5-watcher-fix", "name": "aws-node"}]}}}}'

jacksontj (Contributor, Author) commented Nov 13, 2019

I have already created a build :) I was just checking in on the release cadence since I don't like running a fork, but I'll keep an eye out for a new release.

For anyone else that needs this and can't wait, I have a forked build of 1.5.3 on Quay: quay.io/jacksontj/aws-cni:v1.5.3-2-g70bf43d7

mogren mentioned this pull request on Dec 20, 2019