Feature Request: Set NF_NAT_RANGE_PROTO_RANDOM_FULLY flag on masquerading rules #1004
Comments
I have the same issue with Kubernetes 1.10.4 + Azure CNI 1.0.6
I just posted a little write-up about our journey troubleshooting the issue, and how we worked around it in production: https://blog.quentin-machu.fr/2018/06/24/5-15s-dns-lookups-on-kubernetes/. Implementing …
We've seen a lot of 5 second delays for DNS lookups and other requests as well. Also, lately, we've seen a lot of 1 second connects. I tried upgrading Flannel to …, but how do I confirm that flannel is actually "upgraded"? I upgraded the daemon set and made sure the pods are updated. But what does this mean for existing pods and networks? Do I need to re-create all the pods, or even do something at host level? EDIT: Turns out that …
@anton-johansson I tried Flannel v0.12.0, k8s 1.6.2, iptables 1.6.2, but there is no --random-fully in the iptables-save output.
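For anyone trying to confirm whether the flag is actually applied, here is a small check along these lines. It is only a sketch using coreos/go-iptables, and it assumes the masquerade rules live directly in the nat POSTROUTING chain; it reports roughly the same thing as grepping `iptables-save -t nat` for MASQUERADE.

```go
// checkrandomfully.go: report whether MASQUERADE rules in nat/POSTROUTING
// carry --random-fully (roughly what grepping `iptables-save -t nat` shows).
package main

import (
	"fmt"
	"log"
	"strings"

	"github.com/coreos/go-iptables/iptables"
)

func main() {
	ipt, err := iptables.New()
	if err != nil {
		log.Fatalf("init iptables: %v", err)
	}

	rules, err := ipt.List("nat", "POSTROUTING")
	if err != nil {
		log.Fatalf("list nat/POSTROUTING: %v", err)
	}

	for _, rule := range rules {
		if strings.Contains(rule, "MASQUERADE") {
			fmt.Printf("random-fully=%-5v %s\n",
				strings.Contains(rule, "--random-fully"), rule)
		}
	}
}
```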
Current Behavior
We are experiencing random 5 second timeouts with DNS, database connections and other things in our Kubernetes cluster.
Possible Solution
Use the iptables --random-fully flag when creating masquerade rules. I have the first step of this pending in go-iptables (coreos/go-iptables#48).
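For illustration, this is roughly how a masquerade rule with --random-fully could be appended via coreos/go-iptables. It is a sketch, not flannel's actual rule-generation code: the pod CIDR 10.244.0.0/16 and the exact rulespec are assumptions for the example, and --random-fully requires an iptables/kernel combination that supports fully random port allocation.

```go
package main

import (
	"log"

	"github.com/coreos/go-iptables/iptables"
)

func main() {
	ipt, err := iptables.New()
	if err != nil {
		log.Fatalf("init iptables: %v", err)
	}

	// Illustrative rulespec only: masquerade traffic leaving the (assumed)
	// pod network 10.244.0.0/16 for destinations outside it, and ask the
	// kernel for fully random source-port allocation (--random-fully maps
	// to NF_NAT_RANGE_PROTO_RANDOM_FULLY).
	rule := []string{
		"-s", "10.244.0.0/16",
		"!", "-d", "10.244.0.0/16",
		"-j", "MASQUERADE",
		"--random-fully",
	}
	if err := ipt.AppendUnique("nat", "POSTROUTING", rule...); err != nil {
		log.Fatalf("append masquerade rule: %v", err)
	}
}
```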
Steps to Reproduce (for bugs)
It is reproducible by requesting just about any in-cluster service and observing that periodically (in our case, 1 out of 50 or 100 times) we get a 5 second delay. It always happens during the DNS lookup.
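To make the "periodically slow lookup" observation concrete, a minimal probe like the following could be run inside a pod to log any lookup slower than one second. This is only a sketch; the hostname is an assumed example of an in-cluster name, and the iteration count and sleep interval are arbitrary.

```go
// dnsprobe.go: repeatedly resolve an in-cluster name and log slow lookups.
package main

import (
	"context"
	"log"
	"net"
	"time"
)

func main() {
	// Assumed example of an in-cluster name; substitute any service you
	// normally resolve from this pod.
	const name = "kubernetes.default.svc.cluster.local"
	resolver := &net.Resolver{}

	for i := 0; i < 1000; i++ {
		start := time.Now()
		_, err := resolver.LookupHost(context.Background(), name)
		if d := time.Since(start); d > time.Second || err != nil {
			log.Printf("lookup %4d took %v (err=%v)", i, d, err)
		}
		time.Sleep(50 * time.Millisecond)
	}
}
```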
Context
We believe this is a result of a kernel level SNAT race condition that is described quite well here:
https://tech.xing.com/a-reason-for-unexplained-connection-timeouts-on-kubernetes-docker-abd041cf7e02
The problem also happens with non-flannel CNI implementations, and is (ironically) not really a flannel issue at all. However, it becomes a flannel issue because the fix is to set a flag on the masquerading rules that are created, and those rules are under no one's control except flannel's.
What we need is the ability to apply the NF_NAT_RANGE_PROTO_RANDOM_FULLY flag on the masquerading rules that flannel sets up.
We searched for this issue and didn't see that anyone had asked for it before. We're also unaware of any setting that allows enabling this flag today; if that's possible, please let us know.
Your Environment
This issue was copied from weaveworks/weave#3287