Pod connectivity problem with ENIConfig/pod-specific subnet #219
@liwenwu-amazon I can provide the `aws-cni-support.tar.gz` via a support ticket.

I think you might be running into issue #35. Can you try setting AWS_VPC_K8S_CNI_EXTERNALSNAT to true and see if your problem is solved? thanks
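For anyone following along, flipping that switch on the daemonset is a one-liner; a minimal sketch, assuming the default `aws-node` daemonset in `kube-system`:

```sh
# Enable external SNAT on the CNI daemonset; the rollout restarts the pods
kubectl -n kube-system set env daemonset/aws-node AWS_VPC_K8S_CNI_EXTERNALSNAT=true
```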
I am running with:

```
$ kubectl --kubeconfig kubeconfig describe pod --namespace kube-system aws-node-6v94g
Name:              aws-node-6v94g
Namespace:         kube-system
Node:              ip-10-x-x-x.us-west-2.compute.internal/10.x.x.x
Start Time:        Fri, 02 Nov 2018 15:16:01 -0400
Labels:            controller-revision-hash=3496945906
                   k8s-app=aws-node
                   pod-template-generation=1
Annotations:       scheduler.alpha.kubernetes.io/critical-pod=
Status:            Running
IP:                10.x.x.x
Controlled By:     DaemonSet/aws-node
Containers:
  aws-node:
    Container ID:  docker://10b1f2669628dd5cb0695160005e40576f765d46fedf1e1e19eab6d813be2a07
    ...
    Environment:
      AWS_VPC_K8S_CNI_LOGLEVEL:            DEBUG
      MY_NODE_NAME:                         (v1:spec.nodeName)
      WATCH_NAMESPACE:                     kube-system (v1:metadata.namespace)
      AWS_VPC_K8S_CNI_EXTERNALSNAT:        true
      AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG:  true
    ...
```
@ewbankkit Do the Pod's subnet and the Node's subnet use the same route table?

No, I have them in separate route tables. No real reason why I did it that way, I just wanted to keep everything as self-contained as possible.

Do the Pods of 2 different nodes (which can NOT ping each other) use the same subnet? Also, have you checked the security groups for the Pods of these 2 nodes, are they allowed to communicate with each other?

I tried putting the pod subnets in the same route table as the node subnets and still could not connect to the control plane from the pods. Yes, the SGs are wide open.
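A quick way to compare the two route tables from the CLI; a sketch, the subnet IDs are placeholders:

```sh
# Show the route tables associated with the node subnet and the pod subnet
aws ec2 describe-route-tables \
  --filters Name=association.subnet-id,Values=subnet-NODE_ID,subnet-POD_ID \
  --query 'RouteTables[].{Id:RouteTableId,Routes:Routes}'
```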
@ewbankkit will it work if the Pod's ENIConfig uses the same SG as the worker nodes?
I believe I'm running into this issue as well. I have a small primary CIDR block in the 10.0.0.0/8 range and a (disjoint) larger secondary CIDR also in the 10.0.0.0/8 range. The nodes come up in the primary CIDR block and have ENIConfigs set to use the secondary block, but with the same security groups as the node. All relevant subnets use the same route table. Pods can communicate with other pods on the same node, however they cannot communicate with pods on other nodes or with the control plane using the cluster IP. In my case they can communicate with an off-node proxy server (i.e. `curl --proxy` works, but a direct `curl https://...` does not). My thought is that some of the things I did to try to work around #212 are allowing the off-node communication to work, however it's not fully addressed and so the cluster IPs don't work between nodes. I'm not positive about that though, and could be completely off base.
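For context, the ENIConfig objects described above look roughly like this; a sketch, with placeholder subnet and security group IDs:

```sh
# One ENIConfig per AZ, pointing pods at a secondary-CIDR subnet
cat <<EOF | kubectl apply -f -
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: pod-netconfig-us-west-2a
spec:
  subnet: subnet-0123456789abcdef0
  securityGroups:
    - sg-0123456789abcdef0
EOF
```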
@lnr0626, can you try …
@liwenwu-amazon Same problem if I run the pods in the same SG as the worker nodes.
@ewbankkit A few more questions: …
@ewbankkit for debugging purposes, does pod-to-pod ping even work if the Pod ENIConfig uses the exact same SG/subnet as the node?
@liwenwu-amazon pinging a pod on a different node does not work.
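The test in question, roughly (pod names are placeholders, and the image is assumed to ship `ping`):

```sh
# Grab the IP of a pod scheduled on node B and ping it from a pod on node A
POD_B_IP=$(kubectl get pod pod-on-node-b -o jsonpath='{.status.podIP}')
kubectl exec pod-on-node-a -- ping -c 3 "$POD_B_IP"
```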
@lnr0626 can you provide the following debugging information: … thanks
@ewbankkit Please do NOT install … thanks
After much debugging to-and-fro, it looks like the root cause is the inability to add a default route during IP address assignment; using the retry functionality from #223 doesn't resolve it:

```
# ip route add default via 100.64.0.1 dev eth1 table 2
RTNETLINK answers: Network is unreachable
```

although the …
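For reference, the routes the plugin programs for a secondary ENI live in a numbered table (table 2 for eth1 here); a quick way to inspect the state when the command above fails:

```sh
ip addr show eth1       # the CNI leaves secondary ENIs without an address
ip rule show            # policy rules steering pod traffic into table 2
ip route show table 2   # should hold the default route via the ENI's gateway
```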
We are running CentOS 7 as well and getting similar issues. Any help would be greatly appreciated; we are running out of options for IP space without this particular feature available. Unfortunately, moving to Amazon Linux 2 really isn't an option for us at this point. It does seem suspect that this doesn't work on either CentOS 7 or Ubuntu. @liwenwu-amazon, do you have any ideas, or could you reach out to someone inside Amazon who maintains the Amazon Linux 2 image to see if they have any ideas?
@sdavids13 Sure, I will check with our Amazon Linux engineers on why …
Hi, I have hit the same issue. Working in a VPC with 2 CIDR blocks and EKS worker nodes running on the second CIDR block. The worker nodes use the latest Amazon EKS-Optimized Amazon Linux AMI (v25). Pods on the same node can communicate with no problem, but no communication works between pods on different nodes. Setting AWS_VPC_K8S_CNI_EXTERNALSNAT to true in the aws-node daemonset solves the problem, but as I need to run in a public subnet with Internet access, that's a problem for me. Is there any other solution? Thanks!!!
I'm also seeing the same issue where pods on different nodes cannot communicate. Using the latest EKS-Optimized Amazon Linux AMI (v25) as well.
There is a PR for this … Thanks!!!
@liwenwu-amazon you forgot the "dev eth1" in the default route command; that's why you were able to add it. It was probably added via eth0. I was working on a similar issue and it seems like it comes down to how the kernel handles that: if you were to add an IP address to eth1 it wouldn't complain, but the aws cni plugin doesn't configure any IP addresses on the secondary interfaces. CentOS 7 with kernel 3.10 will fail to add the default route via eth1 if eth1 doesn't have an IP address in that range. I updated to kernel 4.x and had no issues, but I don't think 4.x is "officially" supported for CentOS 7.
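A manual way to confirm that theory from the node (a sketch, not a fix in the plugin): the 3.10 kernel rejects the route because the gateway isn't reachable on an address-less eth1, but the iproute2 `onlink` keyword asserts the gateway is directly reachable and sidesteps that check:

```sh
# Fails on CentOS 7 / kernel 3.10 when eth1 carries no address in that range
ip route add default via 100.64.0.1 dev eth1 table 2

# Accepted: 'onlink' tells the kernel the gateway is on-link for eth1
ip route add default via 100.64.0.1 dev eth1 onlink table 2
```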
@lutierigb Thanks for digging into this.
Maybe related to #212?

We have a VPC with a primary CIDR block in the 10.0.0.0/8 space and a secondary CIDR block in the 100.64.0.0/10 space, and a single EKS worker node running CentOS 7 with its primary IP address on a 10.x.x.x subnet and an ENIConfig annotation on the node for a 100.64.x.x subnet.

Pods running on the 100.64.x.x subnet can communicate with pods on the same node running on the primary IP (`hostNetwork: true`), but cannot communicate off-node (e.g. to the control plane, either directly or using the kubernetes service ClusterIP address). I can `kubectl exec` into pods running on the 100.64.x.x subnet, and all relevant route tables, NACLs and security groups are correctly configured. 10.x.x.x is the node's primary IP, 10.y.y.y is one of the EKS control plane ENIs.

This prevents critical components like `kube-dns` from starting.
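For completeness, the node-side wiring that selects the 100.64.x.x ENIConfig looks roughly like this; a sketch in which the annotation key is the plugin's default and the ENIConfig name is a placeholder:

```sh
# Point the node at an ENIConfig object for its pod ENIs
kubectl annotate node ip-10-x-x-x.us-west-2.compute.internal \
  k8s.amazonaws.com/eniConfig=pod-netconfig-us-west-2a
```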
startingThe text was updated successfully, but these errors were encountered: