Pod connectivity problem with ENIConfig/pod-specific subnet #219

Closed
ewbankkit opened this issue Nov 2, 2018 · 25 comments

@ewbankkit
Contributor

Maybe related to #212?
We have a VPC with a primary CIDR block in 10.0.0.0/8 space and a secondary CIDR block in 100.64.0.0/10 space.
A single EKS worker node running CentOS 7 with primary IP address on a 10.x.x.x subnet and an ENIConfig annotation on the node for a 100.64.x.x subnet.
Pods running on the 100.64.x.x subnet can communicate with pods on the same node running on the primary IP (hostNetwork: true), but cannot communicate off-node (e.g. to the control plane, either directly or via the kubernetes service ClusterIP address).
I can kubectl exec into pods running on the 100.64.x.x subnet, and all relevant route tables, NACLs and security groups are correctly configured.
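
(For reference, the custom-networking setup described above is typically created with something like the following. This is only a sketch: the ENIConfig name, subnet ID and security group ID are placeholders, not values from this cluster.)

$ cat <<'EOF' | kubectl apply -f -
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: pod-subnet-us-west-2a
spec:
  subnet: subnet-0123456789abcdef0        # the 100.64.x.x pod subnet
  securityGroups:
    - sg-0123456789abcdef0                # SG applied to the pod ENIs
EOF
$ kubectl annotate node ip-10-x-x-x.us-west-2.compute.internal \
    k8s.amazonaws.com/eniConfig=pod-subnet-us-west-2a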

$ kubectl --kubeconfig kubeconfig run -i --rm --tty debug --image=busybox -- sh
If you don't see a command prompt, try pressing enter.
/ # ifconfig
eth0      Link encap:Ethernet  HWaddr AE:94:60:6F:AA:2E  
          inet addr:100.64.x.x  Bcast:100.64.x.x  Mask:255.255.255.255
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:508 (508.0 B)  TX bytes:0 (0.0 B)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

/ # wget http://10.x.x.x:61678/v1/networkutils-env-settings
Connecting to 10.x.x.x:61678 (10.x.x.x:61678)
networkutils-env-set 100% |*************************************************************************************************************|   105  0:00:00 ETA
/ # wget https://10.y.y.y/
Connecting to 10.y.y.y (10.y.y.y:443)
^C

10.x.x.x is the node's primary IP, 10.y.y.y is one of the EKS control plane ENIs.

This prevents critical components like kube-dns from starting:

$ kubectl --kubeconfig kubeconfig logs --namespace kube-system kube-dns-d87b74b4f-f5gff kubedns
...
I1102 20:51:07.938774       1 dns.go:219] Waiting for [endpoints services] to be initialized from apiserver...
E1102 20:51:07.940074       1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:189: Failed to list *v1.Endpoints: Get https://172.20.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 172.20.0.1:443: i/o timeout
...
@ewbankkit
Contributor Author

@liwenwu-amazon I can provide the aws-cni-support.tar.gz via a support ticket.

@liwenwu-amazon
Contributor

I think you might be running into issue #35. Can you try setting AWS_VPC_K8S_CNI_EXTERNALSNAT to true and see if that solves your problem? Thanks.
https://docs.aws.amazon.com/eks/latest/userguide/external-snat.html
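
(For anyone following along, a minimal sketch of flipping that setting on the aws-node DaemonSet; the DaemonSet pods restart and pick up the new value.)

$ kubectl -n kube-system set env daemonset aws-node AWS_VPC_K8S_CNI_EXTERNALSNAT=true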

@ewbankkit
Contributor Author

I am running with AWS_VPC_K8S_CNI_EXTERNALSNAT=true:

$ kubectl --kubeconfig kubeconfig describe pod --namespace kube-system aws-node-6v94g
Name:           aws-node-6v94g
Namespace:      kube-system
Node:           ip-10-x-x-x.us-west-2.compute.internal/10.x.x.x
Start Time:     Fri, 02 Nov 2018 15:16:01 -0400
Labels:         controller-revision-hash=3496945906
                k8s-app=aws-node
                pod-template-generation=1
Annotations:    scheduler.alpha.kubernetes.io/critical-pod=
Status:         Running
IP:             10.x.x.x
Controlled By:  DaemonSet/aws-node
Containers:
  aws-node:
    Container ID:   docker://10b1f2669628dd5cb0695160005e40576f765d46fedf1e1e19eab6d813be2a07
...
    Environment:
      AWS_VPC_K8S_CNI_LOGLEVEL:            DEBUG
      MY_NODE_NAME:                         (v1:spec.nodeName)
      WATCH_NAMESPACE:                     kube-system (v1:metadata.namespace)
      AWS_VPC_K8S_CNI_EXTERNALSNAT:        true
      AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG:  true
...
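
(A quick sketch for verifying this kind of setup, assuming the standard aws-node DaemonSet and the ENIConfig CRD are installed; the node name is the one from the output above.)

$ kubectl -n kube-system get daemonset aws-node \
    -o jsonpath='{.spec.template.spec.containers[0].env}{"\n"}'
$ kubectl get eniconfigs
$ kubectl get node ip-10-x-x-x.us-west-2.compute.internal \
    -o jsonpath='{.metadata.annotations.k8s\.amazonaws\.com/eniConfig}{"\n"}'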

@liwenwu-amazon
Contributor

@ewbankkit Do the Pod's subnet and the Node's subnet use the same route table?

@ewbankkit
Contributor Author

No, I have them in separate route tables. There's no real reason why I did it that way; I just wanted to keep everything as self-contained as possible.
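
(A sketch for comparing the two route tables from the AWS CLI; the subnet IDs are placeholders. Note that a subnet with no explicit association falls back to the VPC's main route table and won't be returned by this filter.)

$ aws ec2 describe-route-tables \
    --filters "Name=association.subnet-id,Values=subnet-<node-subnet-id>" \
    --query 'RouteTables[].Routes'
$ aws ec2 describe-route-tables \
    --filters "Name=association.subnet-id,Values=subnet-<pod-subnet-id>" \
    --query 'RouteTables[].Routes'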

@liwenwu-amazon
Contributor

Do the Pods on the 2 different nodes (which can NOT ping each other) use the same subnet? Also, have you checked the security groups for the Pods on these 2 nodes; are they allowed to communicate with each other?

@ewbankkit
Contributor Author

I tried putting the pod subnets in the same route table as the node subnets and still could not connect to the control plane from the pods. Yes, the SGs are wide open.
I cannot ping pod to pod if the pods are on different nodes (but in the same subnet).

@liwenwu-amazon
Contributor

@ewbankkit Does it work if the Pod's ENIConfig uses the same security group as the node?

@lnr0626

lnr0626 commented Nov 5, 2018

I believe I'm running into this issue as well.

I have a small primary CIDR block in the 10.0.0.0/8 range and a (disjoint) larger secondary CIDR, also in the 10.0.0.0/8 range. The nodes are coming up in the primary CIDR block and have ENIConfigs set to use the secondary block, but with the same security groups as the node. All relevant subnets are using the same route table. I am running with AWS_VPC_K8S_CNI_EXTERNALSNAT=false; however, I believe that is required given some other constraints I have.

Pods can communicate with other pods on the same node, but cannot communicate with pods on other nodes or with the control plane using the cluster IP. In my case they can communicate with an off-node proxy server (i.e. curl --proxy works, but curl https:// does not).

My thought is that some of the things I did to try to work around #212 are allowing the off-node communication to work; however, it's not fully addressed, and so the cluster IPs don't work between nodes. I'm not positive about that though, and could be completely off base.

@liwenwu-amazon
Contributor

@lnr0626, can you try pinging from one pod to another pod on a different node? Does it work?
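
(A sketch of that test, assuming the pod image ships a ping binary; pick two pods that kubectl reports on different nodes.)

$ kubectl get pods -o wide                                  # note the IP and NODE columns
$ kubectl exec -ti <pod-on-node-a> -- ping -c 3 <IP-of-pod-on-node-b>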

@ewbankkit
Contributor Author

@liwenwu-amazon Same problem if I run the pods in the same SG as the worker nodes.
Interestingly, in the flow logs for the control plane ENIs I can see ACCEPTs on port 443 from one of the pod IPs, but nothing the other way.

@liwenwu-amazon
Contributor

@ewbankkit A few more questions:

  • Is this an EKS cluster?
  • Are you using the EKS-optimized AMI?
  • Do you have ec2-net-utils installed?

@liwenwu-amazon
Contributor

@ewbankkit For debugging purposes, does pod-to-pod ping work even if the Pod ENIConfig uses exactly the same SG/subnet as the node?

@lnr0626

lnr0626 commented Nov 5, 2018

@liwenwu-amazon pinging a pod on a different node does not work.

@liwenwu-amazon
Contributor

@lnr0626 Can you provide the following debugging information?
Assuming you are pinging from Pod-a on Node-a to Pod-b on Node-b, and both Pod-a and Pod-b are using secondary IPs from the eth1 interface (where eth0 is the node's primary ENI):

  • collect the output of kubectl describe node <Node-a>
  • collect the output of kubectl describe node <Node-b>
  • What is Node-a's instance-id and region?
  • What is Node-b's instance-id and region?
  • assuming Node-a and Node-b are using the same ENIConfig,
    • collect the output of kubectl describe eniconfig <pod's eniconfig>
  • let's collect tcpdump when you ping from Pod-a to Pod-b
    • install tcpdump on node-a
    • start capturing traffic on eth1 of node-a (assuming eth1 is the ENI for Pods)
      • tcpdump -i eth1 -w node_a_eth1.pcap
    • install tcpdump on node-b
    • start capturing on eth1 of node-b (assuming eth1 is the ENI for Pods)
      • tcpdump -i eth1 -w node_b-eth1.pcap
  • kubectl exec -ti <pod-a> sh
  • ping Pod-b's IP for 5 minutes
  • collect node-a and node-b snapshots by
    • running /opt/cni/bin/aws-cni-support.sh on both nodes
    • collecting /var/log/aws-routed-eni/aws-cni-support.tar.gz
  • You can send these outputs to me ([email protected]) or attach them to this issue

thanks

@ewbankkit
Contributor Author

@liwenwu-amazon

  • Yes, an EKS cluster
  • The AMI is CentOS 7, 100% based on the scripts at https://github.com/awslabs/amazon-eks-ami; I have verified that exactly the same issue occurs with an Ubuntu 16.04-based AMI
  • No, I hadn't heard of ec2-net-utils; I'll install it

@liwenwu-amazon
Contributor

@ewbankkit Please do NOT install ec2-net-utils.
Can you run the same test and provide the debug info I asked @lnr0626 for?

thanks

@ewbankkit
Contributor Author

After much debugging to-and-fro, it looks like the root cause is the inability to add a default route during IP address assignment:

[ERROR] Failed to increase pool size: failed to setup eni eni-0000000000000000 network: eni network setup: unable to add route 0.0.0.0/0 via 100.64.0.1 table 2: network is unreachable

Using the retry functionality from #223 doesn't resolve it.
At the Linux command line the equivalent is

# ip route add default via 100.64.0.1 dev eth1 table 2
RTNETLINK answers: Network is unreachable

although the eth1 link shows as up.
My feeling is that this is caused by some configuration on our base AMI, as I have success with v24 of the Amazon EKS Worker AMI based on Amazon Linux 2.
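
(For anyone hitting the same error, the relevant state on the node can be inspected with the commands below; table 2 is the per-ENI route table named in the error above.)

# ip link show eth1                  # link state of the secondary ENI
# ip addr show eth1                  # addresses (if any) assigned to it
# ip rule show                       # policy routing rules (the CNI adds per-pod rules here)
# ip route show table 2              # the route table named in the error above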

@sdavids13

We are running CentOS 7 as well and hitting similar issues. Any help would be greatly appreciated; we are running out of options for IP space without this particular feature available. Unfortunately, moving to Amazon Linux 2 really isn't an option for us at this point. It does seem suspect that this doesn't work on either CentOS 7 or Ubuntu. @liwenwu-amazon, do you have any ideas, or could you reach out to someone inside Amazon who maintains the Amazon Linux 2 image to see if they have any ideas?

@liwenwu-amazon
Contributor

@sdavids13
I have installed CentOS 7 on a t2.medium instance. I am able to manually do the following with the secondary ENI:

[root@ip-172-31-35-103 centos]# ip route add 172.31.100.1 dev eth1 table 2
[root@ip-172-31-35-103 centos]# ip route add default via 172.31.100.1 table 2

Sure, I will check with our Amazon Linux engineers on why ip route add default via 172.31.100.1 table 2 is not working with some CentOS 7 AMIs.

@moshe0076

Hi,

I have hit the same issue. I'm working in a VPC with 2 CIDR blocks, with EKS worker nodes running on the second CIDR block.

The worker nodes use the latest Amazon EKS-Optimized Amazon Linux AMI (v25).

Pods on the same node can communicate with no problem, but no communication works between pods on different nodes.

Setting AWS_VPC_K8S_CNI_EXTERNALSNAT to true in the aws-node daemonset solves the problem, but as I need to run on a public subnet with Internet access, that's a problem for me.

Is there any other solution?

Thanks!!!

@dovreshef

I'm also seeing the same issue where pods on different nodes cannot communicate.

Using the latest EKS-Optimized Amazon Linux AMI (v25) as well.

@moshe0076

There is a PR for this

#234

Thanks!!!

@lutierigb
Contributor

> @sdavids13
> I have installed CentOS 7 on a t2.medium instance. I am able to manually do the following with the secondary ENI:
>
> [root@ip-172-31-35-103 centos]# ip route add 172.31.100.1 dev eth1 table 2
> [root@ip-172-31-35-103 centos]# ip route add default via 172.31.100.1 table 2
>
> Sure, I will check with our Amazon Linux engineers on why ip route add default via 172.31.100.1 table 2 is not working with some CentOS 7 AMIs.

@liwenwu-amazon You forgot the "dev eth1" in the default route command; that's why you were able to add it. It was probably added via eth0.

I was working on a similar issue, and it seems like it comes down to how the kernel handles this. If you were to add an IP address to eth1 it wouldn't complain about it, but the AWS CNI plugin doesn't configure any IP addresses on the secondary interfaces.

CentOS 7 with kernel 3.10 will fail to add the default route via eth1 if eth1 doesn't have an IP address in that range.

I updated to kernel 4.x and had no issues but I don't think 4 is "officially" supported for CentOS 7.
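
(A quick way to see this on the node, as a sketch: eth1 and its gateway 100.64.0.1 are taken from the error above, and the address/prefix in the second step is a placeholder for the ENI's primary private IP.)

# ip route add default via 100.64.0.1 dev eth1 table 2
RTNETLINK answers: Network is unreachable
# ip addr add 100.64.x.y/<prefix> dev eth1            # give eth1 an address in the pod subnet
# ip route add default via 100.64.0.1 dev eth1 table 2
(no error this time: the gateway is now reachable on eth1)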

@ewbankkit
Contributor Author

@lutierigb Thanks for digging in to this.
I added code to explicitly add the primary IP on secondary ENIs in #271 and have verified that this works on CentOS 7 (and on Amazon Linux 2, so no regression).
