Egress network policy blocks readiness probe #6476
A simple deployment with an httpd container and readiness check works together with a NetworkPolicy that disables all egress:
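The manifests themselves were not captured in this thread; a minimal sketch of such a deny-all-egress policy (the name is illustrative) might look like:

```yaml
# Hypothetical reconstruction -- the original manifest was not captured.
# An empty podSelector selects all pods in the namespace; declaring the
# Egress policy type with no egress rules denies all outbound traffic.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-egress
spec:
  podSelector: {}
  policyTypes:
    - Egress
```

With a policy of this shape in place, the httpd pod still passes its HTTP readiness probe, since probes are initiated by the kubelet and arrive at the pod as ingress traffic.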
It also works when replacing the deployment with a daemonset, which is what I was using for nginx.
The following values.yaml was used for installing Calico. This values.yaml was used for the nginx installation:
The difference between nginx and my example deployment is that nginx has the CAP_NET_BIND_SERVICE capability and uses host ports. Also interesting is that only the first readiness probe fails, but not subsequent ones.
For completeness, here is the yaml of one of the nginx containers:
@ErikEngerd thanks for raising this. I'm looking at it now. I definitely would not expect egress policy to impact a pod's ability to access its own host, perhaps unless you have host endpoints created. I think you missed posting your Calico deployment values.yaml / manifest - do you know if FELIX_DEFAULTENDPOINTTOHOST is set on the calico/node DaemonSet? Normally we set that to "ACCEPT" by default, which ensures this packet path is functioning.
Calico network policy doesn't reject connections, so it's unlikely this is a result of Calico's network policy dropping the traffic. We blackhole, so the symptom would be a timeout rather than a connection refused. This suggests to me that the nginx pod is actually failing to serve its readiness endpoint for some reason. Do you know if the nginx pod requires egress access in some manner in order to successfully start its readiness endpoint?
I am using the following calico version
I installed calico using the tigera operator:
Then I applied the following custom resource
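The custom resource was not captured in the scrape; a typical Tigera operator Installation resource (the pod CIDR and encapsulation settings below are illustrative assumptions, not the reporter's actual values) looks roughly like:

```yaml
# Illustrative sketch -- the actual Installation resource from the
# thread was not captured. The IP pool settings are assumptions.
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    ipPools:
      - cidr: 192.168.0.0/16
        encapsulation: VXLANCrossSubnet
        natOutgoing: Enabled
```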
Looking at the calico daemonset, it appears that the setting is already as you describe:
If I apply the following network policy, then the readiness probe fails. This egress rule allows traffic to 192.168.178.0/24 which is the network of the nodes.
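The policy itself was not captured here; based on the description, it presumably resembled the following (the pod selector is an assumption):

```yaml
# Hypothetical reconstruction of the failing policy: egress is allowed
# only to the node network 192.168.178.0/24 mentioned above.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-to-nodes
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 192.168.178.0/24
```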
As far as the readiness probe is concerned, it is a standard HTTP check based on the container port 8081. Excerpt from one of the nginx pods above:
It is using containerPort 8081, which is not a host port like 80 or 443. Or perhaps the implementation of the 8081 health check uses the host port 80 or 443.
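For reference, a standard HTTP readiness probe on port 8081 looks like this in the container spec. The path and timings are assumptions (the NGINX Inc. kubernetes-ingress controller serves its readiness endpoint at /nginx-ready on 8081 by default):

```yaml
# Sketch of the readiness probe described above; the path and the
# timing values may differ from the reporter's actual chart values.
readinessProbe:
  httpGet:
    path: /nginx-ready
    port: 8081
    scheme: HTTP
  initialDelaySeconds: 10
  periodSeconds: 10
```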
I have done some more experimentation, and I think I have figured it out now. The following network policy, declaring both Ingress and Egress policy types together with an explicit egress rule, works:
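The full YAML of the working policy was not captured; piecing the fragments together, it presumably declared both policy types and an explicit egress rule, something like the sketch below (the CIDR is taken from the earlier comment; the selector and the open ingress rule are assumptions):

```yaml
# Hypothetical reconstruction of the working policy based on the
# fragments above: both Ingress and Egress policy types are declared,
# and egress to the node network is explicitly allowed.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: nginx-ingress-policy
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - {}
  egress:
    - to:
        - ipBlock:
            cidr: 192.168.178.0/24
```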
I think the issue can be closed. The only annoying thing is that network policies don't really support the use case of allowing egress to the API server without hardcoding IPs in the egress rule, since the API server is not listening on a cluster IP. It would be nice if this could be addressed somehow in the standard.
@ErikEngerd have you looked at using Services in egress rules? You can use a Calico policy as described here: https://projectcalico.docs.tigera.io/security/service-policy e.g.
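The linked docs describe how a Calico policy can match a Kubernetes Service by name in an egress rule, avoiding hardcoded IPs. A sketch (the namespace and selector are assumptions):

```yaml
# Calico-specific policy allowing egress to the Kubernetes API server
# via the "kubernetes" Service, without hardcoding IPs.
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: allow-api-server-egress
  namespace: ingress-nginx
spec:
  selector: app == 'nginx-ingress'
  egress:
    - action: Allow
      destination:
        services:
          name: kubernetes
          namespace: default
```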
I am aware that Calico provides much more advanced network policies than the standard, but I am a bit reluctant to become dependent on Calico-specific functionality. Also, some services such as GKE Autopilot do not allow choosing a specific network provider (even though they appear to use Calico under the hood).
I have deployed the nginx ingress controller using helm.
When securing my cluster using network policies, I noticed that the nginx ingress controller was failing its readiness probe. It was failing even with only a single egress network policy present. That network policy is:
The output of the kubectl describe of the affected pod is
What did you expect to happen?
Upon deleting an nginx pod, it should come back and pass its readiness probe. Instead, the probe fails because the pod is connecting to its own IP. Even when adding explicit egress rules for TCP port 8081, it still fails.
In general network policies should not affect readiness probes.
How can we reproduce it (as minimally and precisely as possible)?
Wait until nginx is successfully running and all pods are running.
Now create the network policy:
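The policy was not captured in the scrape; per the description it is a single policy restricting egress, along the lines of (name, namespace, and selector assumed):

```yaml
# Hypothetical reconstruction of the repro policy: a single policy
# restricting egress for all pods in the ingress controller namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-egress
  namespace: ingress-nginx
spec:
  podSelector: {}
  policyTypes:
    - Egress
```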
Then delete one of the nginx pods.
The pod now fails its readiness probe.
Anything else we need to know?
The network policy is the only network policy that I used for this test. It does not allow communication to any other pods in the system. However, my default network policies did cover egress and ingress rules to the pods that it needs to reach. The example is just a minimal working example.
Kubernetes version
Cloud provider
OS version
Server OS version: