updated to the latest template
SergeyKanzhelev committed Feb 7, 2023
1 parent e2ff312 commit 619f280
Showing 1 changed file with 111 additions and 15 deletions.
126 changes: 111 additions & 15 deletions keps/sig-node/2727-grpc-probe/README.md

<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
  - [Alternative Considerations](#alternative-considerations)
  - [Test Plan](#test-plan)
    - [Prerequisite testing updates](#prerequisite-testing-updates)
    - [Unit tests](#unit-tests)
    - [Integration tests](#integration-tests)
    - [e2e tests](#e2e-tests)
  - [Graduation Criteria](#graduation-criteria)
    - [Alpha](#alpha)
    - [Beta](#beta)
  - [Alpha](#alpha-1)
  - [Beta](#beta-1)
  - [GA](#ga-1)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
- [References](#references)
- [Infrastructure Needed (Optional)](#infrastructure-needed-optional)
<!-- /toc -->


[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
[kubernetes/website]: https://git.k8s.io/website

## Summary

Add gRPC probe to Pod.Spec.Container.{Liveness,Readiness,Startup}Probe.

## Motivation

gRPC is a widely used RPC framework. Existing ways to add probes to gRPC
apps, such as exposing an additional HTTP endpoint for health checks or
packaging an external gRPC client into the image and calling it from an
exec probe, carry significant limitations and overhead.
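
For context, the exec-probe workaround that this proposal aims to make unnecessary typically looks like the sketch below; the binary path, flag, and port are illustrative assumptions, not part of this KEP:

```yaml
# Sketch of the pre-existing pattern: the grpc_health_probe binary is baked
# into the container image and invoked by an exec probe on every probe period.
readinessProbe:
  exec:
    command: ["/bin/grpc_health_probe", "-addr=:9090"]  # path and port are examples
  periodSeconds: 10
```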

Many load balancers support gRPC natively, so adding it to
Kubernetes aligns well with the rest of the industry.

Finally, the Kubernetes project actively uses gRPC, so adding built-in
support for gRPC endpoints does not introduce any new dependencies
to the project.

### Goals

Enable gRPC probe natively from Kubelet without requiring users to package a
gRPC healthcheck binary with their container.

- https://github.com/grpc-ecosystem/grpc-health-probe
- https://github.com/grpc/grpc/blob/master/doc/health-checking.md
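
As a rough sketch of the intended user experience (the Pod name, image, and port below are placeholders, not defined by this KEP), a probe is declared directly in the Pod spec with no extra binary in the image:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: grpc-probe-demo              # placeholder name
spec:
  containers:
  - name: server
    image: example.com/grpc-server:1.0   # placeholder image
    ports:
    - containerPort: 9090
    livenessProbe:
      grpc:
        port: 9090                   # numeric port the gRPC server listens on
      initialDelaySeconds: 5
      periodSeconds: 10
```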

### Non-Goals

- Add gRPC support in other areas of K8s (e.g. Services).

## Proposal

Note that `GRPCAction.Port` is an int32, which is inconsistent with
the other existing probe definitions. This is on purpose -- we want to
move users away from using the (portNum, portName) union type.
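
To illustrate the difference (the port values below are placeholders):

```yaml
# An HTTP probe may reference a named container port (the IntOrString union type):
livenessProbe:
  httpGet:
    path: /healthz
    port: metrics          # name or number
---
# A gRPC probe accepts only a numeric port:
livenessProbe:
  grpc:
    port: 9090             # int32 only; port names are not accepted
```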

### Alternative Considerations

Note that `readinessProbe.grpc.service` may be confusing; some
alternatives were considered:

There was no feedback indicating that the selected name is confusing in the context of a probe definition.
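
For illustration, this is how the chosen field name reads in a probe definition; `service` here is the name placed into the gRPC `HealthCheckRequest`, not a Kubernetes Service (the port and service name below are placeholders):

```yaml
readinessProbe:
  grpc:
    port: 9090             # illustrative port
    service: readiness     # gRPC health-checking service name, not a Kubernetes Service
```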

### Test Plan

<!--
**Note:** *Not required until targeted at a release.*
The goal is to ensure that we don't accept enhancements with inadequate testing.
All code is expected to have adequate tests (eventually with coverage
expectations). Please adhere to the [Kubernetes testing guidelines][testing-guidelines]
when drafting this test plan.
[testing-guidelines]: https://git.k8s.io/community/contributors/devel/sig-testing/testing.md
-->

[X] I/we understand the owners of the involved components may require updates to
existing tests to make this code solid enough prior to committing the changes necessary
to implement this enhancement.

##### Prerequisite testing updates

<!--
Based on reviewers' feedback, describe what additional tests need to be added prior to
implementing this enhancement to ensure the enhancement has solid foundations.
-->

##### Unit tests

- `k8s.io/kubernetes/pkg/probe/grpc`: `2023/02/06` - `78.1%`

##### Integration tests

N/A, only unit tests and e2e coverage.

##### e2e tests

Tests in `test/e2e/common/node/container_probe.go`:

- should *not* be restarted with a GRPC liveness probe: [results](https://storage.googleapis.com/k8s-triage/index.html?test=Probing%20container%20should%20%5C*not%5C*%20be%20restarted%20with%20a%20GRPC%20liveness%20probe)
- should be restarted with a GRPC liveness probe: [results](https://storage.googleapis.com/k8s-triage/index.html?test=should%20be%20restarted%20with%20a%20GRPC%20liveness%20probe)

TODO: add a stress test to validate behavior at scale (see GA requirements).

### Graduation Criteria

#### Alpha

#### GA

- [X] Address feedback from beta usage
- [X] Validate that API is appropriate for users. There are some potential tunables:
  - `User-Agent`
  - connect timeout
  - protocol (HTTP, QUIC)
- [ ] Close on any remaining open issues & bugs
- [ ] Promote tests to conformance
- [ ] Implement a stress test

### Upgrade / Downgrade Strategy

The overhead of executing probes is consistent with other probe types.
We expect a decrease in disk, RAM, and CPU use for many scenarios where
https://github.com/grpc-ecosystem/grpc-health-probe was used to probe gRPC endpoints.

###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?

Yes, gRPC probes use node resources to establish connections.
This may lead to issues like [kubernetes/kubernetes#89898](https://github.com/kubernetes/kubernetes/issues/89898).

The node resources used for gRPC probes can also be exhausted by a Pod with a HostPort
making many connections to different destinations, or by any other process on the node.
This problem cannot be addressed generically.

However, the design in which node resources are used for gRPC probes works
for most setups. The default maximum is `110` Pods per node. There is currently
no limit on the number of containers; the number of containers is bounded by the
amount of resources those containers request. With the fix limiting
the socket `TIME_WAIT` to 1 second,
[this calculation](https://github.com/kubernetes/kubernetes/issues/89898#issuecomment-1383207322)
demonstrates that it is hard to reach the socket limits.
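
As a rough, illustrative back-of-the-envelope check (assumptions, not figures from the issue above): with the default maximum of 110 Pods per node, one gRPC probe per Pod, an aggressive 1-second probe period, and sockets held in `TIME_WAIT` for 1 second, a node sustains on the order of a couple hundred probe sockets at any moment, which is far below the roughly 28,000 ephemeral ports available with default Linux settings.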

### Troubleshooting

Logs and Pod events can be used to troubleshoot probe failures.
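For example, failed probes typically surface as `Unhealthy` events on the Pod (visible with `kubectl describe pod`) and as messages in the kubelet log.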

###### How does this feature react if the API server and/or etcd is unavailable?

No dependency on etcd availability.
Feature is promoted to beta in 1.24.

Feature is promoted to GA in 1.27.

## Drawbacks

<!--
Why should this KEP _not_ be implemented?
-->

## Alternatives

* Third-party solutions like https://github.com/grpc-ecosystem/grpc-health-probe

## References

* gRPC health checking: https://github.com/grpc/grpc/blob/master/doc/health-checking.md

## Infrastructure Needed (Optional)

<!--
Use this section if you need things from the project/SIG. Examples include a
new subproject, repos requested, or GitHub details. Listing these here allows a
SIG to get the process for these resources started right away.
-->
