sig-api-machinery: Add scale targets to CRDs to GA KEP #1015

@@ -25,6 +25,7 @@ see-also:
- "[Umbrella Issue](https://github.com/kubernetes/kubernetes/issues/58682)"
- "[Vanilla OpenAPI Subset Design](https://docs.google.com/document/d/1pcGlbmw-2Y0JJs9hsYnSBXamgG9TfWtHY6eh80zSTd8)"
- "[Pruning for CustomResources KEP](https://github.com/kubernetes/enhancements/pull/709)"
- "[Defaulting for Custom Resources KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/20190426-crd-defaulting.md)"
---

# Title

@@ -95,21 +96,21 @@ Bug fixes required to graduate CRDs to GA:

* See “Required for GA” issues tracked via the [CRD Project Board](https://github.com/orgs/kubernetes/projects/28).

For additional details on already completed features, see the [CRD Project Board](https://github.com/orgs/kubernetes/projects/28).

See [Post-GA tasks](#post-ga-tasks) for features that have been decided to be out of scope.

### Defaulting and pruning for custom resources is implemented

Defaulting, pruning, and read-only validation are all blocked by the
OpenAPI subset definition (next point).

See the [Pruning for CustomResources KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/20180731-crd-pruning.md)
and the [Defaulting for Custom Resources KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/20190426-crd-defaulting.md).

### CRD v1 schemas are restricted to a subset of the OpenAPI specification

See the [OpenAPI Subset KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/20190425-structural-openapi.md).

### Generator exists for CRD Validation Schema v3 (Kubebuilder)

@@ -121,8 +122,8 @@ to be integrated into kubebuilder’s controller-tools.

### CustomResourceWebhookConversion API is GA ready

Currently CRD webhook conversion is alpha. We plan to take it to v1beta1 according to the
[CustomResourceDefinition Conversion Webhook's Graduation Criteria](https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/20190425-crd-conversion-webhook.md#graduation-criteria).
We then plan to graduate it to GA as part of graduating CRDs to GA.

### CustomResourceSubresources API is GA ready

@@ -162,10 +163,109 @@ TODO: complete this list

### Scale Targets for GA

The GA scale targets for custom resources are defined by the same [API call latency
SLIs/SLOs as the Kubernetes native types](https://github.com/kubernetes/community/blob/master/sig-scalability/slos/api_call_latency.md#api-call-latency-slisslos-details).

The targets are defined by the suggested maximum limits below, which are organized the same way as the [Kubernetes native type thresholds](https://github.com/kubernetes/community/blob/master/sig-scalability/configs-and-limits/thresholds.md#kubernetes-thresholds), with only one change:

- Since custom resources can be arbitrarily large, we have broken down the limits by custom resource object size.

**Custom Resource Definitions:**

| Suggested Maximum Limit: scope=cluster |
| --- |
| 500 |

_Note: The Custom Resource Definition suggested maximum limit was selected not
because of the above SLIs/SLOs, but because of the latency of OpenAPI publishing,
which is a background process that runs asynchronously each time a Custom
Resource Definition schema is updated. With 500 Custom Resource Definitions, it takes
slightly over 35 seconds for a definition change to become visible via the OpenAPI
spec endpoint._

**Custom Resources, Cluster-Wide:**

Cluster-wide limits for custom resources are bound by storage, since custom resources
share the storage space with all other objects. While determining the
appropriate storage limit for a cluster is out of scope for this document, once
an etcd storage limit is selected, the suggested maximum limits for custom resources
are:

| etcd storage limit | Suggested Maximum Limit: scope=cluster |
| --- | --- |
| 4GB | 40000 |
| 8GB | 80000 |

These limits aim to keep custom resource storage usage to at most half of the
total cluster storage capacity for custom resources of 50kb or less in size.

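The half-capacity figure can be checked with simple arithmetic. This is a minimal sketch that assumes decimal units (1GB = 1,000,000kb) and that every custom resource sits at the 50kb upper bound; it ignores etcd per-key overhead and the other objects sharing the store.

```python
# Sketch: share of the etcd storage limit consumed by custom resources when the
# cluster-wide suggested maximum is reached. Assumes decimal units
# (1GB = 1,000,000kb) and ignores etcd per-key overhead.
MAX_OBJECT_KB = 50
KB_PER_GB = 1_000_000

# etcd storage limit in GB -> suggested maximum custom resources (scope=cluster).
CLUSTER_LIMITS = {4: 40_000, 8: 80_000}


def storage_fraction(etcd_gb: int, object_count: int, object_kb: float = MAX_OBJECT_KB) -> float:
    """Share of the etcd storage limit used by object_count objects of object_kb each."""
    return (object_count * object_kb) / (etcd_gb * KB_PER_GB)


for etcd_gb, limit in CLUSTER_LIMITS.items():
    share = storage_fraction(etcd_gb, limit)
    print(f"{etcd_gb}GB etcd, {limit} objects x {MAX_OBJECT_KB}kb -> {share:.0%} of storage")
```

At the 50kb bound both rows print 50%; smaller objects use proportionally less of the store.
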
**Custom Resources per Definition:**

For each custom resource definition, the limit on the number of custom resources
can be found by taking the (median) object size of the custom resource and finding
the matching row in this table:

| Object size | Suggested Maximum Limit: scope=namespace (5s p99 SLO) | Suggested Maximum Limit: scope=cluster (30s p99 SLO) |
| --- | --- | --- |
| <=10kb | 1500 | 10000 |
| (10kb - 25kb] | 600 | 4000 |
| (25kb - 50kb] | 300 | 2000 |

The cluster scope indicates the total number of custom resources for that
definition allowed in the entire cluster.

The namespace scope indicates the total number of custom resources for that
definition allowed in any particular namespace. The cumulative count of the
custom resource across all namespaces must not exceed the cluster limit.

Since, in practice, custom resources scale further within the SLIs/SLOs when no
conversion webhook is used (roughly 2x according to our scale tests), custom resource
definition authors should adhere to these limits even without a webhook, so that a
webhook converter can safely be added later as part of a custom resource version upgrade.

_Note: For custom resource definitions with `scope: Namespaced`, the scope=namespace
suggested maximum limit indicates how many custom resource objects may be in each namespace,
and the scope=cluster suggested maximum limit indicates how many custom resource objects may
be in the cluster in total. For custom resource definitions with `scope: Cluster`, only
the scope=cluster suggested maximum limit applies._

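The table lookup and the two scopes can be combined into a single check. The sketch below is hypothetical: the helper names and the counts-by-namespace input are illustrative, not a Kubernetes API. It encodes the table above, applies the per-namespace limit only to `scope: Namespaced` definitions, and always applies the cluster limit to the total.

```python
from dataclasses import dataclass
from typing import Mapping

# Hypothetical helpers encoding the "Custom Resources per Definition" table:
# (upper bound of the median object size in kb, scope=namespace limit, scope=cluster limit).
PER_DEFINITION_LIMITS = [
    (10, 1500, 10000),
    (25, 600, 4000),
    (50, 300, 2000),
]


@dataclass(frozen=True)
class Limits:
    per_namespace: int
    cluster: int


def suggested_limits(median_object_kb: float) -> Limits:
    """Return the limits for the size bucket containing the median object size."""
    for upper_kb, per_namespace, cluster in PER_DEFINITION_LIMITS:
        if median_object_kb <= upper_kb:
            return Limits(per_namespace, cluster)
    raise ValueError("the table only covers objects up to 50kb")


def within_limits(crd_scope: str, median_object_kb: float, counts_by_namespace: Mapping[str, int]) -> bool:
    """Check one CRD's object counts against the suggested maximum limits.

    crd_scope is the CRD's spec.scope ("Namespaced" or "Cluster"). For a
    Cluster-scoped CRD, put all objects under a single key such as "".
    """
    limits = suggested_limits(median_object_kb)
    # The cumulative count across all namespaces must not exceed the cluster limit.
    if sum(counts_by_namespace.values()) > limits.cluster:
        return False
    # The per-namespace limit only applies to Namespaced CRDs.
    if crd_scope == "Namespaced":
        return all(count <= limits.per_namespace for count in counts_by_namespace.values())
    return True


# Eight namespaces of ~8kb objects: each namespace is under 1500, but the
# total of 11200 exceeds the 10000 cluster limit.
print(within_limits("Namespaced", 8, {f"ns-{i}": 1400 for i in range(8)}))  # prints False
```
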
**Conversion Webhooks:**

Conversion webhook SLOs are defined from the perspective of the conversion
webhook. They do not include any API server serialization/deserialization for
making the request to the webhook, but they do include network latency.

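To make the webhook-side measurement concrete, here is a minimal noop converter over a `ConversionReview` payload, timed in-process. This is a sketch: the field names follow the `apiextensions.k8s.io` `ConversionReview` shape (`request.uid`, `request.desiredAPIVersion`, `request.objects`, `response.convertedObjects`, `response.result`), while the `example.com` group, the `Widget` kind and the UID are hypothetical. In-process timing excludes the network latency that the SLO includes, so it is only a lower bound.

```python
import time
from typing import Any, Dict, List


def noop_convert(review: Dict[str, Any]) -> Dict[str, Any]:
    """Answer a ConversionReview by only rewriting apiVersion on each object.

    A real converter would also translate fields between versions; this noop
    version isolates the fixed cost of handling one request.
    """
    request = review["request"]
    desired = request["desiredAPIVersion"]
    converted: List[Dict[str, Any]] = []
    for obj in request["objects"]:
        out = dict(obj)  # shallow copy so the incoming request is left untouched
        out["apiVersion"] = desired
        converted.append(out)
    return {
        "apiVersion": review["apiVersion"],
        "kind": "ConversionReview",
        "response": {
            "uid": request["uid"],
            "convertedObjects": converted,
            "result": {"status": "Success"},
        },
    }


def handler_latency_ms(review: Dict[str, Any]) -> float:
    """In-process handling time in ms; excludes network latency and HTTP/JSON decoding."""
    start = time.perf_counter()
    noop_convert(review)
    return (time.perf_counter() - start) * 1000


# Hypothetical list request of 1500 small Widget objects in group example.com.
review = {
    "apiVersion": "apiextensions.k8s.io/v1",
    "kind": "ConversionReview",
    "request": {
        "uid": "hypothetical-uid",
        "desiredAPIVersion": "example.com/v2",
        "objects": [
            {"apiVersion": "example.com/v1", "kind": "Widget", "metadata": {"name": f"w{i}"}}
            for i in range(1500)
        ],
    },
}
print(f"noop conversion of 1500 objects: {handler_latency_ms(review):.2f}ms")
```
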
Given that the performance and scalability of conversion webhooks are the
responsibility of their authors, custom resource scale targets apply only to
conversion webhooks that meet the following latencies for the above suggested
maximum limits.

| scope | object count limit | Expected conversion webhook SLO: p99 latency |
| --- | --- | --- |
| resource | 1 | 50ms |
| namespace | 1500 (<=10kb), 600 (10-25kb) or 300 (25-50kb) | 1 second |
| cluster | 10000 (<=10kb), 4000 (10-25kb) or 2000 (25-50kb) | 6 seconds |

The higher "per-object" latency for scope=resource (50ms, versus roughly 0.6ms to 3.3ms
per object for namespace and cluster scope, depending on object size) accounts for a
constant cost of serving each request.

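Dividing each p99 budget by its object count gives the average per-object time a webhook would have if the entire budget went to per-object work. The sketch below derives those figures from the table; the dictionary layout and helper name are illustrative. Because real latency also has a per-request constant, these are upper bounds on per-object cost rather than targets.

```python
# Illustrative encoding of the conversion webhook SLO table (hypothetical names):
# scope -> (p99 latency budget in ms, object count limit per size bucket).
WEBHOOK_SLOS = {
    "resource": (50, {"<=10kb": 1, "10-25kb": 1, "25-50kb": 1}),
    "namespace": (1000, {"<=10kb": 1500, "10-25kb": 600, "25-50kb": 300}),
    "cluster": (6000, {"<=10kb": 10000, "10-25kb": 4000, "25-50kb": 2000}),
}


def per_object_budget_ms(scope: str, size_bucket: str) -> float:
    """Average ms per object if the whole p99 budget were spent on per-object work."""
    budget_ms, counts = WEBHOOK_SLOS[scope]
    return budget_ms / counts[size_bucket]


for scope, (_, counts) in WEBHOOK_SLOS.items():
    cells = ", ".join(f"{bucket}: {per_object_budget_ms(scope, bucket):.2f}ms" for bucket in counts)
    print(f"{scope:>9}  {cells}")
```

For the namespace and cluster rows this yields between 0.60ms (cluster, <=10kb) and 3.33ms (namespace, 25-50kb) per object, against 50ms for a single-object request.
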
The object sizes and suggested maximum limits in the Custom Resources per
Definition table above apply to these conversion webhook SLOs. For example, for a
list request for 1500 custom resource objects that are 10kb in size, the namespace
scope SLO of 1 second applies to the conversion webhook.

Review comment: Is the table trying to say "return a single object in 50ms, up to 1500 10k objects in 1s, 10000 10k objects in 6 seconds"? If so, that mathematically doesn't make much sense; the latter implies that a single object must be processed in .6ms?

Reply: I've inlined the object count details in the webhook SLOs to make it easier to understand. The 50ms for a single object is higher to account for a request serving constant factor. I've added a note.

**Scale Target Data**

GA custom resource scale targets were selected based on an [analysis of our current scale limits](https://docs.google.com/document/d/1tEstPQvzGvaRnN-WwGUWx1H9xHPRCy_fFcGlgTkB3f8).

We ran a month-long survey of Custom Resource Definition scale needs across Kubernetes mailing lists, Slack channels and social media.
Of the custom resource definitions surveyed, 96% are currently within these suggested maximum limits and 91% are within these limits for their anticipated future growth; the survey data also provides useful guidance for our post-GA scalability work. See [survey of real-world custom resource usage](https://docs.google.com/document/d/1MTd_gDlpgBaT5sAKM4j6tQVeCFIT9J44RHzt2yWOK_g) for details.

As part of GA, the suggested maximum limits and SLO documentation will be updated to make this clear and to
encourage CRD authors to provide their users with concrete suggested maximum limits and SLIs/SLOs for their
custom resource kinds that account for the per-resource conversion cost of their conversion webhook and/or
the size of their custom resources.

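One way a CRD author might turn measured webhook costs into a published object-count limit is sketched below. It assumes a simple linear latency model (a fixed per-request cost plus a per-object cost), and the function name and the 20ms/0.5ms figures are hypothetical; real limits should come from p99 measurements under load.

```python
import math


def max_objects_within_slo(slo_ms: float, fixed_ms: float, per_object_ms: float) -> int:
    """Largest object count n with fixed_ms + per_object_ms * n <= slo_ms.

    Assumes a linear latency model; fixed_ms and per_object_ms should come from
    measuring the actual conversion webhook (for example at p99 under load).
    """
    if per_object_ms <= 0:
        raise ValueError("per_object_ms must be positive")
    if fixed_ms >= slo_ms:
        return 0
    return math.floor((slo_ms - fixed_ms) / per_object_ms)


# Hypothetical measurements: 20ms per request plus 0.5ms per converted object.
FIXED_MS, PER_OBJECT_MS = 20, 0.5
for scope, slo_ms in [("namespace", 1000), ("cluster", 6000)]:
    limit = max_objects_within_slo(slo_ms, FIXED_MS, PER_OBJECT_MS)
    print(f"{scope}: at most {limit} objects within {slo_ms}ms")
```

With these hypothetical numbers the converter fits 1960 objects in the 1 second namespace budget and 11960 in the 6 second cluster budget, so it would satisfy the 1500 and 10000 limits for objects of 10kb or less.
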
Review comment: Why does the cluster scope have a longer SLO?

Reply: This is to be consistent with how the SLOs for native types are defined (https://github.com/kubernetes/community/blob/master/sig-scalability/slos/api_call_latency.md#api-call-latency-slisslos-details). @wojtek-t, do you know the background?

Reply: It's connected with the number of objects: listing objects is basically some constant time plus something proportional to the number of objects processed. And within a single namespace (scope=namespace) we officially allow a much smaller number of objects.

## Graduation Criteria