kubernetes · k8s-ci-robot · Jul 29, 2019 · Apr 26, 2019 · Jul 22, 2019 · Jul 29, 2019
diff --git a/keps/sig-api-machinery/20180415-crds-to-ga.md b/keps/sig-api-machinery/20180415-crds-to-ga.md
@@ -25,6 +25,7 @@ see-also:
  - "[Umbrella Issue](https://github.com/kubernetes/kubernetes/issues/58682)"
  - "[Vanilla OpenAPI Subset Design](https://docs.google.com/document/d/1pcGlbmw-2Y0JJs9hsYnSBXamgG9TfWtHY6eh80zSTd8)"
  - "[Pruning for CustomResources KEP](https://github.com/kubernetes/enhancements/pull/709)"
+ - "[Defaulting for Custom Resources KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/20190426-crd-defaulting.md)"
 ---
 
 # Title
@@ -95,21 +96,21 @@ Bug fixes required to graduate CRDs to GA:
 
 * See “Required for GA” issues tracked via the [CRD Project Board](https://github.com/orgs/kubernetes/projects/28).
 
-For additional details on already completed features, see the [Umbrella Issue](https://github.com/kubernetes/kubernetes/issues/58682).
+For additional details on already completed features, see the [CRD Project Board](https://github.com/orgs/kubernetes/projects/28).
 
 See [Post-GA tasks](#post-ga-tasks) for decided out-of-scope features.
 
 ### Defaulting and pruning for custom resources is implemented
 
 Both defaulting and pruning and also read-only validation are blocked by the
-OpenAPI subset definition (next point). An update of the [old Pruning for
-CustomResources KEP](https://github.com/kubernetes/enhancements/pull/709) and the implementation
-([pruning PR](https://github.com/kubernetes/kubernetes/pull/64558), [defaulting
-PR](https://github.com/kubernetes/kubernetes/pull/63604)), are follow-ups as soon as unblocked.
+OpenAPI subset definition (next point). 
+
+See the [Pruning for CustomResources KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/20180731-crd-pruning.md)
+and the [Defaulting for Custom Resources KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/20190426-crd-defaulting.md).
 
 ### CRD v1 schemas are restricted to a subset of the OpenAPI specification
 
-See [Vanilla OpenAPI Subset Design](https://docs.google.com/document/d/1pcGlbmw-2Y0JJs9hsYnSBXamgG9TfWtHY6eh80zSTd8)
+See [OpenAPI Subset KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/20190425-structural-openapi.md)
 
 ### Generator exists for CRD Validation Schema v3 (Kubebuilder)
 
@@ -121,8 +122,8 @@ to be integrated into kubebuidler’s controller-tools.
 
 ### CustomResourceWebhookConversion API is GA ready
 
-Currently CRD webhook conversion is alpha. We plan to take this to v1beta1 via the
-"Graduation Criteria" proposed in [PR #1004](https://github.com/kubernetes/enhancements/pull/1004). 
+Currently CRD webhook conversion is alpha. We plan to take this to v1beta1 according to the
+[CustomResourceDefinition Conversion Webhook's Graduation Criteria](https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/20190425-crd-conversion-webhook.md#graduation-criteria).
 We plan to then graduate this to GA as part of the CRD to GA graduation.
 
 ### CustomResourceSubresources API is GA ready
@@ -162,10 +163,109 @@ TODO: complete this list
 
 ### Scale Targets for GA
 
-* TODO quantify: Read/write latency of CRDs within X% of native Kubernetes types
-* TODO quantify: Latency degrades less than X% for up to 100k Custom Resources per CRD kind
-* TODO quantify: Webhook conversion QPS of a noop converter is within X% of QPS with no webhook
-* Coordinate with sig-scalability
+The scale targets for GA of custom resources are defined by the same [API call latency
+SLIs/SLOs as the Kubernetes native types](https://github.com/kubernetes/community/blob/master/sig-scalability/slos/api_call_latency.md#api-call-latency-slisslos-details).
+
+The targets are defined by the suggested maximum limits below, which are organized the same way as the [Kubernetes native type thresholds](https://github.com/kubernetes/community/blob/master/sig-scalability/configs-and-limits/thresholds.md#kubernetes-thresholds), with only one change:
+
+- Since custom resources can be arbitrarily large, we have broken down the limit by custom resource object size.
+
+**Custom Resource Definitions:**
+
+| Suggested Maximum Limit: scope=cluster |
+| --- |
+| 500 |
+
+_Note: The Custom Resource Definition suggested maximum limit was selected not
+because of the above SLIs/SLOs, but because of the latency of OpenAPI publishing,
+which is a background process that runs asynchronously each time a Custom
+Resource Definition schema is updated. For 500 Custom Resource Definitions it takes
+slightly over 35 seconds for a definition change to be visible via the OpenAPI
+spec endpoint._
+
+**Custom Resources, Cluster Wide:**
+
+Cluster wide limits for custom resources are storage bound and custom resources
+share the storage space with all other objects. While determining the
+appropriate storage limit for a cluster is out-of-scope for this document, once
+an etcd storage limit is selected, the suggested maximum limits for custom resources
+are:
+
+| etcd storage limit | Suggested Maximum Limit: scope=cluster |
+| --- | --- |
+| 4GB | 40000 |
+| 8GB | 80000 |
+
+These limits aim to keep custom resource storage usage to less than half of the
+total cluster storage capacity for custom resources of 50kb or less in size.
+
+**Custom Resources per Definition:**
+
+For each custom resource definition, the limit on the number of custom resources
+can be found by taking the (median) object size of the custom resource and finding
+the matching row in this table:
+
+| Object size | Suggested Maximum Limit: scope=namespace (5s p99 SLO) | Suggested Maximum Limit: scope=cluster (30s p99 SLO) |
+| --- | --- | --- |
+| <=10kb | 1500 | 10000 |
+| (10kb - 25kb] | 600 | 4000 |
+| (25kb - 50kb] | 300 | 2000 |
+
+The cluster scope indicates the total number of custom resources for that
+definition allowed in the entire cluster.
+
+The namespace scope indicates the total number of custom resources for that
+definition allowed in any particular namespace. The cumulative count of the
+custom resource across all namespaces must not exceed the cluster limit.
+
+Since, in practice, custom resources scale farther without conversion webhooks
+within the SLI/SLOs (roughly 2x according to our scale tests), custom resource
+definition authors should be careful to adhere to these limits so that in the
+future a webhook converter may safely be added as part of a custom resource
+version upgrade.
+
+_Note: For custom resources of custom resource definitions using `scope: Namespaced`: the scope=namespace
+suggested maximum limit indicates how many custom resource objects may be in each namespace,
+and the scope=cluster suggested maximum limit indicates how many custom resource objects may
+be in the cluster total. For custom resources of custom resource definitions using `scope: Cluster`: only
+the scope=cluster suggested maximum limit applies._
+
+**Conversion Webhooks:**
+
+Conversion Webhook SLOs are defined from the perspective of the conversion
+webhook. They do not include any API server serialization/deserialization for
+making the request to the webhook, but they do include network latency.
+
+Given that the performance and scalability of conversion webhooks are the
+responsibility of their author, custom resource scale targets are applied only for
+conversion webhooks that are within the following latencies for the above suggested
+maximum limits.
+
+| scope | object count limit | Expected conversion Webhook SLO: p99 latency |
+| --- | --- | --- |
+| resource | 1 | 50ms |
+| namespace | 1500 (<=10kb), 600 (10-25kb) or 300 (25-50kb) | 1 second |
+| cluster | 10000 (<=10kb), 4000 (10-25kb) or 2000 (25-50kb) | 6 seconds |
+
+The scope=resource's higher "per-object" latency (50ms vs ~1.5ms for namespace
+and cluster scope) accommodates a constant per-request serving cost.
+
+The above object sizes and suggested maximum limits in the Custom Resources per
+Definition table apply to these conversion webhook SLOs. For example, for a
+list request for 1500 custom resource objects that are 10kb in size, the namespace
+scope SLO of 1 second for the conversion webhook applies.
+
+**Scale Target Data**
+
+GA custom resource scale targets were selected based on an [analysis of our current scale limits](https://docs.google.com/document/d/1tEstPQvzGvaRnN-WwGUWx1H9xHPRCy_fFcGlgTkB3f8).
+
+We ran a month-long survey of Custom Resource Definition scale needs across Kubernetes mailing lists, Slack channels and social media.
+Of the custom resource definitions surveyed, 96% are currently within these suggested maximum limits, 91% are within these limits for their anticipated future growth, and survey data provides useful guidance for our post-GA scalability work. See [survey of real-world custom resource usage](https://docs.google.com/document/d/1MTd_gDlpgBaT5sAKM4j6tQVeCFIT9J44RHzt2yWOK_g) for details.
+
+As part of GA the suggested maximum limits and SLO documentation will be updated to make this clear and to
+encourage CRD authors to provide concrete suggested maximum limits and SLIs/SLOs for their custom
+resource kinds to their users that account for the per-resource conversion cost
+of their conversion webhook and/or size of their custom resources.
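
As a reading aid for the Scale Targets tables added in this hunk (not part of the KEP text itself), the following is a minimal Python sketch of how the "Custom Resources per Definition" table could be applied as a lookup, and of the arithmetic behind the cluster-wide storage limits. The names `PER_DEFINITION_LIMITS`, `suggested_max`, and `storage_fraction` are hypothetical and do not correspond to any Kubernetes API. The sketch also assumes decimal units (1GB = 1,000,000kb), which the KEP does not specify.

```python
# Illustrative only: these names are hypothetical, not a Kubernetes API.
# Each row mirrors the "Custom Resources per Definition" table:
# (upper bound on median object size in kb, namespace limit, cluster limit)
PER_DEFINITION_LIMITS = [
    (10, 1500, 10000),
    (25, 600, 4000),
    (50, 300, 2000),
]


def suggested_max(median_size_kb: float, scope: str) -> int:
    """Return the suggested maximum object count for one definition."""
    if scope not in ("namespace", "cluster"):
        raise ValueError("scope must be 'namespace' or 'cluster'")
    for max_kb, namespace_limit, cluster_limit in PER_DEFINITION_LIMITS:
        # Size ranges are closed on the upper bound, e.g. (10kb - 25kb].
        if median_size_kb <= max_kb:
            return namespace_limit if scope == "namespace" else cluster_limit
    raise ValueError("the table gives no limit for objects above 50kb")


def storage_fraction(object_count: int, object_kb: float, etcd_gb: float) -> float:
    """Fraction of etcd storage used, assuming 1GB = 1,000,000kb."""
    return (object_count * object_kb) / (etcd_gb * 1_000_000)


if __name__ == "__main__":
    print(suggested_max(8, "namespace"))
    print(suggested_max(20, "cluster"))
    print(storage_fraction(40000, 50, 4))
```

Under the decimal-unit assumption, 40000 objects at the 50kb ceiling fill half of a 4GB store, and any smaller object size falls below half, which matches the stated intent of the cluster-wide table.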
 
 ## Graduation Criteria