Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add conformance tests for GatewayConditionReady #1833

Closed
arkodg opened this issue Mar 16, 2023 · 13 comments
Closed

Add conformance tests for GatewayConditionReady #1833

arkodg opened this issue Mar 16, 2023 · 13 comments
Assignees
Labels
area/conformance priority/backlog Higher priority than priority/awaiting-more-evidence.
Milestone

Comments

@arkodg
Copy link
Contributor

arkodg commented Mar 16, 2023

What would you like to be added:
Add an extended conformance test to ensure that when the Gateway Ready Condition is set to True, all traffic sent through this listener must succeed. Here is the condition in the spec.

Relates to #1832 (comment)

Why this is needed:
End users can better understand guarantees provided by implementations.

@shaneutt shaneutt added this to the v0.7.0 milestone Mar 17, 2023
@shaneutt shaneutt added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Mar 17, 2023
@gauravkghildiyal
Copy link
Member

I'd like to work on this.

I'm sorry if I'm missing something obvious and it would be great if someone can help with the following doubt:

So I'm guessing that the GatewayConditionReady could be set to true even before any corresponding Route resources are programmed into the gateway. If that is the case, is there a way to verify the invariant "all traffic sent through this listener must succeed." without being explicitly aware of the implementation of the listener/gateway? Is there a consensus that when GatewayConditionReady=False|Unknown, we would get a connection timeout/refused (or other equivalent) but when GatewayConditionReady=True, the TCP connection would succeed (but could return some other application layer error)

@howardjohn
Copy link
Contributor

I'd like to work on this.

I'm sorry if I'm missing something obvious and it would be great if someone can help with the following doubt:

So I'm guessing that the GatewayConditionReady could be set to true even before any corresponding Route resources are programmed into the gateway. If that is the case, is there a way to verify the invariant "all traffic sent through this listener must succeed." without being explicitly aware of the implementation of the listener/gateway? Is there a consensus that when GatewayConditionReady=False|Unknown, we would get a connection timeout/refused (or other equivalent) but when GatewayConditionReady=True, the TCP connection would succeed (but could return some other application layer error)

That is an excellent point. Ready=False does not imply traffic must fail IMO, just that we aren't 100% certain it will succeed.

However the interaction with route is troubling I think. It can't mean "all routes are ready", IMO, because we wouldn't have anyway to know if it's an outdated condition. Normally we use observedGeneration but this is cross-resource

@arkodg
Copy link
Contributor Author

arkodg commented Mar 18, 2023

maybe a Ready condition is needed at the route level

// RouteConditionType is a type of condition for a route.
so a user can ascertain that a route is ready to serve traffic ?

@youngnick
Copy link
Contributor

A Ready condition at Route level would be great, I agree, but I think that in practice, that will be very hard to build. It will mean that every bit of config from every HTTPRoute will need to be tracked in the data plane, so that an implementation can confirm that a particular piece of config that corresponds to that HTTPRoute has been propagated down to the data plane and is ready to pass traffic.

For Envoy data planes, for example, the controller will need to track which filter chain (at least) corresponds to which HTTPRoute, and ensure that ACKs and NACKs flow backwards through the controller, so that the status of the correct HTTPRoute can be updated at the right time. My experience is this is pretty hard to do right, because xDS is eventually consistent, in that even once a specific update has been ACKed, Envoy will need a (small) amount of time to flip things over. Will this be long enough to notice? Probably not, but we have no way of guaranteeing it, because of the way the Kubernetes eventual consistency lines up with the Envoy xDS consistency checks.

@howardjohn
Copy link
Contributor

I feel like all of those concerns apply to "Ready" at Gateway level though? Which is why we aren't implementing it today - its a Hard Problem. I'd be really interested to see if anyone with an Envoy based implementation is able to do it correctly (and handling NxM data plane/control plane replicas)

From the user perspective, what value does Ready provide (over Programmed) if it doesn' include Routes? A gateway without routes is ~useless

@arkodg
Copy link
Contributor Author

arkodg commented Mar 20, 2023

for the platform admin persona

  • a Gateway listener Ready status will be useful so they can be sure config like tls is setup
    correctly in the data plane

for the app dev persona

  • they might not have access to the Gateway resource
  • the xRoute resource is the only resource that links the Gateway to the Backend, so a Ready condition here will signal to the user whether its safe to send traffic or not via this route.

yes, its a really hard problem, but hoping implementations are incentivized to solve it, since its very useful for end users

@howardjohn
Copy link
Contributor

IMO we should either have Ready for both or neither. A partial state isn't particularly useful

@dprotaso
Copy link
Contributor

I second having a Ready on both the Gateway and HTTPRoute - we need it for the reasons @arkodg mentioned.

@dprotaso
Copy link
Contributor

/assign

@arkodg
Copy link
Contributor Author

arkodg commented Mar 23, 2023

@dprotaso @shaneutt @robscott wouldn't this some change to an existing GEP as well to add a Route Ready condition ?

@dprotaso
Copy link
Contributor

dprotaso commented Mar 27, 2023

@arkodg Yeah I can write something up.

I'm putting this PR/issue on hold until the discussion #1877 is resolved

@arkodg
Copy link
Contributor Author

arkodg commented Mar 27, 2023

thanks for driving this @dprotaso !

@shaneutt shaneutt modified the milestones: v0.7.0, v0.8.0 Apr 6, 2023
@robscott
Copy link
Member

I think we can close this one out now that we've marked "Ready" as reserved for future use: #1888.

@robscott robscott closed this as not planned Won't fix, can't repro, duplicate, stale Jun 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/conformance priority/backlog Higher priority than priority/awaiting-more-evidence.
Projects
No open projects
Development

No branches or pull requests

7 participants