Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Gateway certificateRef #2474

Closed
rainest opened this issue May 9, 2022 · 6 comments · Fixed by #2580
Closed

Implement Gateway certificateRef #2474

rainest opened this issue May 9, 2022 · 6 comments · Fixed by #2580
Assignees
Labels
area/gateway-api Relating to upstream Kubernetes SIG Networking Gateway API
Milestone

Comments

@rainest
Copy link
Contributor

rainest commented May 9, 2022

Unlike Ingress, certificates for Gateway API Routes are loaded via Gateway configuration, not configuration on individual routes. Our current Gateway implementation does not handle the CertificateRefs field and we do not load certificates for Gateway API Routes.

The spec only provides references to TLS Secrets, which do not include any dedicated field we can populate Kong SNIs from. We need to either:

  • Parse the certificate and add all hostnames present on it as SNIs.
  • Add an SNIs annotation for Secrets.

These are compatible with one another. The annotation can serve as an override. I recommend just using all hostnames pending demand for the ability to omit some hostnames that a certificate is valid for or serve it for additional hostnames that it is not valid for. I do not expect any common use case to require changing the set of matched hostnames.

@rainest rainest added the area/gateway-api Relating to upstream Kubernetes SIG Networking Gateway API label May 9, 2022
@shaneutt shaneutt added this to the Gateway API - Milestone 2 milestone May 9, 2022
@rainest
Copy link
Contributor Author

rainest commented May 17, 2022

This one doesn't seem as possible to fudge with managed gateways as AllowedRoutes. You could add scan the listeners and fill in certificates during Gateway reconciliation, but as the comments note, there are some problems with this:

diff --git a/internal/controllers/gateway/gateway_controller.go b/internal/controllers/gateway/gateway_controller.go
index 0e9ec6cf..c720e825 100644
--- a/internal/controllers/gateway/gateway_controller.go
+++ b/internal/controllers/gateway/gateway_controller.go
@@ -290,6 +290,43 @@ func (r *GatewayReconciler) reconcileUnmanagedGateway(ctx context.Context, log l
                return ctrl.Result{}, r.Status().Update(ctx, pruneGatewayStatusConds(gateway))
        }
 
+       // this probably actually needs to be more complex to account for multiple
+       // references to the same certificate the end result we want from that is
+       // that we assign them multiple SNIs, so we actually need to collect
+       // certificates as we go and append to a list of SNIs before inserting them
+       // all at the end. didn't bother since we don't really have the facilities
+       // to insert SNIs anyway (we assume (correctly for Ingress) we'll only get
+       // them later), hence the pretend annotation injection, which is not
+       // something we want to actually do
+       for _, listener := range gateway.Spec.Listeners {
+               if listener.TLS != nil {
+                       // pretend we actually resolved the SecretObjectReference and put it in certificate instead of cheating
+                       certificate := &corev1.Secret{
+                               ObjectMeta: metav1.ObjectMeta{
+                                       Annotations: map[string]string{
+                                               "konghq.com/certificate-sni": string(*listener.Hostname),
+                                       },
+                               },
+                       }
+                       if err := r.DataplaneClient.UpdateObject(certificate); err != nil {
+                               debug(log, gateway, "failed to update object in data-plane, requeueing")
+                               return ctrl.Result{}, err
+                       }
+               }
+       }
+
+       // at this point, great, we have all our certificates and all of our SNIs!
+       // however, we're about to clobber all our listeners with the
+       // service-derived ones. unfortunately, unlike AllowedRoutes, we can't
+       // simply shove like type TLSConfigs into the appropriate listener
+       // without losing information: listeners can only have a single hostname. We
+       // could insert certificates with the correct hostname upon user update,
+       // but we would then have no way to delete them: after we clobber those
+       // extra listeners and their hostname info are gone. To delete we'd need to
+       // require users add the complete set of listener+cert configs they want
+       // every time they update the gateway (even if they're changing something
+       // else
+
        if !areAllowedRoutesConsistentByProtocol(gateway.Spec.Listeners) {
                return ctrl.Result{}, fmt.Errorf("all listeners for a protocol must use the same AllowedRoutes")
        }

More briefly, loading additional certificates is not compatible with replacing listeners entirely. There are a few approaches I think we could take here:

  1. Ignore this problem, and make certificates/SNI ingest-only, with the latest hostname provided getting precedence and no means of deleting certificates. This is bad and gets real messy with cold starts (new DB-less instances that didn't see all of history) in the picture. Don't do this.
  2. Abuse the fact that you can put multiple certificates in a certificateRef and use Secret annotations to indicate hostnames. This is quite non-standard, ignoring the intended use for multiple certificateRefs (different signing algorithms, renewals, etc. that make sense for multiple certificates on the same hostname), ignoring the Listener.Hostname field altogether (we partially do this already by forcing no-Hostname listeners), and adding a non-standard konghq.com/hostnames annotation for TLS Secrets that only our controller will honor.
  3. Remove or refactor existing behavior. We allow you to create Listeners freely, and preserve all of them with caveats, namely that we will want to overwrite all their ports and require that you only create compatible listeners (we de facto kinda already enforce compatibility by clobbering them). This should work better for AllowedRoutes handling also. This presumably requires a fair amount of refactoring and additional validation, but gets us closer to the expected end state of a proper implementation.
  4. Defer this for now. We already released HTTPRoute without certificate support, so it's status quo there. TLSRoute will be implemented but unusable. This probably makes most sense if we wish to pursue the 3rd option first, but want to get the L4 routes released sooner than later.

@shaneutt re the 3rd option what complications do you know of from the initial Gateway implementation re trying to support (mostly) user-provided Gateway configurations? Should we be okay so long as we're able to dictate the port used? I don't think we'd be able to ensure that a Listen exists for every kong.conf listen, but I'm kinda okay with that, since it's probably a one-off for most people to put at least one Listen in place for everything they've configured elsewhere.

@shaneutt
Copy link
Contributor

To my understanding the only user-provided Gateway configurations we support are the AllowedRoutes restrictions which more or less allow the Gateway owner to refuse certain routes. If I'm understanding correctly your suggestions above the certificate references will be a pure reflection of certificates available for the Kong Gateway Deployment. So I'm not necessarily understanding what we mean by allowing more user-provided Gateway configurations here?

@rainest
Copy link
Contributor Author

rainest commented May 17, 2022

The Kong certificates are a combination of the certificate (the actual PEM cert and key) and SNIs (the hostnames where we'll serve them). Certificates alone aren't of much use.

In the Gateway context, you can only provide a single Hostname (SNI) per Listen, so if we only have, say, 4 Listens, you can't use more than 4 certificates, and realistically we'd use none because our generated Listens have no Hostnames. We can accomodate this with a single Gateway, it just needs to have one Listen for every certificate+SNI pair. This is option 3, and it entails wiping out most of the code we use to generate listeners from Kong's environment. We'd presumably still use some environment information to override user-provided ports.

A Listen TLSConfig can include multiple certificates, but the stated purpose of this is to provide an alternate certificate for a hostname, e.g. you want to provide both RSA and ECDSA certificates. It's not intended to provide one certificate for foo.example and another for bar.example, and there's no way to indicate that they

@rainest
Copy link
Contributor Author

rainest commented May 17, 2022

Per discussion with @scseanchow we plan to defer and implement option 3 after the next release, unless I happen to get it done faster than expected.

@rainest
Copy link
Contributor Author

rainest commented Jun 9, 2022

So for

Remove or refactor existing behavior. We allow you to create Listeners freely, and preserve all of them with caveats, namely that we will want to overwrite all their ports and require that you only create compatible listeners (we de facto kinda already enforce compatibility by clobbering them). This should work better for AllowedRoutes handling also. This presumably requires a fair amount of refactoring and additional validation, but gets us closer to the expected end state of a proper implementation.

I have a draft at #2555. This does not do certificate anything yet, just the refactor above. There are a number of changes and questions of note here:

  • The allowedRoutes equality checks are gone. They are no longer necessary with user-provided Listeners. We can honor all allowedRoutes clauses properly.
  • We now run Listener compatibility checks (though I haven't yet exhaustively reviewed all the conditions/reasons to see if we need them), but only within a single Gateway (during its reconcile). However, our actual behavior is that we combine all Gateways of a given GatewayClass into a single Gateway. Previously this was fine, as our Listeners were guaranteed compatible across Gateways because they were all identical (the same generated Kong config Listener set). That's no longer the case, and while the spec does say "An implementation MAY also group together and collapse compatible Listeners belonging to different Gateways", we technically collapse Listeners from other Gateways regardless of their cross-Gateway compatibility. We need to address this; usage of separate Gateway instances is now here be dragons territory.
  • Attempting to assign multiple L4 listeners to the same port results in a ProtocolConflict reason. Not sure this is correct, but that's more a question for upstream. Doesn't seem like they should be a HostnameConflict.
  • Our current implementation adds both LoadBalancer and ClusterIP addresses to the Gateway and its status. Is this correct? It seems like we should not include the ClusterIP: the address field is optional on Gateway, and the stated purpose of a Gateway is to route outside traffic inside. By definition, the users of the Gateway will not be able to route to the ClusterIP.

@shaneutt
Copy link
Contributor

shaneutt commented Jun 9, 2022

Our current implementation adds both LoadBalancer and ClusterIP addresses to the Gateway and its status. Is this correct? It seems like we should not include the ClusterIP: the address field is optional on Gateway, and the stated purpose of a Gateway is to route outside traffic inside. By definition, the users of the Gateway will not be able to route to the ClusterIP.

It's interesting you bring this up, because I've been considering something similar for the Gateway Operator. As it turns out, I think the main reason for this status currently is so that the kubectl get gateways output has a nice clean view with an IP address where you can reach the Gateway ready to go. But that breaks down for clusters without LoadBalancer type services, or that don't have public access. I've been considering that the context of the IP matters and looking at upstream opportunities to express that context in the API, thus allowing programmatic things on the status like: if ip.IsPublic() { then request(ip) } kind of things. I'm wondering if we need the ability to express that context, perhaps even listing different kinds of IPs separately. I've put an agenda item to pick other maintainers brains about this at the community sync in a couple days because I want to get this one into a better state for the operator as well as KIC.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/gateway-api Relating to upstream Kubernetes SIG Networking Gateway API
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants