[Kic2.0] Fix KIC image deployment failure #1486 #1487

Closed
wants to merge 25 commits into from

Conversation

ccfishk
Contributor

@ccfishk ccfishk commented Jul 4, 2021

What this PR does / why we need it:

Makes it possible to deploy the Kong Ingress Controller into a cluster namespace using a YAML manifest, with the controller able to run and function as expected.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

#1480
#1421
#1256

Special notes for your reviewer:

PR Readiness Checklist:

Complete these before marking the PR as ready to review:

  • [ ] the CHANGELOG.md release notes have been updated to reflect any significant (and particularly user-facing) changes introduced by this PR
  • [ ] able to run performance/integration tests with in-cluster deployment
  • [ ] all manifests and related service accounts have proper RBAC roles associated; none of them use ClusterAdmin permissions

@ccfishk ccfishk requested a review from a team as a code owner July 4, 2021 01:19
@codecov

codecov bot commented Jul 4, 2021

Codecov Report

Merging #1487 (7973c50) into next (54cdf57) will decrease coverage by 0.11%.
The diff coverage is 9.52%.


@@            Coverage Diff             @@
##             next    #1487      +/-   ##
==========================================
- Coverage   51.39%   51.27%   -0.12%     
==========================================
  Files          91       91              
  Lines        6300     6342      +42     
==========================================
+ Hits         3238     3252      +14     
- Misses       2768     2796      +28     
  Partials      294      294              
Flag Coverage Δ
integration-test 48.48% <9.52%> (+0.12%) ⬆️
unit-test 38.55% <0.00%> (-0.32%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
railgun/internal/ctrlutils/utils.go 57.33% <0.00%> (-28.95%) ⬇️
railgun/pkg/config/config.go 82.00% <20.00%> (-10.95%) ⬇️
railgun/cmd/rootcmd/run.go 25.00% <33.33%> (+5.00%) ⬆️
pkg/parser/parser.go 83.26% <0.00%> (-1.26%) ⬇️
railgun/internal/ctrlutils/ingress-status.go 64.65% <0.00%> (+3.01%) ⬆️
railgun/internal/diagnostics/server.go 78.78% <0.00%> (+21.21%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 54cdf57...7973c50.

@ccfishk ccfishk self-assigned this Jul 4, 2021
@ccfishk ccfishk added area/maintenance Cleanup, refactoring, and other maintenance improvements that don't change functionality. bug Something isn't working size/small labels Jul 4, 2021
@ccfishk ccfishk changed the title [WIP][Kic2.0] kic image deployment failure #1486 [Kic2.0] kic image deployment failure #1486 Jul 5, 2021
@ccfishk ccfishk changed the title [Kic2.0] kic image deployment failure #1486 [Kic2.0] Fix kic image deployment failure #1486 Jul 5, 2021
@rainest
Contributor

rainest commented Jul 6, 2021

Do we know where the attempt to edit controller-token-tdslg is coming from? I haven't encountered that before and am not sure where/why the controller would need to edit a Secret. I'm not clear on the link between it and the changes proposed here.

Historically we haven't had any system to automatically infer the admin API URL from a Service or similar, and hadn't had any plans to add one, as directly providing a URL has been sufficient. Can you elaborate on why you added it? The proposal here makes several decisions we wouldn't want (always using HTTP and sending traffic through an Ingress rather than to a Service over the intra-cluster network) that the URL avoids.

Manifests we planned to handle in #1256, though that's more about building out new kustomize helpers than just providing a manifest--the existing 1.x manifests should work fine for now.

(Outdated, now-resolved review threads on railgun/deployment/kic-deployment.template.yaml, railgun/internal/ctrlutils/utils.go, and railgun/pkg/config/config.go)
@ccfishk
Contributor Author

ccfishk commented Jul 6, 2021

#1256

Do we know where the attempt to edit controller-token-tdslg is coming from? I haven't encountered that before and am not sure where/why the controller would need to edit a Secret. I'm not clear on the link between it and the changes proposed here.

Historically we haven't had any system to automatically infer the admin API URL from a Service or similar, and hadn't had any plans to add one, as directly providing a URL has been sufficient. Can you elaborate on why you added it? The proposal here makes several decisions we wouldn't want (always using HTTP and sending traffic through an Ingress rather than to a Service over the intra-cluster network) that the URL avoids.

Manifests we planned to handle in #1256, though that's more about building out new kustomize helpers than just providing a manifest--the existing 1.x manifests should work fine for now.

Please ignore the specific error messages (sorry for the example details), since different ones pop up while debugging the deployment. The purpose of this PR is to unblock the controller manager's in-cluster deployment with an appropriate configuration, which is illustrated in the YAML and README files.

The point of the Kong Admin API URL is to remove the hardcoded IP address (publishStatusAddress) and leverage environment variables to retrieve service information. I understand the previous configuration worked because the controller and proxy are in the same namespace and the same pod. Now, in 2.0, we have the flexibility to de-couple them.
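As a minimal sketch of that environment-variable idea (not this PR's code; the service name and defaults below are assumptions): Kubernetes injects <SERVICE>_SERVICE_HOST / <SERVICE>_SERVICE_PORT variables for Services in the pod's namespace, so an admin address could be assembled from those instead of a hardcoded IP.

package main

import (
    "fmt"
    "os"
)

// adminURLFromEnv sketches the env-var approach: for an assumed Service named
// "kong-admin" in the pod's namespace, the kubelet injects
// KONG_ADMIN_SERVICE_HOST and KONG_ADMIN_SERVICE_PORT, which can be assembled
// into an admin address instead of hardcoding an IP.
func adminURLFromEnv() string {
    host := os.Getenv("KONG_ADMIN_SERVICE_HOST")
    port := os.Getenv("KONG_ADMIN_SERVICE_PORT")
    if host == "" || port == "" {
        // Fall back to the single-pod default when no service variables are injected.
        return "https://localhost:8444"
    }
    return fmt.Sprintf("https://%s:%s", host, port)
}

func main() {
    fmt.Println(adminURLFromEnv())
}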

@shaneutt shaneutt dismissed their stale review July 7, 2021 18:28

Dismissing my review to unblock: @mflendrich needs to review here as there is duplication with his recent PR and they need to be coordinated.

@shaneutt shaneutt requested review from mflendrich and removed request for shaneutt July 7, 2021 18:28
@rainest rainest mentioned this pull request Jul 7, 2021
@ccfishk
Contributor Author

ccfishk commented Jul 7, 2021

unblock controller manager in cluster deployment with appropriate configuration which is illustrated in the yaml and readme file.

I was trying to understand what issue we had there though--if there is some error that appears, we'd want to understand why it occurs to ensure we're adding a fix. I wasn't aware of any, as when I test the controller 2.0.0-alpha.1 image with the current chart and the all-in-one manifests we have in https:/Kong/kubernetes-ingress-controller/tree/main/deploy/single it does deploy successfully. I am aware of issues with the RBAC config and lack of UDPIngress in the existing examples, but I don't think they cause failures (we just can't update finalizers and have to disable UDPIngress), and we planned to update that elsewhere (#1256 and Kong/charts#364). The rest of the manifests that we had for 1.x do still work with 2.x.

The point of the Kong Admin API URL is to remove the hardcoded IP address (publishStatusAddress) and leverage environment variables to retrieve service information. I understand the previous configuration worked because the controller and proxy are in the same namespace and the same pod.

This sounds related to the work in #1451, correct? fe24a71 should have already given us what we need there. publishStatusAddress should remain: it serves as a user override for Ingress status addresses when they know their network topology doesn't let the controller effectively determine the correct address information (similar to the Service clusterIP setting), as an alternative to --publish-service.

That doesn't have any interaction with the admin API, as we're only concerned with the proxy for Ingress status. We shouldn't need any additional admin API logic--while building configuration from a Service instead is viable, it doesn't really give us anything obvious over requesting a URL instead (since we don't need to look up endpoints or obtain other detailed information about the Service), and our documentation all already refers to the existing --kong-admin-url flag.

Agreed that both publish-service and publishStatusAddress should remain as they are, which is covered by #1509 and #1451. That is unrelated to the standalone KIC deployment YAML.

KIC does need the Kong Admin API for configuration communication:
https:/Kong/kubernetes-ingress-controller/blob/main/railgun/manager/run.go#L96

The current configuration at https:/Kong/kubernetes-ingress-controller/pull/1500/files#L90 assumes all modules are on the same node and use a static port (8444); this will break if we deploy the controller and proxy separately. The new function retrieves the Admin API IP and port from the namespace and name configuration: https:/Kong/kubernetes-ingress-controller/pull/1487/files#diff-786a1651aa544e8c2f84a32b474758ecb0f8e6c5d9d745b5f643d3d786bc57c4R152
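A minimal sketch of that idea follows (a hypothetical helper, not the PR's actual function; the port-name convention and scheme selection are assumptions): look up the admin Service by namespace/name with client-go and build the URL from its cluster DNS name and port.

package main

import (
    "context"
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/rest"
)

// adminURLFromService resolves a Kong admin address from a Service
// namespace/name instead of a hardcoded URL (illustrative only).
func adminURLFromService(ctx context.Context, c kubernetes.Interface, namespace, name string) (string, error) {
    svc, err := c.CoreV1().Services(namespace).Get(ctx, name, metav1.GetOptions{})
    if err != nil {
        return "", fmt.Errorf("looking up admin service %s/%s: %w", namespace, name, err)
    }
    if len(svc.Spec.Ports) == 0 {
        return "", fmt.Errorf("admin service %s/%s has no ports", namespace, name)
    }
    // Assumed convention: a port named "kong-admin-tls" selects HTTPS;
    // otherwise use the first port over HTTP.
    scheme, port := "http", svc.Spec.Ports[0].Port
    for _, p := range svc.Spec.Ports {
        if p.Name == "kong-admin-tls" {
            scheme, port = "https", p.Port
            break
        }
    }
    // Use the in-cluster DNS name rather than a pod or node IP.
    return fmt.Sprintf("%s://%s.%s.svc:%d", scheme, name, namespace, port), nil
}

func main() {
    cfg, err := rest.InClusterConfig()
    if err != nil {
        panic(err)
    }
    client := kubernetes.NewForConfigOrDie(cfg)
    url, err := adminURLFromService(context.Background(), client, "kong", "kong-admin")
    if err != nil {
        panic(err)
    }
    fmt.Println("kong admin url:", url)
}

Note that a helper like this still has to answer the protocol/port questions raised later in the thread; the sketch only assumes a naming convention.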

@ccfishk ccfishk requested a review from a team July 8, 2021 03:23
@rainest
Contributor

rainest commented Jul 8, 2021

The assumptions in the various all-in-one YAMLs are by design: those are intentionally opinionated configurations for a select few common configurations (OSS or Enterprise, either with or without a database). They use a single Pod with one Kong container and a controller container because that topology is simple to manage, suitable for many users (especially those that don't have a known reason to use some other topology), and has security benefits for OSS (placing the controller in the same Pod means you don't need to expose the admin API, which is a simple means of securing it without RBAC).

If you want another topology, we currently instruct you to use the chart or to build your own manifests by hand. That stems from a combination of historical reasons (when we started providing manifests, Helm was fairly mature and both Kustomize and Operators were new) and capacity (we have limited resources to maintain deployment tools, so have not built out more complex Kustomize manifests or a non-Helm Operator). We don't provide plain manifests for topologies other than single-Pod because there are so many of them (you can basically mix and match any of the ~12 options covered in the chart documentation, and even the chart doesn't commit to handling all topologies in their entirety--many require multiple releases).

That all said, I still don't think we gain anything by building admin API information from a Service rather than providing a URL, and it introduces new problems (which aren't solved here yet):

  • Using a Service name/namespace doesn't remove the requirement to indicate where the Service resides when it's not local (you're providing namespace/svc-name instead of K8S DNS svc-name.namespace for the URL hostname).
  • Using a Service requires a code solution for determining the protocol and appropriate port, which are currently encoded directly in the URL. This would likely require additional constraints on the Service (we'd need a known name scheme for the HTTP and HTTPS admin ports).
  • Using a Service is not compatible with the localhost admin API approach, which we want to support for its simple security model.
  • Backwards compatibility with the existing URL flag requires that we add code to negotiate between the two options.

IMO we'd only want to add the Service option if we had some functionality that required interactions with the Kubernetes API (e.g. we want to get endpoint information or need to edit its metadata for some reason), and I don't think we have any such functionality proposed. Between that and the other points above, I'd recommend we close this and continue the manifest changes separately in #1500 and other issues attached to #1256.

@ccfishk
Contributor Author

ccfishk commented Jul 8, 2021

#1500

The assumptions in the various all-in-one YAMLs are by design: those are intentionally opinionated configurations for a select few common configurations (OSS or Enterprise, either with or without a database). They use a single Pod with one Kong container and a controller container because that topology is simple to manage, suitable for many users (especially those that don't have a known reason to use some other topology), and has security benefits for OSS (placing the controller in the same Pod means you don't need to expose the admin API, which is a simple means of securing it without RBAC).

[Jimin] No doubt about the all-in-one benefit: simplicity. Sticking to the design, the Kong Admin TLS option https:/Kong/charts/blob/main/charts/kong/values.yaml#L117 provides the scalability to let other components access the service securely, either in or out of the cluster. Also, from a standardization point of view, deploying each component/service/controller separately provides topology-management flexibility rather than simplicity.

If you want another topology, we currently instruct you to use the chart or to build your own manifests by hand. That stems from a combination of historical reasons (when we started providing manifests, Helm was fairly mature and both Kustomize and Operators were new) and capacity (we have limited resources to maintain deployment tools, so have not built out more complex Kustomize manifests or a non-Helm Operator). We don't provide plain manifests for topologies other than single-Pod because there are so many of them (you can basically mix and match any of the ~12 options covered in the chart documentation, and even the chart doesn't commit to handling all topologies in their entirety--many require multiple releases).

[Jimin] OK. At the moment most users deploy via Helm and might not be familiar with kubectl-based deployment. I think kubectl apply -f resource.yaml gives users the option to modify resources (namespace etc.) on the fly, which is easy. Yes, we also need to consider chart compatibility.

That all said, I still don't think we gain anything by building admin API information from a Service rather than providing a URL, and it introduces new problems (which aren't solved here yet):

[Jimin] Whether from a Service or a URL, the point is to avoid hardcoded configuration (localhost:8444; 127.0.0.1:8001). We also need to use the deployment configuration if one exists.

  • Using a Service name/namespace doesn't remove the requirement to indicate where the Service resides when it's not local (you're providing namespace/svc-name instead of K8S DNS svc-name.namespace for the URL hostname).

[Jimin] We can use svc-name.namespace if the Admin is deployed by a cloud provider with a valid hostname as the external IP. That's a valid point about supporting both IP and hostname. Will change accordingly.

  • Using a Service requires a code solution for determining the protocol and appropriate port, which are currently encoded directly in the URL. This would likely require additional constraints on the Service (we'd need a known name scheme for the HTTP and HTTPS admin ports).
    [Jimin] Yes, we need the ConfigService function to discover which protocol is supported and then use it accordingly.
  • Using a Service is not compatible with the localhost admin API approach, which we want to support for its simple security model.
  • Backwards compatibility with the existing URL flag requires that we add code to negotiate between the two options.

[Jimin] We do provide well-defined compatibility: https:/Kong/kubernetes-ingress-controller/pull/1487/files#diff-777c5b378220515f6b692537382429104081d22a761062c7e2f0919e6c0d1757R221
If the in-cluster deployment does not specify any configuration (very likely the single-pod topology), the default localhost will be maintained.

IMO we'd only want to add the Service option if we had some functionality that required interactions with the Kubernetes API (e.g. we want to get endpoint information or need to edit its metadata for some reason), and I don't think we have any such functionality proposed. Between that and the other points above, I'd recommend we close this and continue the manifest changes separately in #1500 and other issues attached to #1256.

[Jimin] I think this PR and #1500 provide different solutions to different use cases, which point us toward k8s standardization. I would suggest de-duplicating and merging them accordingly.
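A minimal sketch of the compatibility order described above (an assumption about intent, not code from this PR): an explicit --kong-admin-url wins, then a configured admin Service, then the single-pod localhost default.

package main

import "fmt"

// resolveAdminURL sketches the proposed precedence (illustrative assumption):
// an explicit URL flag overrides everything, a configured admin Service comes
// next, and with no configuration the single-pod localhost default is kept.
func resolveAdminURL(flagURL, svcNamespace, svcName string) string {
    if flagURL != "" {
        return flagURL
    }
    if svcNamespace != "" && svcName != "" {
        return fmt.Sprintf("https://%s.%s.svc:8444", svcName, svcNamespace)
    }
    return "https://localhost:8444"
}

func main() {
    fmt.Println(resolveAdminURL("", "", ""))                       // https://localhost:8444
    fmt.Println(resolveAdminURL("", "kong", "kong-admin"))         // https://kong-admin.kong.svc:8444
    fmt.Println(resolveAdminURL("https://127.0.0.1:8444", "", "")) // explicit flag wins
}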

@ccfishk
Contributor Author

ccfishk commented Jul 8, 2021

Had an offline discussion with @mflendrich, with @rainest's input as well: at the moment users mostly rely on Helm charts for deployment.
We'd better fix the Helm charts to deploy the proxy and controller separately, and keep this YAML deployment as a backup for those who prefer the kubectl tool and for our developer folks.

Changing the PR to WIP.

@ccfishk ccfishk changed the title [Kic2.0] Fix kic image deployment failure #1486 WIP [Kic2.0] Fix KIC image deployment failure #1486 Jul 9, 2021
@ccfishk ccfishk changed the title WIP [Kic2.0] Fix KIC image deployment failure #1486 [Kic2.0] Fix KIC image deployment failure #1486 Jul 9, 2021
@ccfishk
Contributor Author

ccfishk commented Jul 9, 2021

unblock controller manager in cluster deployment with appropriate configuration which is illustrated in the yaml and readme file.

I was trying to understand what issue we had there though--if there is some error that appears, we'd want to understand why it occurs to ensure we're adding a fix. I wasn't aware of any, as when I test the controller 2.0.0-alpha.1 image with the current chart and the all-in-one manifests we have in https:/Kong/kubernetes-ingress-controller/tree/main/deploy/single it does deploy successfully. I am aware of issues with the RBAC config and lack of UDPIngress in the existing examples, but I don't think they cause failures (we just can't update finalizers and have to disable UDPIngress), and we planned to update that elsewhere (#1256 and Kong/charts#364). The rest of the manifests that we had for 1.x do still work with 2.x.

[Jimin] @rainest here is the answer to the question "what issue we had there". Using the 1.3.x all-in-one as a base,
https://raw.githubusercontent.com/Kong/kubernetes-ingress-controller/master/deploy/single/all-in-one-dbless.yaml
or https://raw.githubusercontent.com/Kong/kubernetes-ingress-controller/e1846684cc23ae5d7530fcb16f4545773908ccc5/deploy/single/all-in-one-postgres.yaml
with the UDPIngress CRD applied and the following RBAC rule added:

- apiGroups:
  - configuration.konghq.com
  resources:
  - udpingresses/status
  verbs:
  - update

the deployment throws the following error messages:

time="2021-07-09T01:19:57Z" level=error msg="Reconciler error" error="secrets "default-token-m5gft" is forbidden: User "system:serviceaccount:kong:kong-serviceaccount" cannot update resource "secrets" in API group "" in the namespace "metallb-system"" logger=controller-runtime.manager.controller.secret name=default-token-m5gft namespace=metallb-system reconciler group= reconciler kind=Secret

time="2021-07-09T01:33:32Z" level=error msg="Reconciler error" error="endpoints "rancher.io-local-path" is forbidden: User "system:serviceaccount:kong:kong-serviceaccount" cannot update resource "endpoints" in API group "" in the namespace "local-path-storage"" logger=controller-runtime.manager.controller.endpoints name=rancher.io-local-path namespace=local-path-storage reconciler group= reconciler kind=Endpoints


@rainest
Contributor

rainest commented Jul 12, 2021

tl;dr the permissions failures are because the controller wanted to add finalizers, and couldn't after I removed resource-wide edit permissions. There are subresource permissions that we could add instead to fix this, but we no longer need them following the removal of finalizers in #1522.

@ccfishk are you still seeing deploy failures with latest next? For testing/pending completion of #1256 1.x manifests should work with:

kubectl apply -f https://raw.githubusercontent.com/Kong/kubernetes-ingress-controller/next/railgun/config/crd/bases/configuration.konghq.com_udpingresses.yaml
sed -e "s/name: manager-role/name: kong-ingress-clusterrole/g" railgun/config/rbac/role.yaml | kubectl apply -f -

These errors:

time="2021-07-09T01:19:57Z" level=error msg="Reconciler error" error="secrets "default-token-m5gft" is forbidden: User "system:serviceaccount:kong:kong-serviceaccount" cannot update resource "secrets" in API group "" in the namespace "metallb-system"" logger=controller-runtime.manager.controller.secret name=default-token-m5gft namespace=metallb-system reconciler group= reconciler kind=Secret

come from adding finalizers to resources, which is a metadata update. We originally had broader permissions on resources, and I reduced permissions on most resources to get,list,watch (read-only) earlier to try and match 1.x (for lack of any known reason that we now needed edit permissions also): https:/Kong/kubernetes-ingress-controller/blob/2.0.0-alpha.2/railgun/hack/generators/controllers/networking/main.go#L398-L400

We no longer will need those since we've removed finalizers in #1522

There's a bit of (possible) inaccuracy in the kubebuilder examples, where https://book.kubebuilder.io/reference/using-finalizers.html indicates you should have full edit permissions on the resource for finalizers, but it looks like those aren't necessary. You can handle the finalizer RBAC needs with update,create,delete on the <resource>/finalizer subresource instead of the parent resource.

If we were to continue using them, we'd want to add that to the kubebuilder RBAC tags in the railgun/hack/generators/controllers/networking/main.go template. We wouldn't want to modify the deployment manifests directly, as the next run of the manifest generator (make generate followed by make manifests) would wipe that out, and wouldn't want to use the permissions proposed here (those would grant the controller superuser rights; we want it to only have the permissions it actually needs).
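If finalizers had stayed, a sketch of what such generator template tags might look like (the groups/resources below are illustrative assumptions, not the template's actual contents); kubebuilder RBAC markers are conventionally placed above the reconciler:

package controllers

// Illustrative kubebuilder RBAC markers granting only the finalizers
// subresource permissions, rather than resource-wide edit rights:
//
// +kubebuilder:rbac:groups=networking.k8s.io,resources=ingresses/finalizers,verbs=update;create;delete
// +kubebuilder:rbac:groups=configuration.konghq.com,resources=udpingresses/finalizers,verbs=update;create;delete

Running make generate followed by make manifests would then regenerate the role with finalizer-subresource rules like the block rainest shows further down, instead of hand-editing the deployment manifests.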

@ccfishk
Contributor Author

ccfishk commented Jul 12, 2021

tl;dr the permissions failures are because the controller wanted to add finalizers, and couldn't after I removed resource-wide edit permissions. There are subresource permissions that we could add instead to fix this, but we no longer need them following the removal of finalizers in #1522.

@ccfishk are you still seeing deploy failures with latest next? For testing/pending completion of #1256 1.x manifests should work with:

kubectl apply -f https://raw.githubusercontent.com/Kong/kubernetes-ingress-controller/next/railgun/config/crd/bases/configuration.konghq.com_udpingresses.yaml
sed -e "s/name: manager-role/name: kong-ingress-clusterrole/g" railgun/config/rbac/role.yaml | kubectl apply -f -

These errors:

time="2021-07-09T01:19:57Z" level=error msg="Reconciler error" error="secrets "default-token-m5gft" is forbidden: User "system:serviceaccount:kong:kong-serviceaccount" cannot update resource "secrets" in API group "" in the namespace "metallb-system"" logger=controller-runtime.manager.controller.secret name=default-token-m5gft namespace=metallb-system reconciler group= reconciler kind=Secret

come from adding finalizers to resources, which is a metadata update. We originally had broader permissions on resources, and I reduced permissions on most resources to get,list,watch (read-only) earlier to try and match 1.x (for lack of any known reason that we now needed edit permissions also): https:/Kong/kubernetes-ingress-controller/blob/2.0.0-alpha.2/railgun/hack/generators/controllers/networking/main.go#L398-L400

[Jimin] AFAIK, finalizers only apply to resource deletion, and we apply a finalizer to the KIC ingresses we create. If your operation includes all namespaces/any resources, that might be associated with this error. Not sure why I only got this complaint at that moment; I would like to take one more look and dig into my case.

We no longer will need those since we've removed finalizers in #1522

There's a bit of (possible) inaccuracy in the kubebuilder examples, where https://book.kubebuilder.io/reference/using-finalizers.html indicates you should have full edit permissions on the resource for finalizers, but it looks like those aren't necessary. You can handle the finalizer RBAC needs with update,create,delete on the <resource>/finalizer subresource instead of the parent resource.

If we were to continue using them, we'd want to add that to the kubebuilder RBAC tags in the railgun/hack/generators/controllers/networking/main.go template. We wouldn't want to modify the deployment manifests directly, as the next run of the manifest generator (make generate followed by make manifests) would wipe that out, and wouldn't want to use the permissions proposed here (those would grant the controller superuser rights; we want it to only have the permissions it actually needs).

[Jimin] Aware of the permission scope: it is not Kong's cluster, it is the user's cluster. Agree that restricting permissions is good. Thanks.


@rainest
Contributor

rainest commented Jul 12, 2021

Not sure why I only got this complaint at that moment; I would like to take one more look and dig into my case.

Some of the errors from controller-gen reconciler code are unfortunately rather opaque--they tell you they're trying to edit an object and failed, but not the specific edit and/or where in the code it logged the error.

Don't actually recall how I realized the finalizers were the culprit last week. I observed that adding

- apiGroups:
  - whatever
  resources:
  - whatever/finalizers
  verbs:
  - update
  - create
  - delete

to roles cleared it, but forget why I thought to do that--I think it was reading through the generated reconciler and seeing which successful logs appeared and checking what client actions it took after. Doing that is rather tedious.

@ccfishk
Contributor Author

ccfishk commented Jul 12, 2021

Closing this for now, per the discussion last Friday and this week's tests.

@ccfishk ccfishk closed this Jul 12, 2021
@shaneutt shaneutt deleted the kic2.0-deployment branch October 27, 2021 21:08
Labels: area/maintenance, bug, ci/license/unchanged, priority/low, size/small