Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Report config push failures as Events #3446

Merged
merged 50 commits into from
Feb 15, 2023
Merged

Report config push failures as Events #3446

merged 50 commits into from
Feb 15, 2023

Conversation

rainest
Copy link
Contributor

@rainest rainest commented Jan 28, 2023

What this PR does / why we need it:

Adds tag information indicating the Kubernetes resource that resulted in a Kong resource.

Adds parent information to Services, to indicate which route initially added them. Mostly for multi-service backends, which we cannot tag with a single Kubernetes Service and need to tag with whatever GWAPI route added them.

Handles and uses the Kong 3.2 flattened error response for DB-less pushes.

Generates Events associated with problem resources when a DB-less push fails.

Which issue this PR fixes:

Best-effort solution for FTI-4350. Although we cannot reject such failures up front, we can make their cause more accessible than the status quo (controller logs mention the Kong entity at fault, but not the Kubernetes resources). Users can query events by type and see the Kubernetes resource in the Event data.

Special notes for your reviewer:

WIP commits left as-is. This will eventually be squashed, but the history is too long and complicated to rebase into a more cohesive set of commits. The individual commits may be somewhat more useful to understand history than squashing up front. Apologies for the size, the original mandate was to produce a working PoC up front, so the issue was not broken into smaller chunks.

The addition of Kubernetes resource tags broke a large number of unit tests that want to compare complete objects. The tag contents are irrelevant to those tests, but break Equal(). My updates to unit tests either null out the tags before comparison or add the tags to the expected objects, depending on which was easier for the individual test. This (and similar broad changes in the past) requiring so many changes suggests we should amend these tests to instead compare relevant fields instead of complete objects, but reviewing which fields are relevant and refactoring all tests is a large effort. Depending on the unit test, not all would add useful tags anyway (since they're not starting from a complete Kubernetes resource).

Integration tests require running against kong/kong:master-alpine to check this functionality, as 3.2 is unreleased.
event_generation_manual.txt is a local run. rainest#189 uses a modified CI job to run this elsewhere (the nightly test job was already a bit broken and, I expect, now much more broken due to lack of updates to keep up with the main CI test job).

PR Readiness Checklist:

Complete these before marking the PR as ready to review:

  • the CHANGELOG.md release notes have been updated to reflect any significant (and particularly user-facing) changes introduced by this PR

Add tags to generated resources indicating their source Kubernetes
resource. (WIP: Ingress v1 only for now)

When OnUpdateInMemoryMode receives a configuration rejected error from
the /config endpoint, parse the new tagged flat error array into a list
of errors associated with their source Kubernetes resource.

Generate a Kubernetes Event for all configuration rejection errors
associated with the source Kubernetes resource.
Add a utility function to generate Kubernetes resource metadata Kong
tags from a client.Object.

Use the utility to add tags to resources generated by the parser.
rainest and others added 7 commits February 3, 2023 14:55
Track the parent xRoute/Ingress/etc. when populating services.

For cases where there is no proper Kubernetes Service to indicate as the
cause of a problem (multi-Service backends and default backends), add
tags indicating the parent route as the source of the Kong resource.

Handle a newly-discovered edge case in service generation that would
break service tag addition with a TODO. The bug predates this change,
but was apparently benign until this change.
@rainest rainest requested a review from a team as a code owner February 10, 2023 00:58
internal/dataplane/parser/ingressrules.go Outdated Show resolved Hide resolved
internal/dataplane/parser/ingressrules.go Outdated Show resolved Hide resolved
internal/dataplane/sendconfig/error_handling.go Outdated Show resolved Hide resolved
internal/dataplane/sendconfig/error_handling.go Outdated Show resolved Hide resolved
test/integration/config_error_event_test.go Outdated Show resolved Hide resolved
internal/dataplane/sendconfig/sendconfig.go Outdated Show resolved Hide resolved
@rainest
Copy link
Contributor Author

rainest commented Feb 11, 2023

https:/rainest/kubernetes-ingress-controller/actions/runs/4148511218/jobs/7176623721 from fork run rainest#193 is this against current Kong master.

I was trying to think of a way to make the master image runs conditional (we can either include or exclude them via labels on PRs), but between limitations in Actions and our current matrix system I'm not sure we can. My attempts do run it against latest master, but they always do so. Skipping that without the appropriate label didn't work as I'd hoped, and even if it did I think the matrix means it'd do an extra test run, since it should skip the image override step (it's not even doing this), but that would then result in it still running the matrix entry with the standard image. Ideally it'd not run the extra job at all if it's unnecessary.

Copy link
Contributor

@randmonkey randmonkey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in general, but I want to know that what happens if POST /config error happened in the following situations:

  • Kong version <= 3.1
  • failed to connect to kong admin API

internal/util/k8s.go Outdated Show resolved Hide resolved
internal/dataplane/sendconfig/sendconfig.go Outdated Show resolved Hide resolved
test/integration/config_error_event_test.go Show resolved Hide resolved
test/integration/config_error_event_test.go Show resolved Hide resolved
internal/dataplane/parser/parser_test.go Show resolved Hide resolved
@rainest
Copy link
Contributor Author

rainest commented Feb 14, 2023

@randmonkey

LGTM in general, but I want to know that what happens if POST /config error happened in the following situations:
Kong version <= 3.1
failed to connect to kong admin API

Nothing new. This unmarshals errors from POST /config into a struct that has an omit_empty field for the new flattened_errors field, so if that's not present (either because older Kong versions don't have it or because it's some other class of error), the struct field is empty the code that checks if there are ResourceErrors to process sees none and skips that processing. We then proceed with the error handling we already had (logging a failure and returning an error back up through the sync loop).

Copy link
Contributor

@czeslavo czeslavo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚢

@pmalek pmalek added this to the KIC v2.9.0 milestone Feb 15, 2023
@pmalek pmalek added the area/feature New feature or request label Feb 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/feature New feature or request ci/run-nightly Run nightly Kong image integration tests size/XXL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants