E2E testing a KIC built from HEAD by CI. #869
Conversation
In GitHub lingo, there are "workflows" and "jobs". My understanding is that you cannot easily define dependencies between workflows (for example: pass an artifact, or define order). You can do that using raw API calls and triggers, but that's definitely far from clean.

Reading this comment as "I think there is value in keeping the building and e2e-testing paths separate, but one waiting for the other, and the artifact being passed between them" - I think (modulo my current understanding of GH Actions capabilities) that using two jobs within a workflow is the "recommended" way to do that in the GitHub of today (which is what this PR is doing).
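For illustration only (this is not the PR's actual workflow file), the two-jobs-in-one-workflow pattern looks roughly like the sketch below: a `build` job produces the image as an artifact, and an `e2e` job declares `needs: build` for ordering and downloads the artifact. Job names, steps, and action versions are assumptions.

```shell
# Hedged sketch: write out a two-job workflow where "e2e" waits for
# "build" via `needs` and receives the image through the artifact store.
cat > two-job-sketch.yml <<'EOF'
name: e2e
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: |
          docker build -t kic:ci .
          docker save kic:ci -o kic.tar
      - uses: actions/upload-artifact@v2
        with:
          name: kic-image
          path: kic.tar
  e2e:
    needs: build          # ordering: runs only after build succeeds
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v2
        with:
          name: kic-image
      - run: docker load -i kic.tar
EOF
grep 'needs: build' two-job-sketch.yml
```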
Yes. Generally, the test workflow consists of 4 steps:

With this PR today, you can run e2e tests on KIC locally by manually reproducing the first 3 steps in your local environment and running

The reason I didn't put the environment setup instructions in a make recipe (or an equivalent shell script) is that I thought the former (overwriting the user's machine) would be inappropriate for a make target, and the latter (running k8s in an ephemeral harness) was too much work for the first iteration.
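The exact steps aren't reproduced in this thread, but based on the PR description (build image, local cluster, load image, run KIC, run tests), a manual local run might look roughly like the sketch below. All commands, the image tag, and the manifest path are assumptions, not the harness's actual steps.

```shell
# Hedged sketch of a manual local e2e setup; names are hypothetical.
cat > local-e2e-sketch.sh <<'EOF'
#!/bin/sh
set -e
docker build -t kic:local .                           # build KIC from HEAD
kind create cluster --name test-cluster               # bring up a local cluster
kind load docker-image kic:local --name test-cluster  # make the image available
kubectl apply -f deploy/single/all-in-one-dbless.yaml # deploy KIC + Kong
make integration-test                                 # run the tests
EOF
chmod +x local-e2e-sketch.sh
grep -c 'kind ' local-e2e-sketch.sh
```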
We probably don't want to go so far as to test behavior, and rather just check that a given set of K8S config results in an expected set of Kong config, as that's effectively the controller's output once it proceeds past the Go objects we currently check in unit tests. My rationale for that is twofold:
- Writing a validation script per test is fairly expensive, doesn't provide an easy means to diff the actual result against the expected result in the event of a failure, and opens up the possibility of failures beyond the test itself not producing the correct output. We run into the last of those somewhat often with the Kong integration tests, which often require manually checking output to see if the failure was an environment issue (e.g. Cassandra didn't properly start because Cassandra reasons, so Cassandra tests failed).
- We'll duplicate existing work in the Kong integration tests, which implement those behavior checks already (i.e. checks along the line of "if you add this piece of Kong config and then send this request with curl, do you see the expected response?") and have a decent amount of existing coverage there.
We should be able to check config either using a deck dump or GET /config output, and I'm sorta leaning towards the latter because YAML is, for all its faults, still a bit easier to write by hand than JSON. Either option should have some means of separating out config bits we don't actually care about, since we can't inject specific IDs or created_at times via K8S config.
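On separating out the bits we don't care about: a minimal sketch of normalizing a dump before diffing it against an expected fixture, by stripping `id`/`created_at` lines. File names and the sample content are hypothetical.

```shell
# Minimal sketch: drop fields that can't be controlled via K8S config
# (id, created_at) from a YAML dump before diffing. Names are assumptions.
cat > actual.yaml <<'EOF'
services:
- name: echo
  id: 0175e0e8-3de9-56b4-96f1-b12dcb4b6691
  created_at: 1600000000
  host: echo.default.svc
EOF
grep -vE '^[[:space:]]*(id|created_at):' actual.yaml > actual.normalized.yaml
cat actual.normalized.yaml
# then: diff actual.normalized.yaml expected.yaml
```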
Comment two is not a change request, and is instead notes on what I did to set up my local test environment, which are useful for other reasons:

Ah! Ubuntu and its suite of things that only work in the Canonical ecosystem! Getting microk8s (and by extension, snap) working in Arch seemed ill-advised, so I repurposed the Ubuntu-based kong Vagrant image for this. Not much reason to use it specifically (a stock Ubuntu Vagrant would have worked fine), but I already had it lying around.

The initial minor roadblock was that, even with an Ubuntu microk8s environment, the test script wants to use

After that, all the tests still failed; oh no! We can't rely on the normal Github setup tasks, so the tests failed for lack of an actual Kong instance. Relatively simple to fix:
After that, everything runs successfully. A Docker-based (presumably k3s) setup would be more lightweight than an Ubuntu VM, but I don't often need to run VMs for the reasons I used to (gojira supplanted them), so that's not a major concern for me personally.
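The actual fix is elided above, but as a point of reference, one hypothetical way to provide the missing Kong instance locally is a DB-less Kong container. The image tag, ports, and container name below are assumptions, not what the reviewer actually ran.

```shell
# Hypothetical stand-in for the missing Kong instance: DB-less Kong in
# Docker. Image tag, listen addresses, and name are assumptions.
cat > run-local-kong.sh <<'EOF'
#!/bin/sh
docker run -d --name kong-e2e \
  -e KONG_DATABASE=off \
  -e KONG_PROXY_LISTEN='0.0.0.0:8000' \
  -e KONG_ADMIN_LISTEN='0.0.0.0:8001' \
  -p 8000:8000 -p 8001:8001 \
  kong:2.1
EOF
chmod +x run-local-kong.sh
grep -o 'KONG_DATABASE=off' run-local-kong.sh
```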
Defined a
Kept running into timeouts when attempting to run on a clean Debian machine: https://gist.github.com/rainest/2f36ff9fd5741611185e94413bb3d19b

Inexplicably, it did succeed once, though I'm not sure what distinguished that run from the others--I attempted to disable the cleanup to try and dig into it after, and that run did succeed, but after manually cleaning up (via

How should we interrogate the kind test environment? Outside the make invocation, I wasn't able to talk to it using either the test script or kind:
This was caused by a race between

Fixed in 8c99d12 - replaced
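On the `sleep` mitigation discussed in the PR description: the harness's actual fix is in the commits (`kubectl wait` with a bumped timeout), but as a generic pattern, an open-loop `sleep` can be replaced by a bounded poll. The file check below is a stand-in for a real readiness probe such as `kubectl wait --for=condition=Ready ...`.

```shell
# Generic poll-until-ready loop: retry a check until it passes or a
# timeout expires, instead of sleeping a fixed amount and hoping.
wait_for() {
  timeout=$1; shift
  start=$(date +%s)
  until "$@"; do
    if [ $(( $(date +%s) - start )) -ge "$timeout" ]; then
      echo "timed out waiting for: $*" >&2
      return 1
    fi
    sleep 1
  done
}

rm -f /tmp/ready.flag
(sleep 1; touch /tmp/ready.flag) &   # simulated slow readiness
wait_for 5 test -e /tmp/ready.flag && echo ready
```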
Added a README.md under
CI component now seems solid. Existing tests run consistently, and I didn't encounter any issues with leftovers or side-effects from previous tests in the course of running through this several times, outside runs where I explicitly disabled teardown (not really a concern for CI runs). Approving as the CI context appears to be the main scope for this PR.
Local runs for test development remain difficult. There are a number of rough edges that still impede that:
- Bringing an environment online from scratch isn't particularly quick (around 3 minutes before it starts actually running tests for me). That's probably unavoidable, but keeping an existing environment online has its own challenges, as there's no single step to run one test only: you must set up environment variables, port-forwards, and run scripts all by hand. There's furthermore no built-in means to tear an environment down (running `./kind delete cluster --name test-cluster; docker rm -f $(docker ps -a -q); docker rmi $(docker images -a -q)` appears good enough).
- The admin API isn't exposed outside the Pod or outside KIND--not entirely sure whether there's a way to handle the latter, as port-forwards work around both. We'll probably want to interrogate the admin API after applying configuration to distinguish between issues with test manifests and/or controller translation and issues with the test client command and output validation. We may also want to consider running it on HTTP, as the controller logs aren't always perfectly explanatory, and inspecting the actual admin API calls sent with tcpdump is useful when diagnosing poorly-logged unsuccessful translations.
- Having now gone through the process of writing a test, I'll echo a comment from Harry in one of the earlier meetings: using a test client and validation with more structured input/output that's aware of protocol details (e.g. Golang's HTTP library) would simplify test writing, insofar as it provides stricter checks on that input and output. With bash+curl validation, I knew what I wanted to write from the outset, but spent a lot of time tracking down issues inherent to those tools, e.g. incorrect variable syntax (missing a `$`), forgetting argument syntax (I knew which curl flag I wanted to use, but forgot the exact argument format, and curl's own graceful degradation made this difficult to discover--it still made a request, but not the request I wanted), or including extra characters that didn't break the script entirely, but made it test the wrong things (I left an extra parenthesis somewhere that appended it to command output, breaking the string comparison the test relied on). Having something like Go's compile/vet checks avoids that class of mistake, or at least separates reports of those mistakes from the actual test run.
- During normal test runs, output is gobbled up by the validation check. Although I'd normally diagnose issues with incorrectly-written validation by reviewing curl's output, the test prints the status code only, and I have to break out of the normal test flow and run components manually to get that. Trying to then work backwards to fix the test isn't perfect either, since the path to set up the test environment is different, and those differences may affect component behavior.
- The environment data available to the test environment isn't yet documented, and it's populated via mechanisms that don't exist during manual setup. Ultimately I realized that while I needed a separate hostname and port variable for my test, and could create/pre-populate them during a manual run, I didn't have access to the same data during a full run.
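For the admin API point above, a hypothetical helper that works around the exposure problem with a port-forward is sketched below. The namespace, Deployment name, and port are assumptions; adjust them to the actual workload.

```shell
# Hypothetical helper: reach the admin API from outside the Pod/kind via
# a port-forward. Namespace, Deployment name, and port are assumptions.
cat > admin-forward.sh <<'EOF'
#!/bin/sh
kubectl port-forward -n kong deployment/ingress-kong 8444:8444 &
PF_PID=$!
trap 'kill $PF_PID' EXIT
sleep 2                                  # give the forward time to bind
curl -sk https://localhost:8444/config   # inspect generated Kong config
EOF
chmod +x admin-forward.sh
grep -o 'port-forward' admin-forward.sh
```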
@rainest thanks for the thorough review and for great comments! Following your last comment, I created #937 #938 #939 #940
I don't understand what you'd like to achieve here. Could you please create an issue so that we don't miss this?
fixes #694

- makes CI build a local KIC image from HEAD
- if on `main` or `next`: pushes to bintray. Can (or, maybe, should) be extended to all pushes
- spins up a local microk8s cluster
- pushes the local KIC to the local image registry
- runs an instance of KIC
- runs a set of tests: applying a bunch of manifests, waiting (open loop - `sleep 6`) for KIC to do its work, asserting on `curl` results

There are several possible ways to mitigate the `sleep` but none of them is perfect:

- watch the `status` field of created resources
- watch KIC logs for a successful/failed sync
- watch the Kong Admin API `/config` endpoint

Commits:

* test(integration): implement harness and basic tests
* test(integration): run on ci
* chore(test): improve test cleanup
* chore(test): switch from microk8s to kind
* chore(makefile): define target `integration-test`
* test(integration): patch instead of sed, disable anonymous reports
* test(e2e): replace kubectl patch with kustomization
* test(e2e): bump kubectl wait timeout to account for slow pulls
* test(e2e): remove unused PROXY_IP variable
* test(e2e): write README
* test(e2e): add shebang to verify.sh files
* test(e2e): gitignore leftover stuff