
E2E testing a KIC built from HEAD by CI. #869

Merged · 13 commits · Nov 2, 2020
Conversation

@mflendrich (Contributor) commented Sep 22, 2020

fixes #694

  • makes CI build a local KIC image from HEAD
  • if on main or next: pushes the image to Bintray; this could (or perhaps should) be extended to all pushes
  • spins up a local microk8s cluster
  • pushes the local KIC to the local image registry
  • runs an instance of KIC
  • runs a set of tests: applying a bunch of manifests, waiting (open loop - sleep 6) for KIC to do its work, asserting on curl results
    • There are several possible ways to mitigate the sleep, but none of them is perfect (a rough sketch of the third option follows this list):
      • watch the status field of created resources
      • watch KIC logs for a successful/failed sync
      • watch the Kong Admin API /config endpoint
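
For illustration, a minimal bash sketch of the third option: polling the DB-less /config endpoint until the applied config shows up, instead of sleeping a fixed 6 seconds. The admin address, the timeout, and the "echo" marker are assumptions made for the sketch, not part of this PR:

```bash
# Poll the Admin API until the expected config appears, instead of `sleep 6`.
# Assumes the Admin API has been port-forwarded to localhost:8001 and that the
# applied manifests produce config mentioning "echo" (hypothetical marker).
deadline=$((SECONDS + 60))
until curl -s http://localhost:8001/config | grep -q 'echo'; do
  if (( SECONDS >= deadline )); then
    echo "timed out waiting for KIC to sync" >&2
    exit 1
  fi
  sleep 1
done
```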

@mflendrich changed the title from "draft CI builds and E2E test" to "draft: E2E testing a KIC built from HEAD by CI." on Sep 22, 2020
@mflendrich force-pushed the ci/e2e-test branch 5 times, most recently from cfe01dd to ec17735 on September 22, 2020 12:55
@hbagdi (Member) commented Sep 25, 2020

A couple of notes:

  • I think there is value in keeping the build workflow separate from the e2e-test workflow.
  • Did you consider using Make to drive these stages? Having CI and localhost follow the same steps is valuable for e2e testing.

@mflendrich force-pushed the ci/e2e-test branch 5 times, most recently from 511a15a to 74a799d on October 2, 2020 16:09
@mflendrich changed the base branch from main to next on October 2, 2020 16:10
@mflendrich changed the base branch from next to main on October 13, 2020 13:40
@mflendrich (Contributor, Author) commented Oct 13, 2020

@hbagdi

I think there is value in keeping the build workflow separate from the e2e-test workflow.

In GitHub lingo, there are "workflows" and "jobs". My understanding is that you cannot easily define dependencies between workflows (for example: pass an artifact, or define order). You can do that using raw API calls and triggers, but that's definitely far from clean.

Reading this comment as "I think there is value in keeping the building and e2e-testing paths separate, but one waiting for the other, and the artifact being passed between them" - I think (modulo my current understanding of GH Actions capabilities) that using two jobs within a workflow is the "recommended" way to do that in the GitHub of today (which is what this PR is doing).

@mflendrich (Contributor, Author) commented

Did you consider using Make to drive these stages? Having CI and localhost follow the same steps is valuable for e2e testing.

Yes.

Generally, the test workflow consists of 4 steps:

  • spin up some local k8s cluster with a docker registry
  • push the SUT image into the registry
  • spin up SUT in the local k8s cluster using the pushed image
  • ./run-all-tests.sh

With this PR today, you can run e2e tests on KIC locally by manually reproducing the first 3 steps in your local environment and then running ./run-all-tests.sh. Of course, we could wrap them in a script or a make recipe, but that would require us to:

  • either cause a lot of side effects on the user's machine by creating and destroying global k8s clusters, or
  • make our test use a self-contained k8s cluster that can easily live alongside whatever the user has running on their machine, for example within a Docker image (it's possible with microk8s and KIND among others).

I didn't put the environment setup instructions in a make recipe (or an equivalent shell script) because I thought the former (overwriting the user's machine) would be inappropriate for a make target, and the latter (running k8s in an ephemeral harness) was too much work for the first iteration.
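
For reference, a simplified sketch of reproducing those first 3 steps by hand with KIND. It uses kind load instead of the PR's local registry for brevity, and the cluster, namespace, deployment, and image names are assumptions rather than what CI does verbatim:

```bash
# 1. spin up a self-contained local k8s cluster (CI uses KIND plus a local registry)
kind create cluster --name test-cluster

# 2. make the locally built KIC image available inside the cluster
#    (hypothetical tag; `kind load` sidesteps pushing to a registry)
kind load docker-image kong-ingress-controller:local --name test-cluster

# 3. run the SUT, pointing the controller container at that image
kubectl apply -f deploy/single/all-in-one-dbless.yaml
kubectl set image -n kong deployment/ingress-kong \
  ingress-controller=kong-ingress-controller:local
kubectl wait --for=condition=Available -n kong deploy/ingress-kong --timeout=120s

# 4. ./run-all-tests.sh
```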

@mflendrich marked this pull request as ready for review on October 13, 2020 16:40
@mflendrich changed the title from "draft: E2E testing a KIC built from HEAD by CI." to "E2E testing a KIC built from HEAD by CI." on Oct 13, 2020
@rainest (Contributor) left a comment


We probably don't want to go so far as to test behavior; rather, we should just check that a given set of K8S config results in an expected set of Kong config, as that's effectively the controller's output once it proceeds past the Go objects we currently check in unit tests. My rationale for that is twofold:

  • Writing a validation script per test is fairly expensive, doesn't provide an easy means to diff the actual result against the expected result in the event of a failure, and opens up the possibility of failures more complex than the test itself simply not producing the correct output. We run into that last issue somewhat often with the Kong integration tests, which often require manually checking output to see whether a failure was an environment issue (e.g. Cassandra didn't start properly for Cassandra reasons, so Cassandra tests failed).
  • We'll duplicate existing work in the Kong integration tests, which implement those behavior checks already (i.e. checks along the line of "if you add this piece of Kong config and then send this request with curl, do you see the expected response?") and have a decent amount of existing coverage there.

We should be able to check config using either a deck dump or GET /config output, and I'm sorta leaning towards the latter because YAML is, for all its faults, still a bit easier to write by hand than JSON. Either option should have some means of separating out config bits we don't actually care about, since we can't inject specific IDs or created_at times via K8S config.
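
For illustration, a hedged sketch of such a config-level check using decK. The admin address, the fixture name, and the crude field filtering are assumptions made for the sketch, not an agreed-on design:

```bash
# Dump what Kong is actually running and diff it against a per-test fixture.
# Assumes the Admin API is reachable on localhost:8001 (e.g. via kubectl port-forward)
# and that expected.yaml is a hypothetical fixture checked in next to the test.
deck dump --kong-addr http://localhost:8001 -o actual.yaml
# crude filter for fields we can't control from K8S config (ids, timestamps)
grep -vE '^\s*(id|created_at|updated_at):' actual.yaml   > actual.filtered.yaml
grep -vE '^\s*(id|created_at|updated_at):' expected.yaml > expected.filtered.yaml
diff -u expected.filtered.yaml actual.filtered.yaml
```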

@rainest (Contributor) commented Oct 14, 2020

This second comment isn't a change request; it's notes on what I did to set up my local test environment, which may be useful for other reasons:

Ah! Ubuntu and its suite of things that only work in the Canonical ecosystem! Getting microk8s (and by extension, snap) working on Arch seemed ill-advised, so I repurposed the Ubuntu-based kong Vagrant image for this. There's not much reason to use it specifically (a stock Ubuntu Vagrant box would have worked fine), but I already had it lying around.

The initial minor roadblock was that, even with an Ubuntu microk8s environment, the test script wants to use kubectl rather than microk8s kubectl, but a quick script edit to run-all-tests.sh and run-one-test.sh sorted that out. Dunno how to make that more solid for future work, but as it only requires a vim replace-mode edit or a sed oneliner (sed -i -e "s/kubectl/microk8s kubectl/g" /path/to/run-all-tests.sh), it's not a huge immediate concern. The alternative is to mimic the GH setup and point kubectl to the microk8s config, but I went with edits because I didn't actually have a kubectl install inside the VM other than the microk8s one.

After that, all the tests still failed; oh no! We can't rely on the normal GitHub setup tasks, so the tests failed for lack of an actual Kong instance. Relatively simple to fix:

$ microk8s kubectl apply -f ../../deploy/single/all-in-one-dbless.yaml

After that, everything runs successfully.

A Docker-based (presumably k3s) setup would be more lightweight than an Ubuntu VM, but I don't often need to run VMs for the reasons I used to (gojira supplanted them), so that's not a major concern for me personally.

@mflendrich (Contributor, Author) commented

Defined a make integration-test target, too.

@rainest (Contributor) commented Oct 27, 2020

Kept running into timeouts when attempting to run on a clean Debian machine: https://gist.github.com/rainest/2f36ff9fd5741611185e94413bb3d19b

It inexplicably succeeded once, though I'm not sure what distinguished that run from the others. I attempted to disable the cleanup to dig into it afterwards, and that run did succeed, but after manually cleaning up (via docker rm -f testcontainer) and attempting a subsequent run, I got the same issues again.

How should we interrogate the kind test environment? Outside the make invocation, I wasn't able to talk to it using either the test script or kind:

# ./util/run-all-tests.sh 
>>> Obtaining Kong proxy IP...
>>> Kong proxy host is '127.0.0.1:27080' for HTTP and '127.0.0.1:27443' for HTTPS.
>>> Setting up example services...
+ kubectl apply -f https://bit.ly/sample-echo-service
The connection to the server 127.0.0.1:43475 was refused - did you specify the right host or port?
The connection to the server 127.0.0.1:43475 was refused - did you specify the right host or port?
+ kubectl apply -f https://bit.ly/sample-httpbin-service
The connection to the server 127.0.0.1:43475 was refused - did you specify the right host or port?
+ kubectl wait --for=condition=Available deploy echo --timeout=120s
The connection to the server 127.0.0.1:43475 was refused - did you specify the right host or port?
+ kubectl wait --for=condition=Available deploy httpbin --timeout=120s
The connection to the server 127.0.0.1:43475 was refused - did you specify the right host or port?
>>> ERROR: Failed to set up example services.
./util/run-all-tests.sh: line 4: kill: (81679) - No such process
/home/rainest/kubernetes-ingress-controller/test/integration# ./kind get kubeconfig
ERROR: could not locate any control plane nodes
# docker ps
CONTAINER ID        IMAGE                  COMMAND                  CREATED             STATUS              PORTS                       NAMES
c7a41d3efe57        kindest/node:v1.19.1   "/usr/local/bin/entr…"   31 minutes ago      Up 31 minutes       127.0.0.1:45871->6443/tcp   test-cluster-control-plane
b478797cbf20        registry:2             "/entrypoint.sh /etc…"   31 minutes ago      Up 31 minutes       0.0.0.0:5000->5000/tcp      test-local-registry

@mflendrich (Contributor, Author) commented

Kept running into timeouts when attempting to run on a clean Debian machine: https://gist.github.com/rainest/2f36ff9fd5741611185e94413bb3d19b

It inexplicably succeeded once, though I'm not sure what distinguished that run from the others. I attempted to disable the cleanup to dig into it afterwards, and that run did succeed, but after manually cleaning up (via docker rm -f testcontainer) and attempting a subsequent run, I got the same issues again.

This was caused by a race between kubectl patch and kubectl port-forward. kubectl port-forward to a Service does not bind to the kong-proxy Service's IP; instead, it resolves the Service's endpoints once and binds to those pods for the lifetime of the forward. When the pods behind the Service get replaced, the port-forward breaks.

Fixed in 8c99d12 - replaced apply and patch with kustomize.
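
For illustration, a minimal sketch of that approach; the file names, image names, and ports here are assumptions, not the PR's exact kustomization:

```bash
# Bake the image override into a kustomization so the proxy pods come up already
# configured, instead of patching (and therefore replacing) them after an initial
# apply, which is what broke the already-established port-forward.
cat > kustomization.yaml <<'EOF'
resources:
  - all-in-one-dbless.yaml                      # assumed local copy of the stock manifest
images:
  - name: kong/kubernetes-ingress-controller    # assumed: whatever KIC image the manifest references
    newName: localhost:5000/kic                 # hypothetical local-registry image
    newTag: ci
EOF
kubectl apply -k .
kubectl wait --for=condition=Available -n kong deploy/ingress-kong --timeout=120s
# only now start the port-forward, against pods that will not be replaced
kubectl port-forward -n kong svc/kong-proxy 27080:80 27443:443 &
```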

@mflendrich force-pushed the ci/e2e-test branch 2 times, most recently from 15e8419 to 12b6701 on October 30, 2020 13:00
@mflendrich (Contributor, Author) commented

How should we interrogate the kind test environment? Outside the make invocation, I wasn't able to talk to it using either the test script or kind:

Added a README.md under /test/integration that aims to answer this question. Let me know if there's some use case that you think would be worth covering too.

@rainest (Contributor) left a comment


The CI component now seems solid. Existing tests run consistently, and I didn't encounter any issues with leftovers or side effects from previous tests in the course of running through this several times, outside runs where I explicitly disabled teardown (not really a concern for CI runs). Approving, as the CI context appears to be the main scope for this PR.

Local runs for test development remain difficult. There are a number of rough edges that will still impede that:

  • Bringing an environment online from scratch isn't particularly quick (around 3 minutes before it starts actually running tests for me). That's probably unavoidable, but keeping an existing environment online has its own challenges, as there's no single step to run one test only: you must set up environment variables, port-forwards, and run scripts all by hand. There's furthermore no built-in means to tear an environment down (running ./kind delete cluster --name test-cluster; docker rm -f $(docker ps -a -q); docker rmi $(docker images -a -q) appears good enough).
  • The admin API isn't exposed outside the Pod or outside KIND--not entirely sure whether there's a way to handle the latter, as port-forwards work around both. We'll probably want to interrogate the admin API after applying configuration (e.g. via a port-forward like the sketch after this list) to distinguish between issues with test manifests and/or controller translation and issues with the test client command and output validation. We may also want to consider running it on HTTP, as the controller logs aren't always perfectly explanatory, and inspecting the actual admin API calls sent with tcpdump is useful when diagnosing poorly-logged unsuccessful translations.
  • Having now gone through the process of writing a test, I'll echo a comment from Harry in one of the earlier meetings: using a test client and validation with more structured input/output that's aware of protocol details (e.g. Golang's HTTP library) would simplify test writing, insofar as it provides stricter checks on that input and output. With bash+curl validation, I knew what I wanted to write from the outset, but spent a lot of time tracking down issues inherent to those tools, e.g. incorrect variable syntax (missing a $), forgetting argument syntax (I knew which curl flag I wanted to use, but forgot the exact argument format, and curl's own graceful degradation made this difficult to discover--it still made a request, but not the request I wanted), or including extra characters that didn't break the script entirely, but made it test the wrong things (I left an extra parenthesis somewhere that got appended to command output, breaking the string comparison the test relied on). Having something like Go's compile/vet checks avoids that class of mistake, or at least separates reports of those mistakes from the actual test run.
  • During normal test runs, output is gobbled up by the validation check. Although I'd normally diagnose issues with incorrectly-written validation by reviewing curl's output, the test relies on printing the status code only, and I have to break out of the normal test flow and run components manually to get that. Trying to then work backwards to fix the test isn't perfect either, since the path to set up the test environment is different, and those differences may affect component behavior.
  • The environment data available to the test environment isn't yet documented, and it's populated via mechanisms that don't exist during manual setup. Ultimately I realized that while I needed a separate hostname and port variable for my test, and could create/pre-populate them during a manual run, I didn't have access to the same data during a full run.
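
Regarding interrogating the admin API (second bullet above), a hedged sketch of the port-forward workaround; the namespace, deployment name, and admin port are assumptions based on the all-in-one DB-less manifest:

```bash
# Forward the Admin API out of the proxy pod and dump the declarative config the
# controller has pushed so far. 8444 (TLS) and the deployment name are assumed from
# the all-in-one DB-less manifest, and the /config response is assumed to wrap the
# declarative config in a `config` field.
kubectl port-forward -n kong deploy/ingress-kong 8444:8444 &
PF_PID=$!
sleep 2
curl -sk https://localhost:8444/config | jq -r '.config'
kill "$PF_PID"
```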

@mflendrich (Contributor, Author) commented Nov 2, 2020

@rainest thanks for the thorough review and for great comments!

Following your last comment, I created #937, #938, #939, and #940.
I also created #941.

The environment data available to the test environment isn't yet documented, and it's populated via mechanisms that don't exist during manual setup. Ultimately I realized that while I needed a separate hostname and port variable for my test, and could create/pre-populate them during a manual run, I didn't have access to the same data during a full run.

I don't understand what you'd like to achieve here. Could you please create an issue so that we don't miss this?

rainest pushed a commit that referenced this pull request Dec 2, 2020

* test(integration): implement harness and basic tests

* test(integration): run on ci

* chore(test): improve test cleanup

* chore(test): switch from microk8s to kind

* chore(makefile): define target `integration-test`

* test(integration): patch instead of sed, disable anonymous reports

* test(e2e): replace kubectl patch with kustomization

* test(e2e): bump kubectl wait timeout to account for slow pulls

* test(e2e): remove unused PROXY_IP variable

* test(e2e): write README

* test(e2e): add shebang to verify.sh files

* test(e2e): gitignore leftover stuff
Successfully merging this pull request may close these issues.

Setup e2e tests
3 participants