
E2E testing a KIC built from HEAD by CI. #869

Merged · 13 commits · Nov 2, 2020
Conversation

@mflendrich (Contributor) commented Sep 22, 2020

fixes #694

  • makes CI build a local KIC image from HEAD
  • if on main or next: pushes the image to Bintray; this could (or perhaps should) be extended to all pushes
  • spins up a local microk8s cluster
  • pushes the local KIC to the local image registry
  • runs an instance of KIC
  • runs a set of tests: applying a bunch of manifests, waiting (open loop - sleep 6) for KIC to do its work, asserting on curl results
    • There are several possible ways to mitigate the sleep, but none of them is perfect (a rough sketch of the third option follows this list):
      • watch the status field of created resources
      • watch KIC logs for a successful/failed sync
      • watch the Kong Admin API /config endpoint
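
For illustration, a minimal bash sketch of the third option: polling the DB-less /config endpoint until the applied config shows up, instead of sleeping a fixed 6 seconds. The admin address, the timeout, and the "echo" marker are assumptions made for the sketch, not part of this PR:

```bash
# Poll the Admin API until the expected config appears, instead of `sleep 6`.
# Assumes the Admin API has been port-forwarded to localhost:8001 and that the
# applied manifests produce config mentioning "echo" (hypothetical marker).
deadline=$((SECONDS + 60))
until curl -s http://localhost:8001/config | grep -q 'echo'; do
  if (( SECONDS >= deadline )); then
    echo "timed out waiting for KIC to sync" >&2
    exit 1
  fi
  sleep 1
done
```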

@mflendrich changed the title from "draft CI builds and E2E test" to "draft: E2E testing a KIC built from HEAD by CI." on Sep 22, 2020
@mflendrich force-pushed the ci/e2e-test branch 5 times, most recently from cfe01dd to ec17735 on September 22, 2020 12:55
@hbagdi (Member) commented Sep 25, 2020

A couple of notes:

  • I think there is value in keeping the build workflow separate from the e2e-test workflow.
  • Did you consider using Make to drive these stages? Having CI and localhost follow the same steps is valuable for e2e testing.

@mflendrich force-pushed the ci/e2e-test branch 5 times, most recently from 511a15a to 74a799d on October 2, 2020 16:09
@mflendrich changed the base branch from main to next on October 2, 2020 16:10
@mflendrich changed the base branch from next to main on October 13, 2020 13:40
@mflendrich (Contributor, Author) commented Oct 13, 2020

@hbagdi

I think there is value in keeping the build workflow separate from the e2e-test workflow.

In GitHub lingo, there are "workflows" and "jobs". My understanding is that you cannot easily define dependencies between workflows (for example: pass an artifact, or define order). You can do that using raw API calls and triggers, but that's definitely far from clean.

Reading this comment as "I think there is value in keeping the building and e2e-testing paths separate, but one waiting for the other, and the artifact being passed between them" - I think (modulo my current understanding of GH Actions capabilities) that using two jobs within a workflow is the "recommended" way to do that in the GitHub of today (which is what this PR is doing).

@mflendrich (Contributor, Author) commented

Did you consider using Make to drive these stages? Having CI and localhost follow the same steps is valuable for e2e testing.

Yes.

Generally, the test workflow consists of 4 steps:

  • spin up some local k8s cluster with a docker registry
  • push the SUT image into the registry
  • spin up SUT in the local k8s cluster using the pushed image
  • ./run-all-tests.sh

With this PR today, you can run e2e tests on KIC locally by manually reproducing the first 3 steps in your local environment and then running ./run-all-tests.sh. Of course, we could wrap them in a script or a make recipe, but that would require us to:

  • either cause a lot of side effects on the user's machine by creating and destroying global k8s clusters, or
  • make our test use a self-contained k8s cluster that can easily live alongside whatever the user has running on their machine, for example within a Docker image (it's possible with microk8s and KIND among others).

I didn't put the environment setup instructions in a make recipe (or an equivalent shell script) because I thought the former (overwriting the user's machine) would be inappropriate for a make target, and the latter (running k8s in an ephemeral harness) was too much work for the first iteration.
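
For reference, a simplified sketch of reproducing those first 3 steps by hand with KIND. It uses kind load instead of the PR's local registry for brevity, and the cluster, namespace, deployment, and image names are assumptions rather than what CI does verbatim:

```bash
# 1. spin up a self-contained local k8s cluster (CI uses KIND plus a local registry)
kind create cluster --name test-cluster

# 2. make the locally built KIC image available inside the cluster
#    (hypothetical tag; `kind load` sidesteps pushing to a registry)
kind load docker-image kong-ingress-controller:local --name test-cluster

# 3. run the SUT, pointing the controller container at that image
kubectl apply -f deploy/single/all-in-one-dbless.yaml
kubectl set image -n kong deployment/ingress-kong \
  ingress-controller=kong-ingress-controller:local
kubectl wait --for=condition=Available -n kong deploy/ingress-kong --timeout=120s

# 4. ./run-all-tests.sh
```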

@mflendrich marked this pull request as ready for review on October 13, 2020 16:40
@mflendrich changed the title from "draft: E2E testing a KIC built from HEAD by CI." to "E2E testing a KIC built from HEAD by CI." on Oct 13, 2020
@rainest (Contributor) left a comment


We probably don't want to go so far as to test behavior; rather, we should just check that a given set of K8S config results in an expected set of Kong config, as that's effectively the controller's output once it proceeds past the Go objects we currently check in unit tests. My rationale for that is twofold:

  • Writing a validation script per test is fairly expensive, doesn't provide an easy means to diff the actual result against the expected result in the event of a failure, and opens up the possibility of failures more complex than the test itself simply not producing the correct output. We run into that last issue somewhat often with the Kong integration tests, which often require manually checking output to see whether a failure was an environment issue (e.g. Cassandra didn't start properly for Cassandra reasons, so Cassandra tests failed).
  • We'll duplicate existing work in the Kong integration tests, which implement those behavior checks already (i.e. checks along the line of "if you add this piece of Kong config and then send this request with curl, do you see the expected response?") and have a decent amount of existing coverage there.

We should be able to check config using either a deck dump or GET /config output, and I'm sorta leaning towards the latter because YAML is, for all its faults, still a bit easier to write by hand than JSON. Either option should have some means of separating out config bits we don't actually care about, since we can't inject specific IDs or created_at times via K8S config.
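
For illustration, a hedged sketch of such a config-level check using decK. The admin address, the fixture name, and the crude field filtering are assumptions made for the sketch, not an agreed-on design:

```bash
# Dump what Kong is actually running and diff it against a per-test fixture.
# Assumes the Admin API is reachable on localhost:8001 (e.g. via kubectl port-forward)
# and that expected.yaml is a hypothetical fixture checked in next to the test.
deck dump --kong-addr http://localhost:8001 -o actual.yaml
# crude filter for fields we can't control from K8S config (ids, timestamps)
grep -vE '^\s*(id|created_at|updated_at):' actual.yaml   > actual.filtered.yaml
grep -vE '^\s*(id|created_at|updated_at):' expected.yaml > expected.filtered.yaml
diff -u expected.filtered.yaml actual.filtered.yaml
```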

@rainest (Contributor) commented Oct 14, 2020

This second comment isn't a change request; it's notes on what I did to set up my local test environment, which may be useful for other reasons:

Ah! Ubuntu and its suite of things that only work in the Canonical ecosystem! Getting microk8s (and by extension, snap) working on Arch seemed ill-advised, so I repurposed the Ubuntu-based kong Vagrant image for this. There's not much reason to use it specifically (a stock Ubuntu Vagrant box would have worked fine), but I already had it lying around.

The initial minor roadblock was that, even with an Ubuntu microk8s environment, the test script wants to use kubectl rather than microk8s kubectl, but a quick script edit to run-all-tests.sh and run-one-test.sh sorted that out. Dunno how to make that more solid for future work, but as it only requires a vim replace-mode edit or a sed oneliner (sed -i -e "s/kubectl/microk8s kubectl/g" /path/to/run-all-tests.sh), it's not a huge immediate concern. The alternative is to mimic the GH setup and point kubectl to the microk8s config, but I went with edits because I didn't actually have a kubectl install inside the VM other than the microk8s one.

After that, all the tests still failed; oh no! We can't rely on the normal GitHub setup tasks, so the tests failed for lack of an actual Kong instance. Relatively simple to fix:

$ microk8s kubectl apply -f ../../deploy/single/all-in-one-dbless.yaml

After that, everything runs successfully.

A Docker-based (presumably k3s) setup would be more lightweight than an Ubuntu VM, but I don't often need to run VMs for the reasons I used to (gojira supplanted them), so that's not a major concern for me personally.

@mflendrich (Contributor, Author) commented

Defined a make integration-test target, too.

@rainest (Contributor) commented Oct 27, 2020

Kept running into timeouts when attempting to run on a clean Debian machine: https://gist.github.com/rainest/2f36ff9fd5741611185e94413bb3d19b

It inexplicably succeeded once, though I'm not sure what distinguished that run from the others. I attempted to disable the cleanup to dig into it afterwards, and that run did succeed, but after manually cleaning up (via docker rm -f testcontainer) and attempting a subsequent run, I got the same issues again.

How should we interrogate the kind test environment? Outside the make invocation, I wasn't able to talk to it using either the test script or kind:

# ./util/run-all-tests.sh 
>>> Obtaining Kong proxy IP...
>>> Kong proxy host is '127.0.0.1:27080' for HTTP and '127.0.0.1:27443' for HTTPS.
>>> Setting up example services...
+ kubectl apply -f https://bit.ly/sample-echo-service
The connection to the server 127.0.0.1:43475 was refused - did you specify the right host or port?
The connection to the server 127.0.0.1:43475 was refused - did you specify the right host or port?
+ kubectl apply -f https://bit.ly/sample-httpbin-service
The connection to the server 127.0.0.1:43475 was refused - did you specify the right host or port?
+ kubectl wait --for=condition=Available deploy echo --timeout=120s
The connection to the server 127.0.0.1:43475 was refused - did you specify the right host or port?
+ kubectl wait --for=condition=Available deploy httpbin --timeout=120s
The connection to the server 127.0.0.1:43475 was refused - did you specify the right host or port?
>>> ERROR: Failed to set up example services.
./util/run-all-tests.sh: line 4: kill: (81679) - No such process
/home/rainest/kubernetes-ingress-controller/test/integration# ./kind get kubeconfig
ERROR: could not locate any control plane nodes
# docker ps
CONTAINER ID        IMAGE                  COMMAND                  CREATED             STATUS              PORTS                       NAMES
c7a41d3efe57        kindest/node:v1.19.1   "/usr/local/bin/entr…"   31 minutes ago      Up 31 minutes       127.0.0.1:45871->6443/tcp   test-cluster-control-plane
b478797cbf20        registry:2             "/entrypoint.sh /etc…"   31 minutes ago      Up 31 minutes       0.0.0.0:5000->5000/tcp      test-local-registry

@mflendrich (Contributor, Author) commented

Kept running into timeouts when attempting to run on a clean Debian machine: https://gist.github.com/rainest/2f36ff9fd5741611185e94413bb3d19b

It inexplicably succeeded once, though I'm not sure what distinguished that run from the others. I attempted to disable the cleanup to dig into it afterwards, and that run did succeed, but after manually cleaning up (via docker rm -f testcontainer) and attempting a subsequent run, I got the same issues again.

This was caused by a race between kubectl patch and kubectl port-forward. kubectl port-forward to a Service does not bind to the kong-proxy Service's IP; instead, it resolves the Service's endpoints once and binds to those pods for the lifetime of the forward. When the pods behind the Service get replaced, the port-forward breaks.

Fixed in 8c99d12 - replaced apply and patch with kustomize.
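
For illustration, a minimal sketch of that approach; the file names, image names, and ports here are assumptions, not the PR's exact kustomization:

```bash
# Bake the image override into a kustomization so the proxy pods come up already
# configured, instead of patching (and therefore replacing) them after an initial
# apply, which is what broke the already-established port-forward.
cat > kustomization.yaml <<'EOF'
resources:
  - all-in-one-dbless.yaml                      # assumed local copy of the stock manifest
images:
  - name: kong/kubernetes-ingress-controller    # assumed: whatever KIC image the manifest references
    newName: localhost:5000/kic                 # hypothetical local-registry image
    newTag: ci
EOF
kubectl apply -k .
kubectl wait --for=condition=Available -n kong deploy/ingress-kong --timeout=120s
# only now start the port-forward, against pods that will not be replaced
kubectl port-forward -n kong svc/kong-proxy 27080:80 27443:443 &
```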

@mflendrich force-pushed the ci/e2e-test branch 2 times, most recently from 15e8419 to 12b6701 on October 30, 2020 13:00
@mflendrich (Contributor, Author) commented

How should we interrogate the kind test environment? Outside the make invocation, I wasn't able to talk to it using either the test script or kind:

Added a README.md under /test/integration that aims to answer this question. Let me know if there's some use case that you think would be worth covering too.

@rainest (Contributor) left a comment


The CI component now seems solid. Existing tests run consistently, and I didn't encounter any issues with leftovers or side effects from previous tests in the course of running through this several times, outside runs where I explicitly disabled teardown (not really a concern for CI runs). Approving, as the CI context appears to be the main scope for this PR.

Local runs for test development remain difficult. There are a number of rough edges that will still impede that:

  • Bringing an environment online from scratch isn't particularly quick (around 3 minutes before it starts actually running tests for me). That's probably unavoidable, but keeping an existing environment online has its own challenges, as there's no single step to run one test only: you must set up environment variables, port-forwards, and run scripts all by hand. There's furthermore no built-in means to tear an environment down (running ./kind delete cluster --name test-cluster; docker rm -f $(docker ps -a -q); docker rmi $(docker images -a -q) appears good enough).
  • The admin API isn't exposed outside the Pod or outside KIND--not entirely sure whether there's a way to handle the latter, as port-forwards work around both. We'll probably want to interrogate the admin API after applying configuration (e.g. via a port-forward like the sketch after this list) to distinguish between issues with test manifests and/or controller translation and issues with the test client command and output validation. We may also want to consider running it on HTTP, as the controller logs aren't always perfectly explanatory, and inspecting the actual admin API calls sent with tcpdump is useful when diagnosing poorly-logged unsuccessful translations.
  • Having now gone through the process of writing a test, I'll echo a comment from Harry in one of the earlier meetings: using a test client and validation with more structured input/output that's aware of protocol details (e.g. Golang's HTTP library) would simplify test writing, insofar as it provides stricter checks on that input and output. With bash+curl validation, I knew what I wanted to write from the outset, but spent a lot of time tracking down issues inherent to those tools, e.g. incorrect variable syntax (missing a $), forgetting argument syntax (I knew which curl flag I wanted to use, but forgot the exact argument format, and curl's own graceful degradation made this difficult to discover--it still made a request, but not the request I wanted), or including extra characters that didn't break the script entirely, but made it test the wrong things (I left an extra parenthesis somewhere that got appended to command output, breaking the string comparison the test relied on). Having something like Go's compile/vet checks avoids that class of mistake, or at least separates reports of those mistakes from the actual test run.
  • During normal test runs, output is gobbled up by the validation check. Although I'd normally diagnose issues with incorrectly-written validation by reviewing curl's output, the test relies on printing the status code only, and I have to break out of the normal test flow and run components manually to get that. Trying to then work backwards to fix the test isn't perfect either, since the path to set up the test environment is different, and those differences may affect component behavior.
  • The environment data available to the test environment isn't yet documented, and it's populated via mechanisms that don't exist during manual setup. Ultimately I realized that while I needed a separate hostname and port variable for my test, and could create/pre-populate them during a manual run, I didn't have access to the same data during a full run.
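
Regarding interrogating the admin API (second bullet above), a hedged sketch of the port-forward workaround; the namespace, deployment name, and admin port are assumptions based on the all-in-one DB-less manifest:

```bash
# Forward the Admin API out of the proxy pod and dump the declarative config the
# controller has pushed so far. 8444 (TLS) and the deployment name are assumed from
# the all-in-one DB-less manifest, and the /config response is assumed to wrap the
# declarative config in a `config` field.
kubectl port-forward -n kong deploy/ingress-kong 8444:8444 &
PF_PID=$!
sleep 2
curl -sk https://localhost:8444/config | jq -r '.config'
kill "$PF_PID"
```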

@mflendrich (Contributor, Author) commented Nov 2, 2020

@rainest thanks for the thorough review and for great comments!

Following your last comment, I created #937, #938, #939, and #940.
I also created #941.

The environment data available to the test environment isn't yet documented, and it's populated via mechanisms that don't exist during manual setup. Ultimately I realized that while I needed a separate hostname and port variable for my test, and could create/pre-populate them during a manual run, I didn't have access to the same data during a full run.

I don't understand what you'd like to achieve here. Could you please create an issue so that we don't miss this?

rainest pushed a commit that referenced this pull request Dec 2, 2020

* test(integration): implement harness and basic tests

* test(integration): run on ci

* chore(test): improve test cleanup

* chore(test): switch from microk8s to kind

* chore(makefile): define target `integration-test`

* test(integration): patch instead of sed, disable anonymous reports

* test(e2e): replace kubectl patch with kustomization

* test(e2e): bump kubectl wait timeout to account for slow pulls

* test(e2e): remove unused PROXY_IP variable

* test(e2e): write README

* test(e2e): add shebang to verify.sh files

* test(e2e): gitignore leftover stuff
Successfully merging this pull request may close these issues.

Setup e2e tests
3 participants