Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move QPY tests to GitHub Actions and increase inter-symengine tests #13273

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

jakelishman
Copy link
Member

Summary

This commit has two major goals:

  • fix the caching of the QPY files for both the main and stable/* branches

  • increase the number of compatibility tests between the different symengine versions that might be involved in the generation and loading of the QPY files.

Achieving both of these goals also means that it is sensible to move the job to GitHub Actions at the same time, since it will put more pressure on the Azure machine concurrency we use.

Caching

The previous QPY tests attempted to cache the generated files for each historical version of Qiskit, but this was unreliable. The cache never seemed to hit on backport branches, which was a huge slowdown in the critical path to getting releases out. The cache restore keys were also a bit lax, meaning that we might accidentally have invalidated files in the cache by changing what we wanted to test, but the restore keys wouldn't have changed.

The cache files would fail to restore as a side-effect of ed79d42 (gh-11526); QPY was moved to be on the tail end of the lint run, rather than in a test run. This meant that it was no longer run as part of the push event when updating main or one of the stable/* branches. In Azure (and GitHub Actions), the "cache" action accesses a scoped cache, not a universal one for the repository 12. Approximately, base branches each have their own scope, and PR events open a new scope that is a child of the target branch, the default branch, and the source branch, if appropriate. A cache task can read from any of its parent scopes, but write events go to the most local scope. This means that we haven't been writing to long-standing caches for some time now. PRs would typically miss the cache on the first attempt, hit their cache for updates, then miss again once entering the merge queue.

The fix for this is to run the QPY job on branch-update events as well. The post-job cache action will then write out to a reachable cache for all following events.

Cross-symengine tests

We previously were just running a single test with differing versions of symengine between the loading and generation of the QPY files. This refactors the QPY run_tests.sh script to run a full pairwise matrix of compatibility tests, to increase the coverage.

Details and comments

This is a CI change, so the chance I got it right first time is approximately zero. Just need a PR to start testing it. We'll need to update the branch-protection rules if we decide to merge this.

Footnotes

  1. https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/caching-dependencies-to-speed-up-workflows#restrictions-for-accessing-a-cache

  2. https://learn.microsoft.com/en-us/azure/devops/pipelines/release/caching?view=azure-devops#cache-isolation-and-security

This commit has two major goals:

- fix the caching of the QPY files for both the `main` and `stable/*`
  branches

- increase the number of compatibility tests between the different
  symengine versions that might be involved in the generation and
  loading of the QPY files.

Achieving both of these goals also means that it is sensible to move the
job to GitHub Actions at the same time, since it will put more pressure
on the Azure machine concurrency we use.

Caching
-------

The previous QPY tests attempted to cache the generated files for each
historical version of Qiskit, but this was unreliable.  The cache never
seemed to hit on backport branches, which was a huge slowdown in the
critical path to getting releases out.  The cache restore keys were also
a bit lax, meaning that we might accidentally have invalidated files in
the cache by changing what we wanted to test, but the restore keys
wouldn't have changed.

The cache files would fail to restore as a side-effect of ed79d42
(Qiskitgh-11526); QPY was moved to be on the tail end of the lint run, rather
than in a test run.  This meant that it was no longer run as part of the
push event when updating `main` or one of the `stable/*` branches.  In
Azure (and GitHub Actions), the "cache" action accesses a _scoped_
cache, not a universal one for the repository [^1][^2].  Approximately,
base branches each have their own scope, and PR events open a new scope
that is a child of the target branch, the default branch, and the source
branch, if appropriate.  A cache task can read from any of its parent
scopes, but write events go to the most local scope.  This means that we
haven't been writing to long-standing caches for some time now.  PRs
would typically miss the cache on the first attempt, hit their
cache for updates, then miss again once entering the merge queue.

The fix for this is to run the QPY job on branch-update events as well.
The post-job cache action will then write out to a reachable cache for
all following events.

Cross-symengine tests
---------------------

We previously were just running a single test with differing versions of
symengine between the loading and generation of the QPY files.  This
refactors the QPY `run_tests.sh` script to run a full pairwise matrix of
compatibility tests, to increase the coverage.

[^1]: https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/caching-dependencies-to-speed-up-workflows#restrictions-for-accessing-a-cache
[^2]: https://learn.microsoft.com/en-us/azure/devops/pipelines/release/caching?view=azure-devops#cache-isolation-and-security
@jakelishman jakelishman added type: qa Issues and PRs that relate to testing and code quality stable backport potential The bug might be minimal and/or import enough to be port to stable Changelog: None Do not include in changelog labels Oct 3, 2024
@jakelishman jakelishman requested a review from a team as a code owner October 3, 2024 19:13
@qiskit-bot
Copy link
Collaborator

One or more of the following people are relevant to this code:

  • @Qiskit/terra-core
  • @mtreinish
  • @nkanazawa1989

@jakelishman

This comment was marked as outdated.

@coveralls
Copy link

coveralls commented Oct 3, 2024

Pull Request Test Coverage Report for Build 11177916357

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 7 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.01%) to 88.886%

Files with Coverage Reduction New Missed Lines %
crates/qasm2/src/lex.rs 7 91.98%
Totals Coverage Status
Change from base Build 11161049152: 0.01%
Covered Lines: 74336
Relevant Lines: 83631

💛 - Coveralls

@jakelishman

This comment was marked as outdated.

@jakelishman
Copy link
Member Author

The cache size appears to be 200kB, which is slightly questionable - that's near exactly the size of a single set of QPY files. That said, the cache action is compressing them, and there are a few places in the QPY files that will include some random bytes between versions because of randomly generated Parameter UUID instances, so maybe it does all add up right.

@jakelishman jakelishman changed the title [WIP] Move QPY tests to GitHub Actions and increase inter-symengine tests Move QPY tests to GitHub Actions and increase inter-symengine tests Oct 4, 2024
@jakelishman
Copy link
Member Author

I tested the caching for new PRs on my fork, and verified that it correctly restores the cache even for the initial "open a new PR" event, if the PR doesn't modify the QPY-compatibility test directory.

Building a single dev-version wheel at the top of the file means that the QPY backwards-compatibility job now takes only five minutes on a cache hit (e.g. https:/jakelishman/qiskit-terra/actions/runs/11178693927/job/31076786343) despite now building an extra venv and running three more compatibility test, and it runs in parallel to all other jobs, whereas previously it was often on the critical path. It takes about 15 minutes on a complete cache miss (the same as Azure, give or take), but the cache misses should now be much rarer, which should help a lot for the throughput of backports and releases.

This is ready for review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelog: None Do not include in changelog stable backport potential The bug might be minimal and/or import enough to be port to stable type: qa Issues and PRs that relate to testing and code quality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants