Use a data-only image for cache in CI. #2227
Conversation
Force-pushed from 9be3958 to ba62761
This is mainly to support tweaking CI performance.

Include more, and more modern, Pythons to cover our supported list for both local testing and CI. Also include cargo in the image, which is needed for some projects' sdist builds of native code.

Add a tox environment that can create and push the image, as well as the ability to seed the `~/.pex_dev` cache from it.

Fix up a few test issues this conversion smoked out.
Force-pushed from bf48a92 to b061ee1
@@ -217,3 +218,13 @@ commands =
        devpi-server \
        --indent 2 \
        -o testing/devpi-server.lock

[testenv:build-cache-image]
Automating this (and re-creation / push of the base image as needed as well) is tracked in #2233.
parser.add_argument("--color", default=None, action="store_true", help="Force colored logging.")
parser.add_argument(
    "--devpi", action="store_true", help="Proxy PyPI through a local devpi server."
)
parser.add_argument(
    "--require-devpi",
This is used in a subsequent commit in this PR to ensure the job that builds up the cache is actually populating a devpi-server data dir - the ~whole point.
os.environ["_PEX_TEST_DEFAULT_INDEX"] = launch_result.url
os.environ["PIP_INDEX_URL"] = launch_result.url
This was a leak I noticed when fixing one of the broken tests in a later commit in this PR. Some test code uses `pip install`, etc. directly, and that was still going out to PyPI.
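A minimal sketch of the fix: export both knobs so any child process, including bare `pip install` calls in test code, resolves against the local proxy instead of PyPI. The URL below is illustrative; the real value comes from the launched devpi-server (`launch_result.url`).

```shell
#!/usr/bin/env bash
# Illustrative devpi URL; in the real code this comes from launch_result.url.
DEVPI_URL="http://localhost:3141/root/pypi/+simple/"

# Export both knobs so child processes - including direct `pip install`
# invocations in test code - hit the local proxy, not PyPI.
export _PEX_TEST_DEFAULT_INDEX="${DEVPI_URL}"
export PIP_INDEX_URL="${DEVPI_URL}"

echo "pip index: ${PIP_INDEX_URL}"
```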
    "${ROOT}/docker/user"
fi

if [[ "${CACHE_MODE}" == "pull" ]]; then
This is actually the only bit that has to do with the new cache data-only image; the other changes in this script support interop between local builds and seeding with `*_MODE=pull`, as well as test fixes (altering a few volume mount paths).
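As a sketch of how such a mode knob branches (the actual seeding logic lives in dtox.sh; the action names here are illustrative):

```shell
#!/usr/bin/env bash
# Sketch only: maps the CACHE_MODE knob to an action. The real seeding
# (docker volume create / run against the data image) lives in dtox.sh.
cache_mode_action() {
  case "${1:-}" in
    pull) echo "seed-cache-from-data-image" ;;
    *) echo "use-existing-local-cache" ;;
  esac
}

echo "CACHE_MODE=pull    -> $(cache_mode_action pull)"
echo "CACHE_MODE=<unset> -> $(cache_mode_action '')"
```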
And this is a pretty particular trick I had to fight a bit for. I'll add a comment about the necessity of an empty volume + a run to cause the local volume to be filled from the data image.
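The trick relies on documented Docker behavior: when an empty named volume is mounted over a directory that the image ships, Docker copies the image's content into the volume on the container's first run. A sketch with hypothetical image and volume names (the real ones live in dtox.sh):

```shell
#!/usr/bin/env bash
# Hypothetical names - the real image/paths live in dtox.sh.
IMAGE="ghcr.io/example/pex-cache:latest"
VOLUME="pex_dev_cache"

if command -v docker >/dev/null 2>&1; then
  # The volume must be EMPTY: Docker only copies the image's directory
  # content into a named volume that has no data yet.
  docker volume create "${VOLUME}" >/dev/null 2>&1 || true
  # The command run is irrelevant (`true` suffices); the copy happens at
  # volume mount time, not via anything the container executes.
  docker run --rm -v "${VOLUME}:/development/pex_dev" "${IMAGE}" true \
    || echo "note: hypothetical image - pull/run expected to fail here"
else
  echo "docker not found; nothing to demonstrate"
fi
echo "done"
```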
Fixed.
  - py311-pip22_3_1
  - py311-pip23_1_2
  - py312-pip23_2
  - pypy310-pip20
I dropped pypy27 testing as a pragmatic / cowardly decision. The base image pypy runs into an OpenSSL linking issue in 1 integration test and after much effort I gave up. Unlike CPython 2.7, I'm pretty sure folks have moved on from PyPy 2.7. I'm guessing there are no support contracts for the latter like there are for the former.
And I bumped from pypy39 to pypy310 just because I noticed it was out; the policy / pattern here is to support CI of Pex's oldest and newest supported Pythons and Pips as best as possible without testing all of them.
    pip-version: 23_1_2
    env:
      _TOX_ENV: pypy${{ join(matrix.pypy-version, '') }}-pip${{ matrix.pip-version }}-integration
  - python-version: [ 3, 11 ]
I dropped CPython 2.7 tests for Mac to keep things simple and because it's almost a certainty that Mac (~"I need the latest hot thing", and they do not support their old things) + ancient Python is just not a thing.
  echo >&2 "tox -e ${env} failed to start or connect to the devpi-server, exiting..."
  exit 1
elif (( $? != 0 )); then
  echo >&2 "tox -e ${env} failed, continuing..."
This is very deliberate. If 2 tests fail out of ~500 (common enough when running without a devpi-server cache!), the devpi-server cache is likely ~fully populated for the run and we got what we needed. Compound this with ~19 permutations of IT being run through this - a ~4 hour process on a fast machine - and `set -e` would lead to no cache image ever successfully building.
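The policy can be sketched like this (env names and the failing stand-in are illustrative): a hard exit is reserved for failing to reach the devpi-server, while ordinary test failures just log and continue, so one flaky IT out of hundreds can't scuttle a multi-hour cache build.

```shell
#!/usr/bin/env bash
# Illustrative stand-in for `tox -e "$1"`: one environment "fails".
run_env() {
  [[ "$1" != "py312-integration" ]]
}

for env in py311-integration py312-integration pypy310-integration; do
  if run_env "${env}"; then
    echo "tox -e ${env} passed"
  else
    # Deliberately NOT `set -e` / `exit`: a couple of failures out of
    # ~500 tests still leave the devpi cache ~fully populated.
    echo >&2 "tox -e ${env} failed, continuing..."
  fi
done
echo "cache seeding pass complete"
```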
So the idea is to manually eyeball that the failures were only a small % when building the data-only image?
Not really (that's hard to do anyway if your docker uses BuildKit, with its default fancy TUI logging). I just compare sizes of the dev cache in the cache image:
```
$ ./dtox.sh inspect
jsirois@db935e314f12:/development/pex$ du -shc /development/pex_dev/*
844M    /development/pex_dev/devpi
1.6G    /development/pex_dev/pyenv
2.4G    total
```

Vs the local cache:

```
$ du -shc ~/.pex_dev/*
712M    /home/jsirois/.pex_dev/devpi
1.6G    /home/jsirois/.pex_dev/pyenv
2.3G    total
```
844M is a little bigger than my local dev cache, which makes sense: I don't always run all tox environments locally, and which tests are run or what exactly is downloaded varies a little by interpreter version.

When automating the cache image build, I'm guessing no action will be needed in practice; the failure rates seen in the wild in ITs are O(1/500), occurring when an IT network flakes, and not all IT runs in CI do (most don't). If I do take action, I think a cache size comparison to the prior cache image - maybe no more than 1% shrinkage - would be the easiest check that is both somewhat robust and tunable in an easy way.
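That check could be sketched as a simple `du`-based comparison. The threshold and the way the prior image's size is recorded are assumptions of this sketch, not something this PR implements:

```shell
#!/usr/bin/env bash
# Sketch: flag more than ~1% shrinkage vs. the prior cache image's size.
# How the prior size gets recorded is an assumption of this sketch.
check_cache_size() {
  local new_kb=$1 prior_kb=$2
  local floor_kb=$(( prior_kb * 99 / 100 ))  # allow up to 1% shrinkage
  if (( new_kb < floor_kb )); then
    echo "FAIL: cache shrank >1% (${new_kb}KiB < ${floor_kb}KiB)"
    return 1
  fi
  echo "OK: ${new_kb}KiB >= ${floor_kb}KiB"
}

# Measure the freshly seeded cache; the path is illustrative.
new_kb=$(du -sk "${CACHE_DIR:-$HOME/.pex_dev}" 2>/dev/null | awk '{print $1}')
check_cache_size "${new_kb:-0}" 0  # prior size 0 here so the sketch passes
```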
The previous cache strategy is left in place for Mac, but it was too expensive for the 10GB limit GitHub imposes on total cache size and was leading to cache thrash.

The new strategy sidesteps the cache size limit by creating a cache offline in a data-only image and switching the Linux CI jobs - the lion's share - to run via docker / `./dtox.sh`. It takes ~3 minutes to download the images now whereas loading the cache took ~30 seconds previously, but there is no longer any worry about cache size limits.

Obviously running the same thing you run locally in CI is a big benefit and was one of the goals when I introduced `./dtox.sh` in ~2018. The downside of the image load times could possibly be overcome by using the GH cache with docker save / load, but this is good enough for now.