Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a data-only image for cache in CI. #2227

Merged
merged 6 commits into from
Sep 5, 2023

Conversation

jsirois
Copy link
Member

@jsirois jsirois commented Aug 22, 2023

The previous cache strategy is left in place for Mac, but it was too
expensive for the 10GB limit GitHub imposes on total cache size and was
leading to cache thrash.

The new strategy sidesteps the cache size limit by creating a cache
offline in a data-only image and switching the Linux CI jobs - the
lion's share - to run via docker / ./dtox.sh. It takes ~3 minutes to
download the images now whereas loading the cache took ~30 seconds
previously, but there is no longer any worry about cache size limits.
Obviously running the same thing you run locally in CI is a big benefit
and was one of the goals when I introduced ./dtox.sh in ~2018. The
downside of the image load times could possibly be overcome by using the
GH cache with docker save / load, but this is good enough for now.

This is mainly to support tweaking CI performance.
Include more and more modern Pythons to cover our supported list for
both local testing and CI. Also include cargo in the image which is
needed for some projects sdist builds of native code.
Add a tox environment that can crate and push the image as well as the
ability to seed the `~/.pex_dev` cache from it.
Fixup a few test issues this conversion smoked out.
@jsirois jsirois changed the title Devpi/cache/container Use a data-only image for cache in CI. Aug 31, 2023
@jsirois jsirois requested review from kaos and benjyw August 31, 2023 17:53
@jsirois jsirois marked this pull request as ready for review August 31, 2023 17:53
@@ -217,3 +218,13 @@ commands =
devpi-server \
--indent 2 \
-o testing/devpi-server.lock

[testenv:build-cache-image]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automating this (and re-creation / push of the base image as needed well) is tracked in #2233.

parser.add_argument("--color", default=None, action="store_true", help="Force colored logging.")
parser.add_argument(
"--devpi", action="store_true", help="Proxy PyPI through a local devpi server."
)
parser.add_argument(
"--require-devpi",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used in a subsequent commit in this PR to ensure the job that builds up the cache is actually populating a devpi-server data dir - the ~whole point.

os.environ["_PEX_TEST_DEFAULT_INDEX"] = launch_result.url
os.environ["PIP_INDEX_URL"] = launch_result.url
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a leak I noticed when fixing one of the broken tests in a later commit in this PR. Some test code uses pip install, etc. directly and that was still going out to PyPI.

"${ROOT}/docker/user"
fi

if [[ "${CACHE_MODE}" == "pull" ]]; then
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually the only bit that has to do with the new cache data-only image, the other changes in this script support interop between local build and seeding with *_MODE=pull as well as test fixes (altering a few volume mount paths).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this is a pretty particular trick I had to fight a bit for. I'll add a comment about the necessity of an empty volume + a run to cause the local volume to be filled from the data image.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

- py311-pip22_3_1
- py311-pip23_1_2
- py312-pip23_2
- pypy310-pip20
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dropped pypy27 testing as a pragmatic / cowardly decision. The base image pypy runs into an OpenSSL linking issue in 1 integration test and after much effort I gave up. Unlike CPython 2.7, I'm pretty sure folks have moved on from PyPy 2.7. I'm guessing there are no support contracts for the latter like there are for the former.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I bumped from pypy39 to pypy310 just because I noticed it was out / the policy / pattern here it to support CI of Pex's oldest and newest supported Pythons and Pips as best as possible without testing all of them.

pip-version: 23_1_2
env:
_TOX_ENV: pypy${{ join(matrix.pypy-version, '') }}-pip${{ matrix.pip-version }}-integration
- python-version: [ 3, 11 ]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dropped CPython 2.7 tests for Mac to keep things simple and because its almost a certainty Mac (~I need the latest hot thing, and they do not support their old things) + Ancient Python is just not a thing.

echo >&2 "tox -e ${env} failed to start or connect to the devpi-server, exiting..."
exit 1
elif (( $? != 0 )); then
echo >&2 "tox -e ${env} failed, continuing..."
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very deliberate. If 2 tests fail out of ~500 (common enough when running without a devpi-server cache!), the devpi-server cache is likely ~fully populated for the run and we got what we needed. Compound this with ~19 permutations of IT being run through this - a ~4 hour process on a fast machine - and set -e would lead to no cache image ever successfully building.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the idea is to manually eyeball that the failures were only a small % when building the data-only image?

Copy link
Member Author

@jsirois jsirois Sep 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really (that's hard to do if you have a docker that uses BuildKit anyway with its default fancy tui logging). I just compare sizes of the dev cache in the cache image:

$ ./dtox.sh inspect
jsirois@db935e314f12:/development/pex$ du -shc /development/pex_dev/*
844M    /development/pex_dev/devpi
1.6G    /development/pex_dev/pyenv
2.4G    total

Vs the local cache:

$ du -shc ~/.pex_dev/*
712M    /home/jsirois/.pex_dev/devpi
1.6G    /home/jsirois/.pex_dev/pyenv
2.3G    total

844M is a little bigger than my local dev cache, which makes sense since I don't always run all tox environments locally and which tests are run or what exactly is downloaded does vary a little bit by interpreter version.

When automating the cache image build, I'm guessing no action will be needed in practice, the failure rates seen in the wild in ITs are O(1/500) when an IT network flakes and not all IT runs in CI do (most don't). If I do take action, I think a cache size comparison to the prior cache image - maybe no more than 1% shrinkage, would be the easiest check that is both somewhat robust and tunable in an easy way.

echo >&2 "tox -e ${env} failed to start or connect to the devpi-server, exiting..."
exit 1
elif (( $? != 0 )); then
echo >&2 "tox -e ${env} failed, continuing..."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the idea is to manually eyeball that the failures were only a small % when building the data-only image?

@jsirois jsirois merged commit d54cdce into pex-tool:main Sep 5, 2023
26 checks passed
@jsirois jsirois deleted the devpi/cache/container branch September 5, 2023 21:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants