Use a data-only image for cache in CI. (#2227)
The previous cache strategy is left in place for Mac, but it was too
expensive for the 10GB limit GitHub imposes on total cache size and was
leading to cache thrash.

The new strategy sidesteps the cache size limit by creating a cache
offline in a data-only image and switching the Linux CI jobs - the
lion's share - to run via docker / `./dtox.sh`. It takes ~3 minutes to
download the images now whereas loading the cache took ~30 seconds
previously, but there is no longer any worry about cache size limits.
Obviously running the same thing you run locally in CI is a big benefit
and was one of the goals when I introduced `./dtox.sh` in ~2018. The
downside of the image load times could possibly be overcome by using the
GH cache with docker save / load, but this is good enough for now.
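
The `docker save` / `docker load` idea mentioned above is not implemented in this commit. A minimal sketch of how it might look, assuming a cache directory that CI restores and saves between runs; `image_tarball`, `load_or_pull` and the overridable `DOCKER` variable are illustrative names, not part of `./dtox.sh`:

```shell
#!/usr/bin/env bash
set -eu

# DOCKER is overridable so the flow can be dry-run (e.g. DOCKER=echo) without a daemon.
DOCKER="${DOCKER:-docker}"

# Hypothetical helper: map an image reference to a tarball path inside a cache directory.
image_tarball() {
  local cache_dir="$1" image="$2"
  # Image references contain '/' and ':'; flatten both to '_' for a safe file name.
  echo "${cache_dir}/$(echo "${image}" | tr '/:' '__').tar"
}

# Load the image from its tarball when present; otherwise pull it and save it for next time.
load_or_pull() {
  local cache_dir="$1" image="$2"
  local tarball
  tarball="$(image_tarball "${cache_dir}" "${image}")"
  if [ -f "${tarball}" ]; then
    "${DOCKER}" load --input "${tarball}"
  else
    "${DOCKER}" pull "${image}"
    mkdir -p "${cache_dir}"
    "${DOCKER}" save --output "${tarball}" "${image}"
  fi
}
```

A CI step could then call something like `load_or_pull "${HOME}/.docker-image-cache" ghcr.io/pantsbuild/pex/cache:latest`, with the cache directory path being an assumption. Whether the saved tarballs would fit under the GitHub cache size limit is not something this sketch addresses.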
jsirois authored Sep 5, 2023
1 parent 4d2a7b0 commit d54cdce
Showing 16 changed files with 425 additions and 434 deletions.
416 changes: 55 additions & 361 deletions .github/workflows/ci.yml

Large diffs are not rendered by default.

7 changes: 5 additions & 2 deletions docker/base/Dockerfile
@@ -1,11 +1,14 @@
# An image with the necessary binaries and libraries to develop pex.
FROM ubuntu:22.04

# We use pyenv to bootstrap interpreters and pyenv needs most of these packages.
# See: https://github.com/pyenv/pyenv/wiki#suggested-build-environment
# Additionally, some sdists need cargo to build native extensions.
RUN apt update && \
DEBIAN_FRONTEND=noninteractive apt upgrade --yes && \
DEBIAN_FRONTEND=noninteractive apt install --yes \
# We use pyenv to bootstrap interpreters and pyenv needs these.
# See: https://github.com/pyenv/pyenv/wiki#suggested-build-environment
build-essential \
cargo \
curl \
git \
libbz2-dev \
28 changes: 23 additions & 5 deletions docker/base/install_pythons.sh
@@ -7,16 +7,20 @@ export PYENV_ROOT=/pyenv
# N.B.: The 1st listed version will supply the default `python` on the PATH; otherwise order does
# not matter.
PYENV_VERSIONS=(
3.11.4
3.11.5
2.7.18
3.5.10
3.6.15
3.7.17
3.8.17
3.9.17
3.10.12
3.8.18
3.9.18
3.10.13
3.12.0rc1
pypy2.7-7.3.12
pypy3.5-7.0.0
pypy3.6-7.3.3
pypy3.7-7.3.9
pypy3.8-7.3.11
pypy3.9-7.3.12
pypy3.10-7.3.12
)
@@ -27,7 +31,21 @@ git clone https://github.com/pyenv/pyenv.git "${PYENV_ROOT}" && (
PATH="${PATH}:${PYENV_ROOT}/bin"

for version in "${PYENV_VERSIONS[@]}"; do
pyenv install "${version}"
if [[ "${version}" == "pypy2.7-7.3.12" ]]; then
# Installation of pypy2.7-7.3.12 fails like so without adjusting the version of get-pip it
# uses:
# $ pyenv install pypy2.7-7.3.12
# Downloading pypy2.7-v7.3.12-linux64.tar.bz2...
# -> https://downloads.python.org/pypy/pypy2.7-v7.3.12-linux64.tar.bz2
# Installing pypy2.7-v7.3.12-linux64...
# Installing pip from https://bootstrap.pypa.io/get-pip.py...
# error: failed to install pip via get-pip.py
# ...
# ERROR: This script does not work on Python 2.7 The minimum supported Python version is 3.7. Please use https://bootstrap.pypa.io/pip/2.7/get-pip.py instead.
GET_PIP_URL="https://bootstrap.pypa.io/pip/2.7/get-pip.py" pyenv install "${version}"
else
pyenv install "${version}"
fi

exe="$(echo "${version}" | sed -r -e 's/^([0-9])/python\1/' | tr - . | cut -d. -f1-2)"
exe_path="${PYENV_ROOT}/versions/${version}/bin/${exe}"
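
The `exe=` pipeline in the loop above turns a pyenv version string into the name of the interpreter binary under `bin/`. A small sketch of that mapping, wrapped in a hypothetical `exe_name` function for illustration (it uses `-E`, the portable spelling of the `-r` flag used in the script):

```shell
#!/usr/bin/env bash
set -eu

# Derive the interpreter binary name from a pyenv version string:
# 1. prefix versions that start with a digit (CPython) with "python",
# 2. turn "-" into "." so PyPy versions split on dots as well,
# 3. keep only the first two dot-separated fields (name plus major.minor).
exe_name() {
  echo "$1" | sed -E -e 's/^([0-9])/python\1/' | tr - . | cut -d. -f1-2
}

for version in 3.11.5 2.7.18 3.12.0rc1 pypy2.7-7.3.12 pypy3.10-7.3.12; do
  echo "${version} -> $(exe_name "${version}")"
done
# 3.11.5 -> python3.11
# 2.7.18 -> python2.7
# 3.12.0rc1 -> python3.12
# pypy2.7-7.3.12 -> pypy2.7
# pypy3.10-7.3.12 -> pypy3.10
```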
24 changes: 24 additions & 0 deletions docker/cache/Dockerfile
@@ -0,0 +1,24 @@
# A data image with the necessary binaries and libraries to develop pex.

# Populate the ~/.pex_dev cache.
FROM ghcr.io/pantsbuild/pex/base:latest as cache

ARG PEX_REPO=https://github.com/pantsbuild/pex
ARG GIT_REF=HEAD

# These must be set as a comma-separated list of all tox envs to cache.
ARG TOX_ENVS

RUN git clone "${PEX_REPO}" /development/pex && \
cd /development/pex && \
git reset --hard "${GIT_REF}"

WORKDIR /development/pex
COPY populate_cache.sh /root/
RUN /root/populate_cache.sh /development/pex_dev "${TOX_ENVS}"

# Grab just the ~/.pex_dev cache files for the final data-only image.
FROM scratch
VOLUME /development/pex_dev
COPY --from=cache /development/pex_dev /development/pex_dev
CMD ["I am a pure data image meant only for volume mounting."]
32 changes: 32 additions & 0 deletions docker/cache/populate_cache.sh
@@ -0,0 +1,32 @@
#!/usr/bin/env bash

set -xuo pipefail

if (( $# != 2 )); then
echo >&2 "usage: $0 [pex dev cache dir] [tox env][,tox env]*"
echo >&2 "Expected 2 arguments, got $#: $*"
exit 1
fi

function run_tox() {
local env="$1"
local rc=0
# Capture the exit code once: each `(( ... ))` test below overwrites `$?`.
tox -e "${env}" -- --color --devpi --require-devpi -vvs || rc=$?
if (( rc == 42 )); then
echo >&2 "tox -e ${env} failed to start or connect to the devpi-server, exiting..."
exit 1
elif (( rc != 0 )); then
echo >&2 "tox -e ${env} failed, continuing..."
fi
}

export _PEX_TEST_DEV_ROOT="$1"
for tox_env in $(echo "$2" | tr , ' '); do
run_tox "${tox_env}"

# Tox test environments can leave quite large /tmp/pytest-of-<user> trees; relieve disk pressure
# by cleaning these up as we go.
rm -rf /tmp/pytest*
done

echo "Cached ${_PEX_TEST_DEV_ROOT}:"
du -sh "${_PEX_TEST_DEV_ROOT}"/*
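
The exit-status handling in `populate_cache.sh` treats status 42 as fatal (no devpi-server) and tolerates any other failure so the remaining tox envs still get cached. A minimal sketch of that dispatch, using hypothetical `run_step` / `run_all` helpers that take an arbitrary command in place of the real `tox` call and `return` instead of `exit` so they can be exercised in isolation:

```shell
#!/usr/bin/env bash
set -u

# Run a command for a named env and react to its exit status.
# 42 is fatal; any other non-zero status is logged and tolerated.
run_step() {
  local env="$1"
  shift
  local rc=0
  "$@" || rc=$?  # Capture the status once; any later test would overwrite $?.
  if [ "${rc}" -eq 42 ]; then
    echo >&2 "${env}: failed to start or connect to the devpi-server, exiting..."
    return 1
  elif [ "${rc}" -ne 0 ]; then
    echo >&2 "${env}: failed with status ${rc}, continuing..."
  fi
  return 0
}

# Split a comma-separated env list and run the given command once per entry,
# stopping at the first fatal failure.
run_all() {
  local envs="$1"
  shift
  local env
  for env in $(echo "${envs}" | tr , ' '); do
    run_step "${env}" "$@" || return 1
  done
}
```

For example, `run_all "py311,py27" sh -c 'exit 3'` logs two "continuing..." lines and succeeds, while `run_all "py311,py27" sh -c 'exit 42'` stops at the first env with a failure status. The env names here are placeholders.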
30 changes: 15 additions & 15 deletions docker/user/Dockerfile
@@ -1,4 +1,5 @@
FROM pantsbuild/pex:base
ARG BASE_IMAGE_TAG=latest
FROM ghcr.io/pantsbuild/pex/base:${BASE_IMAGE_TAG}

# Prepare developer shim that can operate on local files and not mess up perms in the process.
ARG USER
@@ -12,33 +13,32 @@ RUN /root/create_docker_image_user.sh "${USER}" "${UID}" "${GROUP}" "${GID}"
# This will be mounted from the Pex clone directory on the host.
VOLUME /development/pex

# This will be a named volume used to persist .tox venvs and keep them isolated from the host.
VOLUME /development/pex/.tox

# This will be a named volume used to persist the Pex development cache on the host but isolated
# from the host ~/.pex_dev development cache.
VOLUME /development/pex_dev
ENV _PEX_TEST_DEV_ROOT=/development/pex_dev

# This will be a named volume used to persist the Pex cache on the host but isolated from the host
# ~/.pex cache.
VOLUME /development/pex_root

# This will be a named volume used to persist the pytest tmp tree for use in `./dtox inspect`
# sessions.
VOLUME /development/tmp

# This will be a named volume used to persist .tox venvs and keep them isolated from the host.
VOLUME /development/pex/.tox
VOLUME "/home/${USER}/.pex"

RUN mkdir -p \
/development/pex \
/development/pex/.tox \
/development/pex_dev \
/development/pex_root \
/development/tmp \
/development/pex/.tox && \
"/home/${USER}/.pex" && \
chown -R "${UID}:${GID}" \
/development/pex \
/development/pex/.tox \
/development/pex_dev \
/development/pex_root \
/development/tmp \
/development/pex/.tox
"/home/${USER}/.pex"

# This will be a named volume used to persist the pytest tmp tree (/tmp/pytest-of-$USER/) for use
# in `./dtox inspect` sessions.
VOLUME /tmp

WORKDIR /development/pex
USER "${USER}":"${GROUP}"
90 changes: 66 additions & 24 deletions dtox.sh
@@ -4,41 +4,73 @@ set -euo pipefail

ROOT="$(git rev-parse --show-toplevel)"

BASE_MODE="${BASE_MODE:-build}"
CACHE_MODE="${CACHE_MODE:-}"
CACHE_TAG="${CACHE_TAG:-latest}"

BASE_INPUT=(
"${ROOT}/docker/base/Dockerfile"
"${ROOT}/docker/base/install_pythons.sh"
)
base_hash=$(cat "${BASE_INPUT[@]}" | git hash-object -t blob --stdin)

function base_id() {
docker images -q -f label=base_hash="${base_hash}" pantsbuild/pex:base
function base_image_id() {
docker image ls -q "ghcr.io/pantsbuild/pex/base:${base_hash}"
}

if [[ -z "$(base_id)" ]]; then
if [[ "${BASE_MODE}" == "build" && -z "$(base_image_id)" ]]; then
docker build \
--tag pantsbuild/pex:base \
--label base_hash="${base_hash}" \
--tag ghcr.io/pantsbuild/pex/base:latest \
--tag "ghcr.io/pantsbuild/pex/base:${base_hash}" \
"${ROOT}/docker/base"
elif [[ "${BASE_MODE}" == "pull" ]]; then
docker pull "ghcr.io/pantsbuild/pex/base:${base_hash}"
fi

USER_INPUT=(
"${ROOT}/docker/user/Dockerfile"
"${ROOT}/docker/user/create_docker_image_user.sh"
)
user_hash=$(cat "${USER_INPUT[@]}" | git hash-object -t blob --stdin)
if [[ -z "$(docker images -q -f label=user_hash="${user_hash}" pantsbuild/pex:user)" ]]; then

function user_image_id() {
docker image ls -q "pantsbuild/pex/user:${user_hash}"
}

if [[ -z "$(user_image_id)" ]]; then
docker build \
--build-arg BASE_ID="$(base_id)" \
--build-arg BASE_IMAGE_TAG="${base_hash}" \
--build-arg USER="$(id -un)" \
--build-arg UID="$(id -u)" \
--build-arg GROUP="$(id -gn)" \
--build-arg GID="$(id -g)" \
--tag pantsbuild/pex:user \
--label user_hash="${user_hash}" \
--tag pantsbuild/pex/user:latest \
--tag "pantsbuild/pex/user:${user_hash}" \
"${ROOT}/docker/user"
fi

if [[ "${CACHE_MODE}" == "pull" ]]; then
# N.B.: This is a fairly particular dance / trick that serves to populate a local named volume
# with the contents of a data-only image. In particular, starting with an empty named volume is
# required to get the subsequent no-op `docker run --volume pex-caches:...` to populate that
# volume. This population only happens under that condition.
docker volume rm --force pex-caches
docker volume create pex-caches
docker run \
--rm \
--volume pex-caches:/development/pex_dev \
"ghcr.io/pantsbuild/pex/cache:${CACHE_TAG}" || true
docker run \
--rm \
--volume pex-caches:/development/pex_dev \
--entrypoint bash \
--user root \
"pantsbuild/pex/user:${user_hash}" \
-c "chown -R $(id -u):$(id -g) /development/pex_dev"
fi

DOCKER_ARGS=()
if [[ "$1" == "inspect" ]]; then
if [[ "${1:-}" == "inspect" ]]; then
shift
DOCKER_ARGS+=(
--entrypoint bash
@@ -51,19 +83,29 @@ if [[ -t 1 ]]; then
)
fi

if [[ -n "${SSH_AUTH_SOCK:-}" ]]; then
# Some integration tests need an SSH agent. Propagate it when available.
DOCKER_ARGS+=(
--volume "${SSH_AUTH_SOCK}:${SSH_AUTH_SOCK}"
--env SSH_AUTH_SOCK="${SSH_AUTH_SOCK}"
)
fi

# This ensures the current user owns the host .tox/ dir before launching the container, which
# otherwise sets the ownership as root for undetermined reasons
mkdir -p "${ROOT}/.tox"

CONTAINER_HOME="/home/$(id -un)"
exec docker run \
--rm \
--volume "${HOME}/.netrc:/home/$(id -un)/.netrc" \
--volume "${HOME}/.ssh:/home/$(id -un)/.ssh" \
--volume "$(pwd):/development/pex" \
--volume pex_dev:/development/pex_dev \
--volume pex_root:/development/pex_root \
--volume pex_tmp:/development/tmp \
--volume pex_tox:/development/pex/.tox \
--env _PEX_TEST_DEV_ROOT=/development/pex_dev \
--env PEX_ROOT=/development/pex_root \
--env TMPDIR=/development/tmp \
"${DOCKER_ARGS[@]}" \
pantsbuild/pex:user \
"$@"
--rm \
--volume pex-tmp:/tmp \
--volume "${HOME}/.netrc:${CONTAINER_HOME}/.netrc" \
--volume "${HOME}/.ssh:${CONTAINER_HOME}/.ssh" \
--volume "pex-root:${CONTAINER_HOME}/.pex" \
--volume pex-caches:/development/pex_dev \
--volume "${ROOT}:/development/pex" \
--volume pex-tox:/development/pex/.tox \
"${DOCKER_ARGS[@]}" \
"pantsbuild/pex/user:${user_hash}" \
"$@"

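`dtox.sh` tags the base and user images with a hash of the files that define them, so an image is rebuilt (or re-pulled) only when one of those inputs changes. A minimal sketch of that content-addressed tagging, assuming `git` is on the `PATH`; `content_hash` and `image_tag` are hypothetical helper names wrapping the same `git hash-object` pipeline the script uses:

```shell
#!/usr/bin/env bash
set -eu

# Hash the concatenated contents of the given files as a single git blob.
# Only file contents and their order matter, not file names or timestamps.
content_hash() {
  cat "$@" | git hash-object -t blob --stdin
}

# Hypothetical helper: build "<repository>:<hash of inputs>" for use as an image tag.
image_tag() {
  local repo="$1"
  shift
  echo "${repo}:$(content_hash "$@")"
}
```

Run from the repository root, `image_tag ghcr.io/pantsbuild/pex/base docker/base/Dockerfile docker/base/install_pythons.sh` would yield the tag that `docker image ls -q` is asked about before deciding whether to build.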
