Migrate Buildkite CI queues from AWS to GKE #878

Merged · 2 commits · Apr 18, 2024
124 changes: 97 additions & 27 deletions .buildkite/pipeline.yml
@@ -1,76 +1,146 @@
container:
kubernetes: &kubernetes
gitEnvFrom:
- secretRef:
name: oss-github-ssh-credentials
sidecars:
- image: us-west1-docker.pkg.dev/ci-compute/buildkite-images/buildkite-dind:v1
volumeMounts:
- mountPath: /var/run/
name: docker-sock
securityContext:
privileged: true
allowPrivilegeEscalation: true
mirrorVolumeMounts: true # CRITICAL: this must be at the same indentation level as sidecars
podSpec: &podSpec
containers:
- &commandContainer
image: us-west1-docker.pkg.dev/ci-compute/buildkite-images/buildkite-command-container:v2
command:
- |-
echo "Command step was not overridden."
exit 1
volumeMounts:
- mountPath: /var/run/
name: docker-sock
resources:
requests:
cpu: 7500m
memory: 30G
volumes:
- name: docker-sock
emptyDir: {}
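# For reference: once a step pulls these anchors in via `<<:`, the effective
# plugin config expands to roughly the following (a hand-expanded sketch,
# assuming standard YAML merge-key semantics; the lint step's command shown):
#
#   plugins:
#     - kubernetes:
#         gitEnvFrom:
#           - secretRef:
#               name: oss-github-ssh-credentials
#         sidecars:                  # dind sidecar inherited unchanged
#           - image: us-west1-docker.pkg.dev/ci-compute/buildkite-images/buildkite-dind:v1
#         mirrorVolumeMounts: true
#         podSpec:
#           containers:
#             - image: us-west1-docker.pkg.dev/ci-compute/buildkite-images/buildkite-command-container:v2
#               command:
#                 - |-
#                   .buildkite/scripts/lint.sh
#               # volumeMounts and resources are inherited from *commandContainer
#           volumes:
#             - name: docker-sock
#               emptyDir: {}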

agents:
queue: buildkite-gcp

steps:
- label: "fossa analyze"
agents:
queue: "init"
docker: "*"
command: ".buildkite/scripts/fossa.sh"
plugins:
- kubernetes:
<<: *kubernetes
podSpec:
<<: *podSpec
containers:
- <<: *commandContainer
command:
- |-
.buildkite/scripts/fossa.sh

- label: "Lint Check"
agents:
queue: "init"
docker: "*"
command: ".buildkite/scripts/lint.sh"
plugins:
- kubernetes:
<<: *kubernetes
podSpec:
<<: *podSpec
containers:
- <<: *commandContainer
command:
- |-
.buildkite/scripts/lint.sh
- docker-compose#v3.0.0:
run: unit-test-test-service
config: docker/buildkite/docker-compose.yaml

- label: ":java: Unit test with test services"
agents:
queue: "workers"
docker: "*"
command: "./gradlew --no-daemon test jacocoTestReport"
artifact_paths:
- "build/reports/jacoco/test/*.xml"
-    timeout_in_minutes: 15
+    timeout_in_minutes: 30
retry:
automatic:
- exit_status: "*"
limit: 3
plugins:
- kubernetes:
<<: *kubernetes
podSpec:
<<: *podSpec
containers:
- <<: *commandContainer
command:
- |-
./gradlew --no-daemon test jacocoTestReport
- docker-compose#v3.0.0:
run: unit-test-test-service
config: docker/buildkite/docker-compose.yaml

- label: ":java: Unit test with docker services sticky on"
agents:
queue: "workers"
docker: "*"
command: "./gradlew --no-daemon test"
-    timeout_in_minutes: 15
+    timeout_in_minutes: 30
Contributor:

Do we expect CI jobs to take longer on GKE?

Contributor (Author):

It does seem a bit flakier when leaving the timeout as is, so I bumped it. I don't have a concrete root cause, but the current theory is that the move from VM-based infra on AWS, where each host has dedicated bandwidth, to a k8s cluster where many containers can compete for bandwidth is part of the issue. CPU and memory resources were configured to closely match what was available on AWS.
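If the flakiness turns out to be agent- or infrastructure-specific rather than test-specific, the blanket `exit_status: "*"` rule below could later be narrowed to particular statuses; a hypothetical variant (exit codes illustrative, not part of this change):

retry:
  automatic:
    - exit_status: -1    # Buildkite reports a lost or killed agent as -1
      limit: 3
    - exit_status: 137   # SIGKILL, e.g. the container was OOM-killed
      limit: 2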

retry:
automatic:
- exit_status: "*"
limit: 3
plugins:
- kubernetes:
<<: *kubernetes
podSpec:
<<: *podSpec
containers:
- <<: *commandContainer
command:
- |-
./gradlew --no-daemon test
- docker-compose#v3.0.0:
run: unit-test-docker-sticky-on
config: docker/buildkite/docker-compose.yaml

- label: ":java: Unit test with docker services sticky off"
agents:
queue: "workers"
docker: "*"
command: "./gradlew --no-daemon test"
-    timeout_in_minutes: 15
+    timeout_in_minutes: 30
retry:
automatic:
- exit_status: "*"
limit: 3
plugins:
- kubernetes:
<<: *kubernetes
podSpec:
<<: *podSpec
containers:
- <<: *commandContainer
command:
- |-
./gradlew --no-daemon test
- docker-compose#v3.0.0:
run: unit-test-docker-sticky-off
config: docker/buildkite/docker-compose.yaml

- wait

- label: ":java: Report test coverage"
agents:
queue: "workers"
docker: "*"
command: ".buildkite/scripts/coverage.sh"
retry:
automatic:
- exit_status: "*"
limit: 3
plugins:
- kubernetes:
<<: *kubernetes
podSpec:
<<: *podSpec
containers:
- <<: *commandContainer
command:
- |-
.buildkite/scripts/coverage.sh
- docker-compose#v3.0.0:
run: test-coverage-report
config: docker/buildkite/docker-compose.yaml
7 changes: 6 additions & 1 deletion docker/buildkite/Dockerfile
@@ -9,7 +9,12 @@ ENV APACHE_THRIFT_VERSION=0.9.3
# Install dependencies using apk
RUN apk update && apk add --virtual wget ca-certificates wget && apk add --virtual build-dependencies build-base gcc
# Git is needed in order to update the idls submodule
-RUN apk add git libstdc++
+RUN apk add git libstdc++ bash curl

# Install buildkite agent
# https://buildkite.com/docs/agent/v3/linux
RUN bash -c "`curl -sL https://raw.githubusercontent.com/buildkite/agent/main/install.sh`"
RUN ln -s /root/.buildkite-agent/bin/buildkite-agent /usr/bin/buildkite-agent
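# (Optional sanity check, not part of this change: a follow-up RUN like the
# one below would fail the image build early if the install script above did
# not put the agent on PATH.)
# RUN buildkite-agent --version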

# Compile source
RUN set -ex ;\
1 change: 0 additions & 1 deletion docker/buildkite/docker-compose.yaml
@@ -122,4 +122,3 @@ services:
- COVERALLS_REPO_TOKEN
volumes:
- "../../:/cadence-java-client"
-      - /usr/bin/buildkite-agent:/usr/bin/buildkite-agent
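The deleted line is the bind mount of the host's agent binary; with buildkite-agent now installed inside the image by the Dockerfile change above, the service only needs the source checkout. The surviving service block looks roughly like this (a sketch; the service name is assumed from the pipeline's `test-coverage-report` plugin config and is not shown in this hunk):

services:
  test-coverage-report:
    environment:
      - COVERALLS_REPO_TOKEN
    volumes:
      - "../../:/cadence-java-client"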