
Introduce automated performance testing. #1068

Merged 45 commits on Jul 23, 2020.

Commits (45)
a8323cc
Squashed commit of the following:
bnapolitan Jul 7, 2020
c8b1f9b
Create S3 bucket filename var, remove sleeps before polling, and refa…
bnapolitan Jul 8, 2020
0cf8c5a
Syntax fix.
bnapolitan Jul 8, 2020
cf2b5d8
Add version info to file, display start of performance tests.
bnapolitan Jul 9, 2020
07b54f2
Scale up node group before running 5000 pod test.
bnapolitan Jul 9, 2020
55220bd
Create unique mng names.
bnapolitan Jul 9, 2020
ef4df24
Undo generated mng names.
bnapolitan Jul 10, 2020
5a40307
Update data files for performance tests.
bnapolitan Jul 10, 2020
542382b
Fix auto scaling group name.
bnapolitan Jul 10, 2020
40173e0
Debug 5000 pod performance test.
bnapolitan Jul 10, 2020
615517a
Debug to find out why autoscale group info isn't being retrieved.
bnapolitan Jul 10, 2020
dcf5d4a
Adjust name line number.
bnapolitan Jul 11, 2020
c7c3701
Remove long debugging sleep.
bnapolitan Jul 11, 2020
3060990
Add failure checking for performance tests.
bnapolitan Jul 12, 2020
42b79e9
Updated failure checking fixes.
bnapolitan Jul 13, 2020
485d598
Upload files to corresponding folders in s3 bucket.
bnapolitan Jul 13, 2020
2d8d01b
Check for slow performance test WIP.
bnapolitan Jul 13, 2020
efd0151
Check for slow performance update.
bnapolitan Jul 13, 2020
3d3f14e
Change order of performance tests.
bnapolitan Jul 14, 2020
a5b2086
Attempt performance test fail threshold.
bnapolitan Jul 14, 2020
ee6a9ac
Weekly performance test (midnight Wednesday)
bnapolitan Jul 14, 2020
9d0aa34
Fix syntax error and try again.
bnapolitan Jul 15, 2020
0325ae5
Fix syntax for slow checking.
bnapolitan Jul 15, 2020
b76d9b0
Proper line splicing.
bnapolitan Jul 15, 2020
591deea
Attempt MNG sharing.
bnapolitan Jul 16, 2020
ebef0ae
Merge branch 'upstream-master' into scale-test-single-node
bnapolitan Jul 16, 2020
5e01f11
Fix merging issue.
bnapolitan Jul 16, 2020
711896a
Setup weekly test.
bnapolitan Jul 16, 2020
1a5bbfc
Setup weekly cron.
bnapolitan Jul 17, 2020
e4d603f
Fix performance test slow checking, add kops to weekly tests.
bnapolitan Jul 17, 2020
7e2ad40
Fix slow checking.
bnapolitan Jul 17, 2020
d43d4a6
Find autoscaling group name.
bnapolitan Jul 17, 2020
c5c9564
Try to find autoscaling group name.
bnapolitan Jul 17, 2020
449f88c
Fix weekly test syntax.
bnapolitan Jul 17, 2020
082901b
Scale up to 99 nodes.
bnapolitan Jul 17, 2020
4811d74
Fix yaml syntax.
bnapolitan Jul 17, 2020
f674a55
Update readme with new tests.
bnapolitan Jul 17, 2020
487ce50
Look back 70 lines.
bnapolitan Jul 17, 2020
439ea17
Alternate way to get autoscaling group name.
bnapolitan Jul 17, 2020
a2f8c27
Change weekly test time.
bnapolitan Jul 18, 2020
0e7ef6e
Change line finder for autoscaling group name.
bnapolitan Jul 18, 2020
56b0118
Only report failures on slow up process.
bnapolitan Jul 19, 2020
b46b3ea
Merge branch 'upstream-master' into scale-test-single-node
bnapolitan Jul 22, 2020
18b10c5
Fix 3 most recent files, and reset performance average.
bnapolitan Jul 22, 2020
9f2e891
Format fix.
bnapolitan Jul 22, 2020
44 changes: 44 additions & 0 deletions .circleci/config.yml
@@ -82,6 +82,38 @@ jobs:
- store_artifacts:
path: /tmp/cni-test

performance_test:
docker:
- image: circleci/golang:1.13-stretch
working_directory: /go/src/github.com/{{ORG_NAME}}/{{REPO_NAME}}
environment:
<<: *env
RUN_CONFORMANCE: "false"
RUN_PERFORMANCE_TESTS: "true"
steps:
- checkout
- setup_remote_docker
- aws-cli/setup:
profile-name: awstester
- restore_cache:
keys:
- dependency-packages-store-{{ checksum "test/integration/go.mod" }}
- dependency-packages-store-
- k8s/install-kubectl:
# requires 1.14.9 for k8s testing, since it uses log api.
kubectl-version: v1.14.9
- run:
name: Run the integration tests
command: ./scripts/run-integration-tests.sh
no_output_timeout: 15m
- save_cache:
key: dependency-packages-store-{{ checksum "test/integration/go.mod" }}
paths:
- /go/pkg
when: always
- store_artifacts:
path: /tmp/cni-test

workflows:
version: 2
check:
@@ -118,3 +150,15 @@ workflows:
- master
jobs:
- integration_test

# triggers weekly tests on master
weekly-test-run:
triggers:
- schedule:
cron: "0 0 * * 6"
filters:
branches:
only:
- master
jobs:
- performance_test
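For reference, CircleCI `schedule` triggers use standard five-field cron syntax evaluated in UTC, so `0 0 * * 6` fires at 00:00 on Saturdays (day-of-week 6 in cron numbering), not Wednesdays as an earlier commit message suggested. A small sketch to decode the field:

```shell
# Decode the day-of-week field of the weekly cron expression.
# Standard cron numbering: 0 = Sunday ... 6 = Saturday.
CRON="0 0 * * 6"
dow=$(echo "$CRON" | awk '{print $5}')
days=(Sunday Monday Tuesday Wednesday Thursday Friday Saturday)
echo "Weekly run fires on: ${days[$dow]}"
# → Weekly run fires on: Saturday
```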
10 changes: 9 additions & 1 deletion scripts/lib/cluster.sh
@@ -12,6 +12,14 @@ function down-test-cluster() {
}

function up-test-cluster() {
MNGS=""
if [[ "$RUN_PERFORMANCE_TESTS" == true ]]; then
MNGS='{"three-nodes":{"name":"three-nodes","remote-access-user-name":"ec2-user","tags":{"group":"amazon-vpc-cni-k8s"},"release-version":"","ami-type":"AL2_x86_64","asg-min-size":3,"asg-max-size":3,"asg-desired-capacity":3,"instance-types":["m5.xlarge"],"volume-size":40}, "single-node":{"name":"single-node","remote-access-user-name":"ec2-user","tags":{"group":"amazon-vpc-cni-k8s"},"release-version":"","ami-type":"AL2_x86_64","asg-min-size":1,"asg-max-size":1,"asg-desired-capacity":1,"instance-types":["m5.16xlarge"],"volume-size":40}, "multi-node":{"name":"multi-node","remote-access-user-name":"ec2-user","tags":{"group":"amazon-vpc-cni-k8s"},"release-version":"","ami-type":"AL2_x86_64","asg-min-size":1,"asg-max-size":100,"asg-desired-capacity":98,"instance-types":["m5.xlarge"],"volume-size":40}}'
RUN_CONFORMANCE=false
else
MNGS='{"GetRef.Name-mng-for-cni":{"name":"GetRef.Name-mng-for-cni","remote-access-user-name":"ec2-user","tags":{"group":"amazon-vpc-cni-k8s"},"release-version":"","ami-type":"AL2_x86_64","asg-min-size":3,"asg-max-size":3,"asg-desired-capacity":3,"instance-types":["c5.xlarge"],"volume-size":40}}'
fi

echo -n "Configuring cluster $CLUSTER_NAME"
AWS_K8S_TESTER_EKS_NAME=$CLUSTER_NAME \
AWS_K8S_TESTER_EKS_LOG_COLOR=true \
@@ -26,7 +34,7 @@ function up-test-cluster() {
AWS_K8S_TESTER_EKS_ADD_ON_MANAGED_NODE_GROUPS_ENABLE=true \
AWS_K8S_TESTER_EKS_ADD_ON_MANAGED_NODE_GROUPS_ROLE_CREATE=$ROLE_CREATE \
AWS_K8S_TESTER_EKS_ADD_ON_MANAGED_NODE_GROUPS_ROLE_ARN=$ROLE_ARN \
AWS_K8S_TESTER_EKS_ADD_ON_MANAGED_NODE_GROUPS_MNGS='{"GetRef.Name-mng-for-cni":{"name":"GetRef.Name-mng-for-cni","remote-access-user-name":"ec2-user","tags":{"group":"amazon-vpc-cni-k8s"},"release-version":"","ami-type":"AL2_x86_64","asg-min-size":3,"asg-max-size":3,"asg-desired-capacity":3,"instance-types":["c5.xlarge"],"volume-size":40}}' \
AWS_K8S_TESTER_EKS_ADD_ON_MANAGED_NODE_GROUPS_MNGS=$MNGS \
AWS_K8S_TESTER_EKS_ADD_ON_MANAGED_NODE_GROUPS_FETCH_LOGS=true \
AWS_K8S_TESTER_EKS_ADD_ON_NLB_HELLO_WORLD_ENABLE=true \
AWS_K8S_TESTER_EKS_ADD_ON_ALB_2048_ENABLE=true \
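Since the MNGS value is a large single-line JSON document, a malformed quote or brace would only surface once aws-k8s-tester parses it. A quick sanity check before export (a sketch, assuming `python3` is on PATH; `jq` would work equally well; the example value is trimmed):

```shell
# Validate the MNGS JSON before passing it to aws-k8s-tester.
# Trimmed example value; the real string carries the full node-group spec.
MNGS='{"single-node":{"name":"single-node","ami-type":"AL2_x86_64","asg-min-size":1,"asg-max-size":1,"asg-desired-capacity":1,"instance-types":["m5.16xlarge"],"volume-size":40}}'
if echo "$MNGS" | python3 -c 'import json,sys; json.load(sys.stdin)' 2>/dev/null; then
    echo "MNGS JSON is valid"
else
    echo "MNGS JSON is malformed" >&2
    exit 1
fi
```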
8 changes: 7 additions & 1 deletion scripts/lib/common.sh
@@ -25,6 +25,12 @@ function display_timelines() {
echo "TIMELINE: Default CNI integration tests took $DEFAULT_INTEGRATION_DURATION seconds."
echo "TIMELINE: Updating CNI image took $CNI_IMAGE_UPDATE_DURATION seconds."
echo "TIMELINE: Current image integration tests took $CURRENT_IMAGE_INTEGRATION_DURATION seconds."
echo "TIMELINE: Conformance tests took $CONFORMANCE_DURATION seconds."
if [[ "$RUN_CONFORMANCE" == true ]]; then
echo "TIMELINE: Conformance tests took $CONFORMANCE_DURATION seconds."
fi
if [[ "$RUN_PERFORMANCE_TESTS" == true ]]; then
echo "TIMELINE: Performance tests took $PERFORMANCE_DURATION seconds."
fi
echo "TIMELINE: Down processes took $DOWN_DURATION seconds."
}

209 changes: 209 additions & 0 deletions scripts/lib/performance_tests.sh
@@ -0,0 +1,209 @@
function check_for_timeout() {
if [[ $((SECONDS - $1)) -gt 10000 ]]; then
RUNNING_PERFORMANCE=false
on_error
fi
}

function run_performance_test_130_pods() {
echo "Running performance tests against cluster"
RUNNING_PERFORMANCE=true

DEPLOY_START=$SECONDS

SCALE_UP_DURATION_ARRAY=()
SCALE_DOWN_DURATION_ARRAY=()
while [ ${#SCALE_UP_DURATION_ARRAY[@]} -lt 3 ]
do
ITERATION_START=$SECONDS
$KUBECTL_PATH scale -f ./testdata/deploy-130-pods.yaml --replicas=130
sleep 20
while [[ ! $($KUBECTL_PATH get deploy | grep 130/130) ]]
do
sleep 1
echo "Scaling UP"
echo $($KUBECTL_PATH get deploy)
check_for_timeout $DEPLOY_START
done

SCALE_UP_DURATION_ARRAY+=( $((SECONDS - ITERATION_START)) )
MIDPOINT_START=$SECONDS
$KUBECTL_PATH scale -f ./testdata/deploy-130-pods.yaml --replicas=0
while [[ $($KUBECTL_PATH get pods) ]]
do
sleep 1
echo "Scaling DOWN"
echo $($KUBECTL_PATH get deploy)
check_for_timeout $DEPLOY_START
done
SCALE_DOWN_DURATION_ARRAY+=($((SECONDS - MIDPOINT_START)))
done

echo "Times to scale up:"
INDEX=0
while [ $INDEX -lt ${#SCALE_UP_DURATION_ARRAY[@]} ]
do
echo ${SCALE_UP_DURATION_ARRAY[$INDEX]}
INDEX=$((INDEX + 1))
done
echo ""
echo "Times to scale down:"
INDEX=0
while [ $INDEX -lt ${#SCALE_DOWN_DURATION_ARRAY[@]} ]
do
echo "${SCALE_DOWN_DURATION_ARRAY[$INDEX]} seconds"
INDEX=$((INDEX + 1))
done
echo ""
DEPLOY_DURATION=$((SECONDS - DEPLOY_START))

now="pod-130-Test#${TEST_ID}-$(date +"%m-%d-%Y-%T").csv"
echo $now
Review comment (Contributor): Can we call this `filename` instead?


echo $(date +"%m-%d-%Y-%T") >> $now
echo $((SCALE_UP_DURATION_ARRAY[0])), $((SCALE_DOWN_DURATION_ARRAY[0])) >> $now
echo $((SCALE_UP_DURATION_ARRAY[1])), $((SCALE_DOWN_DURATION_ARRAY[1])) >> $now
echo $((SCALE_UP_DURATION_ARRAY[2])), $((SCALE_DOWN_DURATION_ARRAY[2])) >> $now

cat $now
aws s3 cp $now s3://cni-performance-test-data
Review comment (Contributor): I think this bucket needs to be a configurable setting. And if it's not set, we should skip the upload. Something like

if [[ -n "${S3_PERF_TEST_BUCKET:-}" ]]; then
    aws s3 cp $filename "$S3_PERF_TEST_BUCKET"
else
    echo "No S3 bucket name given, not uploading results"
fi

Review comment (Contributor): Should we do the check for S3 bucket before the test?

echo "TIMELINE: 130 Pod performance test took $DEPLOY_DURATION seconds."
RUNNING_PERFORMANCE=false
}

function run_performance_test_730_pods() {
echo "Running performance tests against cluster"
RUNNING_PERFORMANCE=true

DEPLOY_START=$SECONDS

SCALE_UP_DURATION_ARRAY=()
SCALE_DOWN_DURATION_ARRAY=()
while [ ${#SCALE_UP_DURATION_ARRAY[@]} -lt 3 ]
do
ITERATION_START=$SECONDS
$KUBECTL_PATH scale -f ./testdata/deploy-730-pods.yaml --replicas=730
sleep 100
while [[ ! $($KUBECTL_PATH get deploy | grep 730/730) ]]
do
sleep 2
echo "Scaling UP"
echo $($KUBECTL_PATH get deploy)
check_for_timeout $DEPLOY_START
done

SCALE_UP_DURATION_ARRAY+=( $((SECONDS - ITERATION_START)) )
MIDPOINT_START=$SECONDS
$KUBECTL_PATH scale -f ./testdata/deploy-730-pods.yaml --replicas=0
sleep 100
while [[ $($KUBECTL_PATH get pods) ]]
do
sleep 2
echo "Scaling DOWN"
echo $($KUBECTL_PATH get deploy)
check_for_timeout $DEPLOY_START
done
SCALE_DOWN_DURATION_ARRAY+=($((SECONDS - MIDPOINT_START)))
done

echo "Times to scale up:"
INDEX=0
while [ $INDEX -lt ${#SCALE_UP_DURATION_ARRAY[@]} ]
do
echo ${SCALE_UP_DURATION_ARRAY[$INDEX]}
INDEX=$((INDEX + 1))
done
echo ""
echo "Times to scale down:"
INDEX=0
while [ $INDEX -lt ${#SCALE_DOWN_DURATION_ARRAY[@]} ]
do
echo "${SCALE_DOWN_DURATION_ARRAY[$INDEX]} seconds"
INDEX=$((INDEX + 1))
done
echo ""
DEPLOY_DURATION=$((SECONDS - DEPLOY_START))

now="pod-730-Test#${TEST_ID}-$(date +"%m-%d-%Y-%T").csv"
echo $now

echo $(date +"%m-%d-%Y-%T") >> $now
echo $((SCALE_UP_DURATION_ARRAY[0])), $((SCALE_DOWN_DURATION_ARRAY[0])) >> $now
echo $((SCALE_UP_DURATION_ARRAY[1])), $((SCALE_DOWN_DURATION_ARRAY[1])) >> $now
echo $((SCALE_UP_DURATION_ARRAY[2])), $((SCALE_DOWN_DURATION_ARRAY[2])) >> $now

cat $now
aws s3 cp $now s3://cni-performance-test-data

echo "TIMELINE: 730 Pod performance test took $DEPLOY_DURATION seconds."
RUNNING_PERFORMANCE=false
}

function run_performance_test_5000_pods() {
echo "Running performance tests against cluster"
RUNNING_PERFORMANCE=true

DEPLOY_START=$SECONDS

SCALE_UP_DURATION_ARRAY=()
SCALE_DOWN_DURATION_ARRAY=()
while [ ${#SCALE_UP_DURATION_ARRAY[@]} -lt 3 ]
do
ITERATION_START=$SECONDS
$KUBECTL_PATH scale -f ./testdata/deploy-5000-pods.yaml --replicas=5000
sleep 100
Review comment (Contributor): What is 100 seconds based on here?

while [[ ! $($KUBECTL_PATH get deploy | grep 5000/5000) ]]
do
sleep 2
echo "Scaling UP"
echo $($KUBECTL_PATH get deploy)
check_for_timeout $DEPLOY_START
done

SCALE_UP_DURATION_ARRAY+=( $((SECONDS - ITERATION_START)) )
MIDPOINT_START=$SECONDS
$KUBECTL_PATH scale -f ./testdata/deploy-5000-pods.yaml --replicas=0
sleep 100
while [[ $($KUBECTL_PATH get pods) ]]
do
sleep 2
echo "Scaling DOWN"
echo $($KUBECTL_PATH get deploy)
check_for_timeout $DEPLOY_START
done
SCALE_DOWN_DURATION_ARRAY+=($((SECONDS - MIDPOINT_START)))
done

echo "Times to scale up:"
INDEX=0
while [ $INDEX -lt ${#SCALE_UP_DURATION_ARRAY[@]} ]
do
echo ${SCALE_UP_DURATION_ARRAY[$INDEX]}
INDEX=$((INDEX + 1))
done
echo ""
echo "Times to scale down:"
INDEX=0
while [ $INDEX -lt ${#SCALE_DOWN_DURATION_ARRAY[@]} ]
do
echo "${SCALE_DOWN_DURATION_ARRAY[$INDEX]} seconds"
INDEX=$((INDEX + 1))
done
echo ""
DEPLOY_DURATION=$((SECONDS - DEPLOY_START))

now="pod-5000-Test#${TEST_ID}-$(date +"%m-%d-%Y-%T").csv"
echo $now

echo $(date +"%m-%d-%Y-%T") >> $now
echo $((SCALE_UP_DURATION_ARRAY[0])), $((SCALE_DOWN_DURATION_ARRAY[0])) >> $now
echo $((SCALE_UP_DURATION_ARRAY[1])), $((SCALE_DOWN_DURATION_ARRAY[1])) >> $now
echo $((SCALE_UP_DURATION_ARRAY[2])), $((SCALE_DOWN_DURATION_ARRAY[2])) >> $now

cat $now
aws s3 cp $now s3://cni-performance-test-data

echo "TIMELINE: 5000 Pod performance test took $DEPLOY_DURATION seconds."
RUNNING_PERFORMANCE=false
}
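The three test functions differ only in replica count, manifest path, and sleep intervals; the duration-reporting loops in particular repeat verbatim. A hypothetical helper (the name `print_durations` is my own, not from the PR) showing how that reporting could be factored out:

```shell
# Print a labeled list of durations, replacing the repeated INDEX loops.
print_durations() {
    local label=$1
    shift
    echo "Times to ${label}:"
    local duration
    for duration in "$@"; do
        echo "${duration} seconds"
    done
    echo ""
}

# Example with stand-in values:
SCALE_UP_DURATION_ARRAY=(42 38 40)
print_durations "scale up" "${SCALE_UP_DURATION_ARRAY[@]}"
```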
33 changes: 27 additions & 6 deletions scripts/run-integration-tests.sh
@@ -8,6 +8,7 @@ DIR=$(cd "$(dirname "$0")"; pwd)
source "$DIR"/lib/common.sh
source "$DIR"/lib/aws.sh
source "$DIR"/lib/cluster.sh
source "$DIR"/lib/performance_tests.sh

# Variables used in /lib/aws.sh
OS=$(go env GOOS)
@@ -19,20 +20,24 @@ ARCH=$(go env GOARCH)
: "${DEPROVISION:=true}"
: "${BUILD:=true}"
: "${RUN_CONFORMANCE:=false}"
: "${RUN_PERFORMANCE_TESTS:=false}"
: "${RUNNING_PERFORMANCE:=false}"

__cluster_created=0
__cluster_deprovisioned=0

on_error() {
# Make sure we destroy any cluster that was created if we run into an
# error when attempting to run tests against the cluster
if [[ $__cluster_created -eq 1 && $__cluster_deprovisioned -eq 0 && "$DEPROVISION" == true ]]; then
# prevent double-deprovisioning with ctrl-c during deprovisioning...
__cluster_deprovisioned=1
echo "Cluster was provisioned already. Deprovisioning it..."
down-test-cluster
if [[ $RUNNING_PERFORMANCE == false ]]; then
if [[ $__cluster_created -eq 1 && $__cluster_deprovisioned -eq 0 && "$DEPROVISION" == true ]]; then
# prevent double-deprovisioning with ctrl-c during deprovisioning...
__cluster_deprovisioned=1
echo "Cluster was provisioned already. Deprovisioning it..."
down-test-cluster
fi
exit 1
fi
exit 1
}
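The key behavior change here: while a performance test is mid-run, `on_error` returns without deprovisioning or exiting, so `check_for_timeout` can flip `RUNNING_PERFORMANCE` to false and trigger a clean teardown on the next call. A stubbed dry-run of that logic (the harness and echo stubs are mine, not part of the PR):

```shell
# Stub state mirroring the script's globals.
DEPROVISION=true
__cluster_created=1
__cluster_deprovisioned=0
RUNNING_PERFORMANCE=true

down-test-cluster() { echo "deprovisioning"; }  # stub for the real teardown

on_error() {
    if [[ $RUNNING_PERFORMANCE == false ]]; then
        if [[ $__cluster_created -eq 1 && $__cluster_deprovisioned -eq 0 && "$DEPROVISION" == true ]]; then
            __cluster_deprovisioned=1
            down-test-cluster
        fi
        exit 1
    fi
}

# Mid-test: on_error is a no-op, so the performance loop keeps control.
( on_error ) && echo "mid-test error: no teardown, no exit"
# Post-test: on_error deprovisions and exits 1.
RUNNING_PERFORMANCE=false
( on_error ) || echo "post-test error: deprovisioned, exit status 1"
```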

# test specific config, results location
@@ -213,6 +218,22 @@ if [[ $TEST_PASS -eq 0 && "$RUN_CONFORMANCE" == true ]]; then
echo "TIMELINE: Conformance tests took $CONFORMANCE_DURATION seconds."
fi

if [[ "$RUN_PERFORMANCE_TESTS" == true ]]; then
START=$SECONDS
$KUBECTL_PATH apply -f ./testdata/deploy-130-pods.yaml
run_performance_test_130_pods
$KUBECTL_PATH delete -f ./testdata/deploy-130-pods.yaml
Review comment (Contributor): Why are the testdata apply functions not done inside the test functions?


$KUBECTL_PATH apply -f ./testdata/deploy-730-pods.yaml
run_performance_test_730_pods
$KUBECTL_PATH delete -f ./testdata/deploy-730-pods.yaml

$KUBECTL_PATH apply -f ./testdata/deploy-5000-pods.yaml
run_performance_test_5000_pods
$KUBECTL_PATH delete -f ./testdata/deploy-5000-pods.yaml
PERFORMANCE_DURATION=$((SECONDS - START))
bnapolitan marked this conversation as resolved.
fi

if [[ "$DEPROVISION" == true ]]; then
START=$SECONDS

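The apply, run, delete sequence repeats once per pod count; a loop over the three deployments would keep the steps in lockstep. A dry-run sketch with echo stubs standing in for kubectl and the test functions:

```shell
# Dry-run of the repeated apply/run/delete sequence as a single loop.
KUBECTL_PATH=echo  # stub; the real script points this at the kubectl binary
run_performance_test_130_pods()  { echo "running 130-pod test"; }
run_performance_test_730_pods()  { echo "running 730-pod test"; }
run_performance_test_5000_pods() { echo "running 5000-pod test"; }

for count in 130 730 5000; do
    "$KUBECTL_PATH" apply -f "./testdata/deploy-${count}-pods.yaml"
    "run_performance_test_${count}_pods"
    "$KUBECTL_PATH" delete -f "./testdata/deploy-${count}-pods.yaml"
done
```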