replace check-instance-ready endpoint to use status instead of stats from CanarieAPI #293

Draft: wants to merge 1 commit into master

CHANGES.md: 8 changes (7 additions, 1 deletion)
@@ -14,7 +14,13 @@
[Unreleased](https:/bird-house/birdhouse-deploy/tree/master) (latest)
------------------------------------------------------------------------------------------------------------------

-[//]: # (list changes here, using '-' for each new entry, remove this when items are added)
+## Fixes:
+- Scripts: fix [`check-instance-ready`](birdhouse/scripts/check-instance-ready) script.
+
+The previously used `/canarie/node/service/stats` endpoint could be unreliable when log collection for some services
+under the node produced errors while populating the stats. Instead, use `/canarie/node/service/status`, which only
+checks whether the services respond at the endpoints configured in CanarieAPI. This status endpoint is the same one
+that the CI test suite uses to check that the instance is ready before starting the notebook tests.

[1.23.0](https:/bird-house/birdhouse-deploy/tree/1.23.0) (2023-02-10)
------------------------------------------------------------------------------------------------------------------

birdhouse/scripts/check-instance-ready: 14 changes (8 additions, 6 deletions)
@@ -13,29 +13,31 @@ COMPOSE_DIR="`dirname "$THIS_DIR"`"

if [ -f "$COMPOSE_DIR/env.local" ]; then
# Get PAVICS_FQDN
-. $COMPOSE_DIR/env.local
+. "${COMPOSE_DIR}/env.local"

Review comment (Collaborator): Watch out, this change will conflict with the DELAYED_EVAL PR.

fi

+MONITOR_URL="https://${PAVICS_FQDN}/canarie/node/service/status"

Review comment (Collaborator): If my memory is correct, /status always returns 200, so it's a bad test.

In your screenshot below, Solr and ncWMS2 show errors, so if /status still returns 200 in that case, that's wrong. I think we have to keep /stats.

[image]

Reply (Collaborator, PR author): One thing we could do is use /status with the Accept: application/json header to better validate the contents.
I think the 200 is returned only for the HTML representation (though the correct code could be returned in that case as well...)

Technically, /stats could be completely empty if the log-parsing cron job has not run yet.
I've encountered this issue recently in the PR pipeline, where tests started too early: /stats looked OK only because everything was empty.
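
A minimal sketch of the Accept: application/json idea from the reply above, written as hypothetical shell helpers in the style of check-instance-ready. It assumes that the CanarieAPI status endpoint honours that header, and it treats an empty body (or an empty `{}` object) as "not ready", mirroring the empty-/stats problem described in the reply. The helper names `json_body_has_content`, `fetch_status_json` and `check_status_json` are illustrative only. Detecting per-service errors such as the Solr and ncWMS2 ones would need the actual JSON layout, which is not shown here, so the sketch stops at the HTTP code plus a non-empty body.

```shell
#!/bin/sh
# Hypothetical helpers (not part of this PR): judge readiness from the JSON
# representation of the CanarieAPI status endpoint, not only from the HTTP code.
# ASSUMPTION: the endpoint honours "Accept: application/json"; an empty body or
# an empty object "{}" is treated as "not ready".

# Succeeds if the file named by $1 contains something other than "" or "{}".
json_body_has_content() {
    # Drop all whitespace so that " { } " and "{}" compare equal.
    body=$(tr -d '[:space:]' < "$1")
    [ -n "$body" ] && [ "$body" != "{}" ]
}

# Prints the HTTP code returned by URL $1; the response body is saved to file $2.
fetch_status_json() {
    curl --silent --header 'Accept: application/json' \
        --output "$2" --write-out '%{http_code}' "$1"
}

# Returns 0 only if URL $1 answers HTTP 200 with a non-empty JSON body.
check_status_json() {
    tmp_body=$(mktemp) || return 1
    code=$(fetch_status_json "$1" "$tmp_body")
    if [ "$code" = "200" ] && json_body_has_content "$tmp_body"; then
        rc=0
    else
        rc=1
    fi
    rm -f "$tmp_body"
    return "$rc"
}
```

In the script, this could replace the bare HTTP-code test, e.g. `check_status_json "${MONITOR_URL}" || echo "Instance not ready yet."`.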

set -x
-curl --include --silent https://$PAVICS_FQDN/canarie/node/service/stats | head
+curl --include --silent "${MONITOR_URL}" | head

set +x
echo "
The curl above should return the HTTP response code 200 to confirm instance is ready.
"
set -x

-HTTP_RESPONSE_CODE="`curl --write-out '%{http_code}' --output /dev/null --silent https://$PAVICS_FQDN/canarie/node/service/stats`"
-if [ $HTTP_RESPONSE_CODE -ne 200 ]; then
+HTTP_RESPONSE_CODE=$(curl --write-out '%{http_code}' --output /dev/null --silent "${MONITOR_URL}")
+if [ "${HTTP_RESPONSE_CODE}" -ne 200 ]; then
set +x
echo "
-HTTP response code received: $HTTP_RESPONSE_CODE (expected 200).
+HTTP response code received: ${HTTP_RESPONSE_CODE} (expected 200).

Will sleep for about 1 minute and try again since the canarie-api refresh every minute.

Will retry only once more and exit immediately.
"
set -x
sleep 65
-curl --include --silent https://$PAVICS_FQDN/canarie/node/service/stats | head
+curl --include --silent "${MONITOR_URL}" | head
fi
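
The script above retries exactly once after sleeping 65 seconds. If a freshly started instance needs more time, the retry could be generalized into a loop. The following is a minimal sketch only; `wait_for_http_200` and its MAX_ATTEMPTS / DELAY_SECONDS parameters are hypothetical and not part of this PR, and the 65-second default simply mirrors the script's own sleep.

```shell
#!/bin/sh
# Hypothetical helper (not part of this PR): poll a URL until it answers
# HTTP 200, generalizing the script's single retry. The defaults (3 attempts,
# 65 s apart) are assumptions; 65 s mirrors the script's sleep, which was
# chosen because CanarieAPI refreshes about once per minute.

# Usage: wait_for_http_200 URL [MAX_ATTEMPTS] [DELAY_SECONDS]
# Returns 0 as soon as URL answers 200, or 1 after MAX_ATTEMPTS failed tries.
wait_for_http_200() {
    url="$1"
    max_attempts="${2:-3}"
    delay_seconds="${3:-65}"
    attempt=1
    while :; do
        code=$(curl --write-out '%{http_code}' --output /dev/null --silent "$url")
        echo "Attempt ${attempt}/${max_attempts}: HTTP ${code} from ${url}" >&2
        if [ "$code" = "200" ]; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay_seconds"
    done
}
```

For example, `wait_for_http_200 "${MONITOR_URL}" 5 65 || exit 1` would allow five checks with up to 4 × 65 = 260 seconds of waiting between the first and the last one.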