Changes
/kind bug
While investigating issue #1118, I noticed that the current `preStop` hook for the gateway (Envoy) did not work at all. The reason is that `curl` is not installed in Envoy's default image (it might have been there at some point, but not anymore). As a result, the call to `/healthcheck/fail`, which makes Envoy fail health checks before shutting down, did not work.

To replace
`curl`, I thought of several solutions:

- Build a custom Envoy image with `curl` in it: overkill for our needs (I don't think we want to maintain a separate Envoy image just for a hook). The ideal solution would have been to have `curl` installed in the official Envoy image, but we can't expect that.
- Install a static `curl` binary in an init container and copy it to the Envoy container using a volume: this looks way too magic. We need a static binary because we can't just copy the `curl` binary installed by `apt install -y curl`, since it needs some other dependencies.
- Use something other than `curl`, e.g. install `nc` or `socat` (or other tools I'm not aware of) to be able to communicate with the `/tmp/envoy.admin` socket.

I've chosen 4 because it seemed to be the least impactful change (having an init container to install a binary used only in the hook is... meh). To communicate with the Envoy admin interface using default shell tools, I had to "open" the route to the `/healthcheck/fail` endpoint. We need to go through the HTTP endpoint because we can't write directly to the socket.

It was not really easy to see this error, as no
`FailedPreStopHook` event was sent, since the last command of the hook (`sleep 15`) did succeed. This is why I've also added a `set -e` command in the hook: if one command in the script fails, the hook returns an error and a `FailedPreStopHook` event is emitted.
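
The effect of `set -e` on the hook's exit status can be seen in a minimal sketch. This is not the actual hook script: `false` and `true` are hypothetical stand-ins for the failing `curl` call and the final `sleep 15`.

```shell
#!/bin/sh
# Hypothetical stand-ins, not the real hook:
#   false -> the failing call to /healthcheck/fail
#   true  -> the final `sleep 15`, which always succeeds

# Without `set -e`, the script's exit status is that of its last command,
# so the earlier failure is hidden.
status_without=0
sh -c 'false; true' || status_without=$?

# With `set -e`, the script stops at the first failing command
# and exits with a non-zero status.
status_with=0
sh -c 'set -e; false; true' || status_with=$?

echo "exit status without set -e: $status_without"
echo "exit status with set -e: $status_with"
```

Only in the second case does the hook command report a failure, which is the condition under which Kubernetes can record a `FailedPreStopHook` event.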

Alternative
Another (simpler) alternative is to keep only the pause (`sleep 15`) in the `preStop` hook, like in this tutorial. But I'm not sure that, in this case, no traffic will be routed to the pod while it is in the `Terminating` state. Let's discuss in the comments if you feel this is a better solution.

Release Note