Clarify warn status code #25

taliaga · 2018-10-19T09:52:24Z

No description provided.

taliaga · 2018-10-19T10:03:58Z

draft.md

- status, HTTP response code in the 4xx-5xx range MUST be used. In case of the
- “warn” status, endpoints SHOULD return HTTP status in the 2xx-3xx range, and
+ the HTTP response code returned by the health endpoint. For “pass” status,
+ HTTP response code in the 2xx-3xx range MUST be used. For “fail” status,


I was thinking of giving at least one specific (suggested) example code here for each status case. What do you think?
pass: 200
warn: 207?
fail: 424 maybe?

207 is WebDAV. Not all web frameworks (clients or servers) will have an understanding of the WebDAV status codes.

This RFC should probably be conservative with the other HTTP-oriented RFCs it relies on.

Re: “warn”: some people put thresholds in their health checks but, as a best practice, I personally avoid putting service metric thresholds in my code and keep them in my external monitoring and control systems.

To me “warn” should be oriented towards “degraded” states. Think failing health checks that are neither liveness checks nor readiness checks. When liveness or readiness health checks are failing, I am “down.” When non-liveness and non-readiness health checks are failing, I am “warn”, which to me means “up and able to serve but degraded (relying on sane fallbacks or recording things for deferred batch operations upon service restoration, etc.).”
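A minimal sketch of the rule described above, to make the three outcomes concrete. The `Kind` and `Check` types and the `overall_status` function are hypothetical names invented for illustration; neither the draft nor this thread defines them.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    # Hypothetical classification of health checks.
    LIVENESS = "liveness"
    READINESS = "readiness"
    OTHER = "other"


@dataclass
class Check:
    name: str   # illustrative check name, e.g. "cache"
    kind: Kind
    ok: bool    # True when the check succeeded


def overall_status(checks: list[Check]) -> str:
    """Return "fail", "warn", or "pass" for a list of check results."""
    failing = [c for c in checks if not c.ok]
    # A failing liveness or readiness check means the service is down.
    if any(c.kind in (Kind.LIVENESS, Kind.READINESS) for c in failing):
        return "fail"
    # Only other checks are failing: up and able to serve, but degraded.
    if failing:
        return "warn"
    return "pass"
```

Note that no metric thresholds appear in the sketch; under this reading they would stay in the external monitoring system.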

taliaga · 2018-10-19T10:04:41Z

draft.md

- “warn” status, endpoints SHOULD return HTTP status in the 2xx-3xx range, and
+ the HTTP response code returned by the health endpoint. For “pass” status,
+ HTTP response code in the 2xx-3xx range MUST be used. For “fail” status,
+ HTTP response code in the 4xx-5xx range MUST be used. In case of the “warn”


For fail, returning a 5xx sounds like a paradox, doesn't it? The endpoint is working and responding with information, so it should be an OK code (even though it's telling you the service is unavailable), hehe. Also, I'm not sure, but isn't there a risk that some systems out there drop response bodies carrying status information when a 5xx is used? What's your view?

Relying on status codes only and entirely ignoring message bodies is already a well-established pattern in cloud-native autonomous control plane systems.

I’ll dig up some Kubernetes and Azure links where they talk about relying only on status codes from health/liveness/readiness endpoints for orchestration concerns.

It does indeed make more sense to use a 4xx error code when an API can respond and is in 'fail' mode. However, if it cannot respond at all, the response code will likely be a 5xx (just like @taliaga noted), so a 5xx code is the equivalent of 'fail'. To ensure there are no weird edge cases, it follows that responding with a 5xx for some 'fails' should be OK. :) /shrug
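For illustration, the ranges in the diff could be checked with something like the sketch below. The specific codes in `SUGGESTED_CODE` (200 and 503) are assumptions, since this thread does not settle on any, and the "warn" range assumes the 2xx-3xx wording visible in the diff context.

```python
# Allowed ranges per status, following the wording in the diff.
ALLOWED_RANGE = {
    "pass": range(200, 400),  # 2xx-3xx
    "warn": range(200, 400),  # 2xx-3xx (assumed from the diff context)
    "fail": range(400, 600),  # 4xx-5xx
}

# Hypothetical concrete choices; not taken from the draft.
SUGGESTED_CODE = {"pass": 200, "warn": 200, "fail": 503}


def code_matches_status(status: str, code: int) -> bool:
    """True when the HTTP code lies in the range allowed for the status."""
    return code in ALLOWED_RANGE[status]


def http_code_for(status: str) -> int:
    """Pick the suggested code and verify it against the allowed range."""
    code = SUGGESTED_CODE[status]
    if not code_matches_status(status, code):
        raise ValueError(f"{code} is outside the range for {status!r}")
    return code
```

Either a 4xx or a 5xx passes `code_matches_status("fail", ...)`, which matches the point above that both should be acceptable for 'fail'.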

inadarei · 2018-10-28T02:14:25Z

This PR looks good to me and should probably be merged. However, I noticed that @taliaga and @derekm had an interesting conversation about examples of HTTP response code usage. If it's OK with everybody, I think we should separate that conversation from this one so as not to delay acceptance of this change.

Sounds ok?

derekm · 2018-10-28T02:20:16Z

Sure, I don’t think I raised any actual issues. @taliaga raised objections, but in contradiction to common practice.

I never did dig up those links, so I’ll do that now.

derekm · 2018-10-28T02:24:23Z

Azure health: https://docs.microsoft.com/en-us/azure/architecture/patterns/health-endpoint-monitoring

Most existing tools and frameworks look only at the HTTP status code that the endpoint returns. To return and validate additional information, you might have to create a custom monitoring utility or service.

derekm · 2018-10-28T02:49:52Z

Kubernetes health: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/#define-a-liveness-http-request

Any code greater than or equal to 200 and less than 400 indicates success. Any other code indicates failure.
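The rule quoted above can be sketched as a status-code-only probe. This is an illustrative approximation using Python's standard library, not how Kubernetes itself is implemented; `is_success` and `probe` are hypothetical names.

```python
import urllib.error
import urllib.request


def is_success(status_code: int) -> bool:
    # Quoted rule: 200 <= code < 400 is success, anything else is failure.
    return 200 <= status_code < 400


def probe(url: str, timeout: float = 1.0) -> bool:
    """Check a health endpoint by status code only; the body is never read."""
    try:
        # Note: urlopen follows redirects, so the final status is inspected.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_success(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is still available.
        return is_success(err.code)
    except (urllib.error.URLError, OSError):
        # No response at all (refused connection, timeout) counts as failure.
        return False
```

A probe like this cannot tell "pass" from a "warn" answered in the 2xx-3xx range, since both count as success; only the response body would carry that distinction.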

Clarify warn status code

6422be8

taliaga mentioned this pull request Oct 19, 2018

Minor feedback #22

Closed

derekm mentioned this pull request Oct 19, 2018

Introduce a degraded state and a pluggable combiner for global state computation eclipse/microprofile-health#130

Open

inadarei merged commit 4a80ba5 into inadarei:master Oct 28, 2018
