Provide different types of health check #35

Closed · jmesnil opened this issue Jul 7, 2017 · 29 comments

Comments

@jmesnil (Contributor) commented Jul 7, 2017

This maps directly to Kubernetes liveness and readiness probes, but the general idea may be covered by the Healthcheck API.

There are different types of health checks performed by Kubernetes:

  • readiness (when an application is up and ready to serve requests)
  • liveness (when an application is up and running without any issue)

Typically, the readiness probe will return FAIL while the application is initializing itself.
Once it is initialized (and ready to serve requests), the probe will return OK.
The liveness probe may return OK during initialization (no problems have been detected) even though the application is not yet ready to be used.

I propose that we add annotations (and corresponding HTTP endpoints) to the spec and API.
For example:

  • the @Ready annotation could be used to identify a health check procedure checking the readiness of the application. It would return DOWN when the application is not ready to serve requests (because it is initializing, its initialization has failed, etc.)
  • the @Health (or @Live) annotation would identify a health check procedure checking the liveness of the application (any health check procedure without one of these annotations would be considered a @Health one). It would return DOWN if a problem has been detected by the producer.

Each of these annotations would have an HTTP endpoint associated with it:

  • /health would query @Health procedures only
    => a DOWN global outcome means that the application is not healthy
  • /ready would query @Ready procedures only.
    => a DOWN global outcome means that the application is not ready (yet) to serve its purpose (but that could change over time)
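
To make the proposal concrete, here is a rough sketch of what such a procedure might look like to an application developer. The @Ready qualifier is hypothetical (it only exists in this proposal); HealthCheck and HealthCheckResponse are the existing MicroProfile Health 1.0 types, and the initialization flag is purely illustrative:

    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import javax.enterprise.context.ApplicationScoped;
    import javax.inject.Qualifier;
    import org.eclipse.microprofile.health.HealthCheck;
    import org.eclipse.microprofile.health.HealthCheckResponse;

    // Hypothetical qualifier from the proposal above: marks a readiness
    // procedure, queried by /ready only.
    @Qualifier
    @Retention(RetentionPolicy.RUNTIME)
    @interface Ready { }

    @Ready
    @ApplicationScoped
    class StartupCheck implements HealthCheck {

        private volatile boolean initialized;   // flipped to true once initialization completes

        @Override
        public HealthCheckResponse call() {
            // DOWN while the application is still initializing, UP once it can serve requests
            return initialized
                    ? HealthCheckResponse.named("startup").up().build()
                    : HealthCheckResponse.named("startup").down().build();
        }
    }
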
@rsvoboda (Contributor)

As per the #29 proposal, would you map the different types of health checks to /health/<deployment name>/ready and /health/<deployment name>/health (or /health/<deployment name>/live)?

Could the /health and /ready endpoints be exposed via the deployment URL, something like https://my.service.cloud/my-app-001/ready ?

@heiko-braun (Contributor) commented Jul 10, 2017

@rsvoboda I think your latter proposal, host:port/app-context/ready, is more suitable and closer to the EE deployment models. Not that we should put this first, but compatibility with EE should be a goal.

@jmesnil (Contributor, Author) commented Aug 1, 2017

As discussed in the August 1st call, this issue should be postponed to 1.1, as 1.0 should focus on delivering a single /health endpoint.
Any other type of "health" (such as /ready) can be added in 1.1 or later with a proper set of annotations.

@cescoffier (Contributor)

It's probably time to revisit this issue. Having two endpoints is crucial in a Kubernetes environment, as liveness and readiness checks are different and serve different purposes. They are likely to use different sets of procedures.

@kenfinnigan

+1 to revisiting.

In terms of endpoints, I like the idea of /health and /ready to separate them.

Although maybe it needs to go further: retain the existing /health for detailed health information, and add new /ready and /live endpoints that may re-use some of the checks defined for /health. To not pollute the namespace, they could be sub-endpoints of /health, i.e. /health/ready and /health/live.

@cescoffier (Contributor)

I think I prefer the /health/ready and /health/alive (live or alive?) endpoints, so everything is under the /health context.

Now the question is what should /health return? A composition of both readiness and liveness checks?

@kenfinnigan

Maybe /health is a rollup of all health checks, and there are separate annotations to indicate which are specifically @Ready and @Alive checks?

@Ladicek commented Jun 20, 2018

One concern with /health being a logical conjunction of all existing health checks: once you start adding @Ready and @Alive health checks, where typically an app can be alive but not ready, the conjunction becomes basically equivalent to "ready". In such a situation, if you configure a liveness check to point to /health, you have a problem.

It's not as clean, but it might be better to treat existing @Health checks, presented at /health, as liveness, and only introduce one new concept: readiness. /health would be a logical conjunction of all @Health checks (like it is today), and something like /health/readiness would be a logical conjunction of all @Health checks and all @Readiness checks (that is, readiness would be a superset of liveness).
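
For illustration, a minimal sketch of that aggregation rule, with plain lists of check outcomes standing in for the real procedures (this is not an actual MicroProfile Health API, just the boolean logic being proposed):

    import java.util.List;

    public class HealthAggregation {

        enum Status { UP, DOWN }

        // /health (liveness): logical conjunction of the @Health checks only
        static Status health(List<Status> healthChecks) {
            return healthChecks.stream().allMatch(s -> s == Status.UP) ? Status.UP : Status.DOWN;
        }

        // /health/readiness: conjunction of the @Health checks AND the @Readiness checks,
        // i.e. readiness is a superset of liveness
        static Status readiness(List<Status> healthChecks, List<Status> readinessChecks) {
            boolean allUp = health(healthChecks) == Status.UP
                    && readinessChecks.stream().allMatch(s -> s == Status.UP);
            return allUp ? Status.UP : Status.DOWN;
        }
    }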

Thoughts?

@OndroMih

@Ladicek: In such a situation, if you configure a liveness check to point to /health, you have a problem.

What do you mean by you have a problem? I don't see any problems yet. I assume that "readiness" checks are very cheap as they are expected to be called very often during a start-up phase. I also assume that all "readiness" checks are also very simple "liveness" checks, so it shouldn't be a problem to include them in "liveness" checks. Do you have any examples where a "readiness" check wouldn't make sense as a "liveness" check?

@Ladicek commented Jul 23, 2018

If that was true, then you wouldn't need to distinguish between liveness and readiness at all. My favorite example to illustrate the difference between liveness and readiness is a vital external service, e.g. a database. If the application can't reach its database, it is definitely not ready (= can't accept requests), but may very well be alive (= doesn't need to be restarted). If this readiness check would be used as a liveness check as well, then the application would get needlessly restarted. Another possibility is that becoming ready may take a while (populating internal caches or whatnot) -- again, the application is alive but not ready, and if readiness checks are used as liveness checks, the application will be restarted as not alive, actually preventing it from reaching the ready state. Makes sense?
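
As an illustration of the database example, here is a minimal sketch using the existing MicroProfile Health 1.0 HealthCheck API (the data source name is an assumption). Per the argument above, a check like this should only ever feed a readiness probe, never a liveness probe:

    import java.sql.Connection;
    import javax.annotation.Resource;
    import javax.enterprise.context.ApplicationScoped;
    import javax.sql.DataSource;
    import org.eclipse.microprofile.health.HealthCheck;
    import org.eclipse.microprofile.health.HealthCheckResponse;

    @ApplicationScoped
    public class DatabaseCheck implements HealthCheck {

        @Resource(lookup = "java:app/jdbc/ExampleDS")   // JNDI name is purely illustrative
        DataSource dataSource;

        @Override
        public HealthCheckResponse call() {
            try (Connection connection = dataSource.getConnection()) {
                return connection.isValid(2)             // 2-second validation timeout
                        ? HealthCheckResponse.named("database").up().build()
                        : HealthCheckResponse.named("database").down().build();
            } catch (Exception e) {
                // The database is unreachable: not ready, but the process itself is fine.
                return HealthCheckResponse.named("database").down().build();
            }
        }
    }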

@Emily-Jiang (Member)

Can we fix this issue for Health Check 1.1 release?

@antoinesd (Contributor)

Can we fix this issue for Health Check 1.1 release?

Introduction of /liveness and /readiness is planned for 1.1 release.

For backward compatibility I suggest that we keep /health (while deprecating its usage). However, we shouldn't change its current behavior, again for backward compatibility's sake.

@Ladicek commented Aug 27, 2018

To keep the URL space occupied by MP as small as possible, I'd suggest not introducing /liveness and /readiness, but instead /health/liveness and /health/readiness. (And of course I agree that /health should be equivalent to /health/liveness.)

@antoinesd (Contributor)

@Ladicek I understand the URL space-saving requirement, but wouldn't it be confusing to have /health/liveness, /health/readiness, and a prefix of these URLs, /health, all as endpoints?

What about /health, /hc/liveness and /hc/readiness ?

@Ladicek commented Aug 27, 2018

I don't really have strong feelings about this, and I agree it might be confusing.

@Emily-Jiang (Member)

I prefer the endpoints /health, /health/ready and /health/live,
and also suggest that /health be the combination of /health/ready and /health/live.

@Ladicek commented Aug 28, 2018

Agree that /health/live is much better than /health/liveness. Same for ready/readiness.

Very much disagree that /health should be a combination of /health/live and /health/ready, but I think I already explained my opinion on that above.

@cescoffier (Contributor)

I would also prefer having everything under /health. I don't mind dropping the ness suffix. However, should we make it configurable?

@kenfinnigan

+1 to everything under /health.
+1 to dropping ness.
+1 to endpoints being configurable, including the root /health.
+1 to /health not being a combination of sub-endpoints, purely for backwards compatibility reasons.

@antoinesd (Contributor) commented Sep 6, 2018

Started a PR in these directions.

@ebullient

For CF (single endpoint), the behavior of the "health check" (avoiding a specific URL) is readiness. As soon as the check returns a 200, requests are routed. In this case, you configure a delay (don't look for OK health responses for some interval after process start). If the health check returns anything else, the process is restarted.

This means, in practice, that a CF health check == readiness.

Kubernetes can also be configured with initial delays. You can get by in Kubernetes by only specifying a readiness check with an initial delay (though less than ideal).

We need to remember that these services are transient things, and avoid being overly-sensitive to killing and restarting the process. If it isn't ready, it most likely should die (unless it is already dying), as that is the safest way to re-establish connections and return a process to health.

@Ladicek commented Sep 21, 2018

Yea, well, there's probably a reason why liveness and readiness are two distinct concepts in Kubernetes. I have to vehemently disagree with

If it isn't ready, it most likely should die

That's not true. If it isn't alive, it should die. There are legitimate cases when a service is alive but not yet ready.

@ebullient commented Sep 21, 2018

"service is alive but not yet ready." -- yes, which is why the (albeit fragile) initialization delay exists. Kubernetes liveness/readiness definitely makes this part more robust / less fragile (and even then, properly setting *.initialDelaySeconds is a thing)

Assuming something reached ready state, if it subsequently stops being ready, I stand by what I said re: safety. Restarting is the most generally applicable/safest way to recover from problems (equivalent to turning it off and back on again)

@arthurdm (Member)

+1 to this feature. It will be great to distinguish between the two cases (readiness and liveness). We will take advantage of this in our Helm chart.

@gunnarmorling commented Oct 29, 2018

In terms of URLs, I'd like to see /health remaining for liveness probes and /health/ready being added for readiness probes. That doesn't consume more URL space than before while still allowing the two sorts of probes to be told apart.

@gcharters commented Nov 23, 2018

My apologies if this thread does not match latest thinking, but here's my 2c regarding the mapping of Liveness vs Readiness to MP Health because some of the above discussion doesn't fit my understanding.

When I look at the definition of Liveness & Readiness in the Kubernetes docs [1], it says:

The kubelet uses liveness probes to know when to restart a Container. For example, liveness probes could catch a deadlock, where an application is running, but unable to make progress. Restarting a Container in such a state can help to make the application more available despite bugs.

The kubelet uses readiness probes to know when a Container is ready to start accepting traffic. A Pod is considered ready when all of its Containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.

With the current MP Health examples I've seen, they've talked about testing database connections or service dependencies. If the test of the connection or service fails, we return "DOWN" for the Health. That, to my mind, most closely matches Kubernetes' definition of Readiness. The app and server are functioning as expected but aren't ready to receive work. No amount of container restarting will fix things.

Regarding Liveness, I'm a bit skeptical as to whether an application can say whether it's alive or not. If it's dead, it can't say so. :/ Kubernetes Liveness feels like a lower-level concern. A bit like: can I ping my server to see if it's there, and if it's not, I'll restart the container in the hope it comes back. Maybe I'm alone in this view.

I would therefore assert:

Readiness => Current MP Health => If it's down, wait for it to become healthy.
Liveness => Something TBD => If it's down, restart the container.

[1] https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/

@Emily-Jiang (Member)

@gcharters

Regarding Liveness, I'm a bit skeptical as to whether an application can say whether it's alive or not. If it's dead, it can't say so. :/ Kubernetes Liveness feels like a lower-level concern. A bit like: can I ping my server to see if it's there, and if it's not, I'll restart the container in the hope it comes back. Maybe I'm alone in this view.

Remember, if the endpoint does not respond within the time allowance, k8s will take DOWN as the answer. The liveness check will be checking on the JVM or the system, e.g. running out of memory, no threads left, etc.

@Emily-Jiang (Member)

/health/ready -- should test dependent database connections, etc.
/health/live -- should test that the system is healthy, e.g. not running out of memory.
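
For illustration, a minimal sketch of that kind of liveness procedure using the MicroProfile Health 1.0 HealthCheck API; the 90% heap threshold and the check name are assumptions, not anything defined by the spec:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryUsage;
    import javax.enterprise.context.ApplicationScoped;
    import org.eclipse.microprofile.health.HealthCheck;
    import org.eclipse.microprofile.health.HealthCheckResponse;

    @ApplicationScoped
    public class HeapCheck implements HealthCheck {

        @Override
        public HealthCheckResponse call() {
            MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
            long max = heap.getMax();                        // -1 if no maximum is defined
            boolean healthy = max <= 0 || heap.getUsed() < 0.9 * max;
            return healthy
                    ? HealthCheckResponse.named("heap-memory").up().build()
                    : HealthCheckResponse.named("heap-memory").down().build();
        }
    }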

@gcharters

Thanks @Emily-Jiang, I'm happy with the types of things you describe for readiness and liveness checks and the subsequent actions.
