PWX-38118: Health Checks #1646
base: release-24.2.0
Testing
Positive Testing
Negative Testing
test-storageclusters.zip
// If not set, then run the health checks
check := cluster.Annotations[pxutil.AnnotationHealthCheck]
check = strings.TrimSpace(strings.ToLower(check))
if check == "skip" || check == "passed" {
strings.EqualFold can be used here.
Also, can the values be constants, and can this be part of the health check framework?
> strings.EqualFold can be used here
Didn't know about this call! Nice catch.
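A minimal sketch of what the two suggestions above could look like together, assuming hypothetical constant and function names (healthCheckSkip, healthCheckPassed, and shouldSkipHealthCheck are illustrative and not taken from the PR). strings.EqualFold compares case-insensitively, so the explicit strings.ToLower call is no longer needed; trimming whitespace is still done separately.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical constants for the annotation values. The names are
// illustrative; only the string values appear in the quoted diff.
const (
	healthCheckSkip   = "skip"
	healthCheckPassed = "passed"
)

// shouldSkipHealthCheck reports whether the annotation value means the
// health checks should not run. EqualFold handles case differences;
// TrimSpace handles surrounding whitespace.
func shouldSkipHealthCheck(value string) bool {
	v := strings.TrimSpace(value)
	return strings.EqualFold(v, healthCheckSkip) ||
		strings.EqualFold(v, healthCheckPassed)
}

func main() {
	for _, v := range []string{"Skip", " PASSED ", "", "failed"} {
		fmt.Printf("%q -> %v\n", v, shouldSkipHealthCheck(v))
	}
}
```

Keeping the values as constants would also let the health check framework and the operator share one definition.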
cluster.Name,
pxutil.AnnotationHealthCheck,
)
if check == "failed" {
Question:
If the health check fails once, should we always keep failing?
Additionally, asking the user to remove the annotation would be a manual intervention and would not fall in line with Helm chart approaches.
Also, would the instructions for removing the annotation be documented? Or is the user expected to go through the logs, or would they be part of the STC status?
Events will be cleaned up after a few minutes.
// 1. On first install, run HC in same context
// 2. Fail if HC fails
// 3. An annotation is created to save the results of the health check
if err := c.preInstallHealthChecks(cluster); err != nil {
SyncStorageCluster is a function that is called as part of the reconciler; adding health checks to the reconcile loop would add latency.
Would it be possible to decouple this?
I believe pre-checks should be done before the controllers are started. Please correct me if my understanding is wrong.
Signed-off-by: Luis Pabon <[email protected]>
What this PR does / why we need it:
Adds environment health checks when a new StorageCluster is installed.
Which issue(s) this PR fixes (optional)
Closes #
Special notes for your reviewer:
There are two major sections in this PR:
1. px/px-health-check. This framework will be removed from this repo and moved to its own repo at a later date so that other programs can include it easily. You may ignore it for the review if you want.
2. The portworx.io/health-check annotation:
   - If it is skip, then the Health Checks will not run.
   - If it is not set, the Health Checks run and the annotation is set to false if it failed, or passed if it worked. If it fails the test, it will also return failure, disallowing PxE to be installed.
   - If it is false, it will again return an error, disallowing PxE to be installed.
   - If it is passed, it will not run.
Ignore the test coverage of px/px-health-check since that will be moved to a new repo afterward.