Healthchecks and autorestarts for computes #1074

Open
Omrigan opened this issue Sep 20, 2024 · 1 comment

Omrigan commented Sep 20, 2024

Problem description / Motivation

Branched off from https://github.com/neondatabase/cloud/issues/14114

At the moment, the only signal we have for compute unavailability comes from k8s, specifically its container process monitoring.

We would like to have an end-to-end healthcheck, which would allow us to detect problems, such as:

  1. Postgres does not accept connections
  2. compute_ctl crashlooping
  3. Network partitioning

Feature idea(s) / DoD

We have a healthcheck mechanism that lets us detect compute issues in under 30s and take appropriate action, such as restarting the compute.

Implementation ideas

We should have a piece of code inside the VM that responds to a healthcheck.

stradig commented Sep 23, 2024

Not sure whether we will need that, or whether Kubernetes is good enough. Putting it in the backlog for now.
