Bug: compute_ctl can't open neon.tech.log.0 after restart #939
Labels
t/bug
Issue Type: Bug
Comments
Omrigan added a commit that referenced this issue on May 27, 2024
The virtio-serial interface can be opened only once. Consider the following scenario:
1. Process A starts writing to the serial device.
2. Process A forks Process B, which inherits the open file descriptor.
3. Process A dies; Process B survives and keeps the file descriptor open.
4. Process A is restarted but cannot open the serial device again, causing it to crash-loop.
To fix this, we create a FIFO special file, which supports multiple writers, and spawn cat to redirect it to the virtio-serial device.
Part of #939.
Signed-off-by: Oleg Vasilev <[email protected]>
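The FIFO approach described in the commit message can be illustrated with a minimal sketch. It is not the actual neonvm implementation: the FIFO path, the messages, and the function name demo are hypothetical, and a non-blocking reader in the same process stands in for the cat process that drains the FIFO into the serial device. It assumes a POSIX system where os.mkfifo is available.

```python
import os
import tempfile


def demo():
    """Show that a FIFO accepts a new writer while an old descriptor is still open."""
    with tempfile.TemporaryDirectory() as workdir:
        fifo_path = os.path.join(workdir, "log.fifo")  # hypothetical path
        os.mkfifo(fifo_path)

        # Single reader, standing in for `cat` forwarding to the serial device.
        # O_NONBLOCK lets this open return without waiting for a writer.
        reader_fd = os.open(fifo_path, os.O_RDONLY | os.O_NONBLOCK)

        # Process A opens the FIFO and starts writing.
        proc_a = open(fifo_path, "wb", buffering=0)
        proc_a.write(b"A: started\n")

        # Process B inherits A's descriptor (os.dup mimics inheritance via fork).
        proc_b_fd = os.dup(proc_a.fileno())

        # Process A dies; B keeps its copy of the descriptor open.
        proc_a.close()
        os.write(proc_b_fd, b"B: still running\n")

        # Process A restarts. Unlike the serial device, the FIFO can be
        # opened again even though B still holds it open.
        restarted_a = open(fifo_path, "wb", buffering=0)
        restarted_a.write(b"A: restarted\n")
        restarted_a.close()
        os.close(proc_b_fd)

        # Drain the FIFO; read returns b"" once all writers have closed.
        chunks = []
        while True:
            chunk = os.read(reader_fd, 4096)
            if not chunk:
                break
            chunks.append(chunk)
        os.close(reader_fd)
        return b"".join(chunks).decode().splitlines()


if __name__ == "__main__":
    for line in demo():
        print(line)
```

In this sketch the second open by the restarted writer succeeds because a FIFO places no limit on the number of writers; only the single reader would need to hold the serial device.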
The following error message is observed:
cannot create /dev/virtio-ports/neon.tech.log.0: Device or resource busy
This happened together with memory pressure and an OOM kill of compute_ctl.
The leading hypothesis is that while compute_ctl was killed, postgres kept running, so a descriptor for neon.tech.log.0 remained open (postgres still writes to it). Once compute_ctl was restarted, it tried to open neon.tech.log.0, but a serial interface cannot be opened more than once.
Environment
Production
Steps to reproduce
Kill compute_ctl while leaving postgres running.
Possible solutions
socat, which will multiplex output into the serial device.
Other logs, links
Thread:
https://neondb.slack.com/archives/C03TN5G758R/p1716165317982799