Bug: compute_ctl can't open neon.tech.log.0 after restart #939

Closed
Omrigan opened this issue May 20, 2024 · 0 comments

Omrigan commented May 20, 2024

The following error message was observed: cannot create /dev/virtio-ports/neon.tech.log.0: Device or resource busy. It occurred alongside memory pressure and an OOM kill of compute_ctl.

The leading hypothesis: when compute_ctl was killed, postgres kept running, so a descriptor for neon.tech.log.0 remained open (postgres still writes to it). When compute_ctl was restarted, it tried to open neon.tech.log.0 again, but a serial interface cannot be opened more than once.
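
The inheritance part of this hypothesis can be illustrated with a minimal sketch. It uses an ordinary pipe as a stand-in for the serial port, so it does not reproduce the "Device or resource busy" error itself; it only shows that a descriptor inherited by a forked child stays open after the parent dies. The function name and the roles "A" and "B" are hypothetical, and the code assumes a POSIX system.

```python
import os
import time


def descriptor_outlives_parent() -> bytes:
    """Show that a forked child keeps an inherited descriptor after its parent exits."""
    read_fd, write_fd = os.pipe()  # the pipe stands in for the serial device

    pid_a = os.fork()
    if pid_a == 0:
        # Process A (stand-in for compute_ctl): owns write_fd, then forks B.
        os.close(read_fd)
        if os.fork() == 0:
            # Process B (stand-in for postgres): inherits write_fd from A.
            time.sleep(0.2)  # make sure B outlives A
            os.write(write_fd, b"B still holds the descriptor")
            os._exit(0)
        os._exit(0)  # A dies; its copy of write_fd is closed, B's copy is not

    # Observer: drop our own copy so only A and B hold the write end.
    os.close(write_fd)
    os.waitpid(pid_a, 0)  # A is gone from this point on

    # The read end sees no EOF until B closes its copy, so B's write arrives.
    data = b""
    while chunk := os.read(read_fd, 1024):
        data += chunk
    os.close(read_fd)
    return data


if __name__ == "__main__":
    print(descriptor_outlives_parent())
```

With a device that allows only one opener, the copy held by B would be what keeps the port busy when A is restarted.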

Environment

Production

Steps to reproduce

Kill compute_ctl while leaving postgres running.

Possible solutions

  1. Add a layer between the log sources and the serial device, e.g. socat, that multiplexes their output into the serial device.
  2. The epic "Separately tagged logs for VM processes, dmesg, and runner" (#578) could be helpful.

Other logs, links

Thread:
https://neondb.slack.com/archives/C03TN5G758R/p1716165317982799

@Omrigan Omrigan added the t/bug Issue Type: Bug label May 20, 2024
@Omrigan Omrigan self-assigned this May 23, 2024
Omrigan added a commit that referenced this issue May 27, 2024
The virtio-serial interface can be opened only once.
Consider the following scenario:

1. Process A starts writing to the serial device.
2. Process A forks Process B, which inherits the open file
descriptor.
3. Process A dies; Process B survives and preserves the file descriptor.
4. Process A is restarted but cannot open the serial device again,
    causing it to crash-loop.

To fix this, we create a FIFO special file, which supports multiple
writers, and spawn cat to redirect it to the virtio-serial device.

Part of #939.

Signed-off-by: Oleg Vasilev <[email protected]>
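
The approach described in the commit message can be sketched roughly as follows. This is not the code from the commit: the file names, the function names, and the regular file standing in for the virtio-serial port are all assumptions made for illustration, and a thread plays the role of the spawned cat. It assumes a POSIX system (os.mkfifo).

```python
import os
import tempfile
import threading


def forward_fifo(fifo_path: str, sink_path: str) -> None:
    """Play the role of `cat`: the single reader that copies the FIFO to the sink."""
    with open(fifo_path, "rb", buffering=0) as src, \
            open(sink_path, "wb", buffering=0) as dst:
        # read() returns b"" only once every writer has closed the FIFO.
        while chunk := src.read(4096):
            dst.write(chunk)


def fifo_demo() -> bytes:
    with tempfile.TemporaryDirectory() as tmp:
        fifo = os.path.join(tmp, "log.fifo")    # hypothetical FIFO path
        sink = os.path.join(tmp, "device.out")  # stand-in for the serial port
        os.mkfifo(fifo)

        reader = threading.Thread(target=forward_fifo, args=(fifo, sink))
        reader.start()

        # Two writers open the same FIFO; unlike the serial port, this is allowed.
        b = open(fifo, "wb", buffering=0)  # long-lived writer (Process B)
        a = open(fifo, "wb", buffering=0)  # Process A, first run
        a.write(b"A: first run\n")
        a.close()                          # Process A dies
        b.write(b"B: still running\n")
        a = open(fifo, "wb", buffering=0)  # restarted Process A reopens the FIFO
        a.write(b"A: restarted\n")
        a.close()
        b.close()                          # last writer closes, reader sees EOF

        reader.join()
        with open(sink, "rb") as f:
            return f.read()


if __name__ == "__main__":
    print(fifo_demo().decode(), end="")
```

Only the forwarding reader ever opens the sink, so restarting a writer never requires opening the device a second time. One caveat of this sketch: once every writer has closed the FIFO, the reader sees end-of-file and stops, so a real setup would need to keep the forwarder alive or restart it.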
@Omrigan Omrigan closed this as completed Jun 4, 2024