`kernel threads` and `kernel stacks` deadlock in many scenarios #32145
Comments
@jharris-intel You can always use [...]. I am not sure if the original problem should be fixed inside the shell module.
What's the code in question? The shell is an upper-level subsystem and the kernel code shouldn't be calling into it, ever. I'm guessing you found a user of the thread enumeration API that is doing questionable things in the callback? But yes: the thread and stack "foreach" enumerators are indeed problematic and encourage some really dangerous patterns. I wouldn't cry if they just disappeared, but they're popular. And without an always-available general heap, it's actually hard to provide anything but a giant locked callback for users.
Sorry, I am confused. Could you please point me at [...]? There's e.g. [...]
Sorry about that - I forgot to actually link the code in question. It's the [...]
@jharris-intel : Sorry for the spelling mistake. The function I am thinking about is [...]. You can also consider using [...]
Sorry, to be clear, this is a builtin Zephyr module. Are you comfortable with a builtin Zephyr module using an API in a fashion that's explicitly against its specification? I am concerned that an API is brittle enough that even the builtins that are using it are using it incorrectly. That's generally a bad sign.
Yep, it's an interesting issue. The following may help, although mainly by making the issue "loud" instead of silent:
The net effect of this, assuming I understand correctly how this'd work:
Still nowhere near good, but at least replaces previous "jumps off into the weeds" behavior with something that keeps the kernel alive. The only other alternative I can see (that still retains any of the functionality of the API at least) is for thread deletion to be deferred until after the [...]
Just to be clear: obviously we need to fix in-tree bugs. My reference to "the kernel" was parochial and about which promise by which API was being violated and by whom. (And, of course, whether or not this is on my plate to fix, heh). We're a big project with a lot of subsystems working at different levels of abstraction, so this kind of collision isn't uncommon.
This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.
The specific issue is still standing in top-of-tree; it is 100% reproducible for me with SMP enabled (it hits the "Context switching while holding lock!" assertion). I think it is much better to switch cmd_kernel_threads/cmd_kernel_stacks to calling k_thread_foreach_unlocked for the SMP case. The probability of hitting the corner case of a manually executed shell command colliding with thread creation/deletion is small, and the temporarily incorrect output that might result from that is benign compared to an assertion.
Describe the bug
`kernel threads` and `kernel stacks` call `shell_print` in the callback of `k_thread_foreach`. `shell_print` attempts to take a mutex with a non-zero timeout. This can result in context switching while holding `z_thread_monitor_lock`, which results in incorrect behavior.
To Reproduce
n/a
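Although no reproduction steps are given, the call pattern described above can be modeled in plain C. The following is only a single-threaded sketch under stated assumptions: `model_foreach`, `model_blocking_print`, and the other `model_*` names are hypothetical stand-ins, not Zephyr APIs. `monitor_lock_held` plays the role of `z_thread_monitor_lock`, and the counter records each point where a real kernel would be asked to block (and possibly context switch) with that lock held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; none of these are Zephyr APIs. */
struct model_thread {
    const char *name;
    struct model_thread *next;
};

static bool monitor_lock_held;   /* models z_thread_monitor_lock being held */
static int blocked_while_locked; /* counts blocking calls made under the lock */

/* Models a print path that may block, e.g. on a mutex with a non-zero timeout. */
static void model_blocking_print(const char *text)
{
    (void)text;
    if (monitor_lock_held) {
        /* A real kernel could context switch here while the lock is held. */
        blocked_while_locked++;
    }
}

typedef void (*model_cb)(const struct model_thread *t, void *user_data);

/* Models a locked enumeration: the whole walk runs with the lock held. */
static void model_foreach(const struct model_thread *head, model_cb cb,
                          void *user_data)
{
    monitor_lock_held = true;
    for (const struct model_thread *t = head; t != NULL; t = t->next) {
        cb(t, user_data);
    }
    monitor_lock_held = false;
}

/* Models the shell command's callback: it prints from inside the walk. */
static void dump_cb(const struct model_thread *t, void *user_data)
{
    (void)user_data;
    model_blocking_print(t->name);
}

/* Walks a two-thread list; returns how many blocking calls ran under the lock. */
static int model_run(void)
{
    struct model_thread idle = { "idle", NULL };
    struct model_thread main_thread = { "main", &idle };

    blocked_while_locked = 0;
    model_foreach(&main_thread, dump_cb, NULL);
    return blocked_while_locked;
}
```

Calling `model_run()` reports one hazardous call per enumerated thread, which is the situation the issue describes: every `shell_print` in the callback is a potential blocking point under the lock.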
Expected behavior
...I'm not actually sure what the desired behavior is, to be honest. At a high level, I suppose you could take the mutex outside the `k_thread_foreach` - but even then, API issues aside, you'd still have a problem if/when your shell backend backpressured.
I'm mainly opening this ticket as an invitation for discussion. Note that there are other issues discussing tangential topics, e.g. #13318, #14172, #20937, and #22841.
Impact
`kernel threads` and `kernel stacks` (and presumably any other `k_thread_foreach` user that does something that could block) sometimes deadlock if any of their calls to `shell_print` block. In particular:
- [...] `k_thread_foreach` calls anything that ends up trying to take `z_thread_monitor_lock`, you deadlock (or assert).
- [...] `k_thread_foreach` ends up trying to take `z_thread_monitor_lock`, you deadlock (or assert).
Logs and console output
n/a
Additional context
Note that there appear to be multiple workarounds in place to use `k_thread_foreach_unlocked` instead, which results in different incorrect behavior (e.g. if anything in the system happens to delete/add a thread while you are in the call, you can follow an invalid pointer off into the weeds).
The only ways I can see `kernel threads` and `kernel stacks` working currently are either:
- [...] `k_thread_foreach`, copy all of the information out of kernel structures into allocated memory. Then print outside of `k_thread_foreach` and release allocated memory. Unfortunately, "enough for every thread's information" isn't particularly well-defined, and you can't allocate during the execution of `k_thread_foreach`.
- [...] `k_thread_foreach` and doing, say, the first 10 threads, then the 10 threads starting at the 10th, and so on. Less memory use, but has the potential to miss or duplicate threads, and is O(n^2) time w.r.t. the number of threads in the system (as you have to walk the linked list all the way from the front every time).
- [...] `k_thread_foreach` that in turn ends up calling shell print.
- [...] (`next_thread`/`prev_thread`) to be mutually exclusive with `k_thread_foreach`, and ensure that it's documented that e.g. logging backends can never dynamically spin up threads.
...none of which are exactly ideal.
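The copy-out-then-print option from the list above might be sketched as follows. This is a plain-C model, not Zephyr code: `model_foreach` stands in for the locked enumeration, and `model_thread`, `thread_info`, `snapshot_threads`, and `print_snapshot` are hypothetical names. It assumes the caller supplies a fixed-size buffer up front (since allocating inside the walk is not possible) and reports how many threads were seen so that truncation is at least visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model types; a real kernel thread struct looks different. */
struct model_thread {
    char name[16];
    int prio;
    struct model_thread *next;
};

/* Plain copy of the fields to print; holds no pointers into kernel data. */
struct thread_info {
    char name[16];
    int prio;
};

struct snapshot {
    struct thread_info *buf; /* caller-provided storage */
    size_t cap;              /* number of entries buf can hold */
    size_t seen;             /* threads visited, even if not stored */
};

typedef void (*model_cb)(const struct model_thread *t, void *user_data);

/* Stand-in for the locked walk; the lock itself is not modeled here. */
static void model_foreach(const struct model_thread *head, model_cb cb,
                          void *user_data)
{
    for (const struct model_thread *t = head; t != NULL; t = t->next) {
        cb(t, user_data);
    }
}

/* Runs inside the walk: copies only, never prints or blocks. */
static void snapshot_cb(const struct model_thread *t, void *user_data)
{
    struct snapshot *s = user_data;

    if (s->seen < s->cap) {
        struct thread_info *dst = &s->buf[s->seen];

        snprintf(dst->name, sizeof(dst->name), "%s", t->name);
        dst->prio = t->prio;
    }
    s->seen++;
}

/* Returns the number of threads seen, which may exceed cap. */
static size_t snapshot_threads(const struct model_thread *head,
                               struct thread_info *buf, size_t cap)
{
    struct snapshot s = { buf, cap, 0 };

    model_foreach(head, snapshot_cb, &s);
    return s.seen;
}

/* Runs after the walk has returned, so blocking output is safe here. */
static size_t print_snapshot(const struct thread_info *buf, size_t cap,
                             size_t seen)
{
    size_t stored = seen < cap ? seen : cap;

    for (size_t i = 0; i < stored; i++) {
        printf("%-15s prio %d\n", buf[i].name, buf[i].prio);
    }
    if (seen > cap) {
        printf("(%zu more threads not shown)\n", seen - cap);
    }
    return stored;
}
```

The sketch does not solve the sizing problem raised above; it only makes an undersized buffer degrade into truncated output rather than a blocking call under the lock.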
I'm hoping there are other options I'm not seeing here, because the commands are rather useful in general.
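For comparison, the chunked option (a few threads per pass, restarting the walk from the front each time) could be modeled roughly like this. Again this is a plain-C sketch with hypothetical names (`model_foreach`, `chunk_ctx`, `print_all_chunked`, `CHUNK`); it assumes the enumeration offers no early exit, so every pass visits every thread. The `nodes_visited` counter exists only to make the quadratic cost observable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define CHUNK 2 /* threads copied per pass; hypothetical value */

struct model_thread {
    const char *name;
    struct model_thread *next;
};

struct chunk_ctx {
    size_t start;          /* index of the first thread wanted this pass */
    size_t index;          /* index of the thread currently visited */
    size_t count;          /* entries stored in names[] */
    char names[CHUNK][16]; /* copies, so nothing points into the list */
};

static size_t nodes_visited; /* total callback invocations across passes */

typedef void (*model_cb)(const struct model_thread *t, void *user_data);

/* Stand-in for the locked walk; it always visits the whole list. */
static void model_foreach(const struct model_thread *head, model_cb cb,
                          void *user_data)
{
    for (const struct model_thread *t = head; t != NULL; t = t->next) {
        cb(t, user_data);
    }
}

/* Runs inside the walk: copies at most CHUNK names starting at start. */
static void chunk_cb(const struct model_thread *t, void *user_data)
{
    struct chunk_ctx *c = user_data;

    nodes_visited++;
    if (c->index >= c->start && c->count < CHUNK) {
        snprintf(c->names[c->count], sizeof(c->names[0]), "%s", t->name);
        c->count++;
    }
    c->index++;
}

/* Prints every thread name, CHUNK per pass; returns the number printed. */
static size_t print_all_chunked(const struct model_thread *head)
{
    size_t start = 0;
    size_t printed = 0;
    size_t total;

    do {
        struct chunk_ctx c = { .start = start };

        model_foreach(head, chunk_cb, &c); /* locked region would end here */
        for (size_t i = 0; i < c.count; i++) {
            printf("%s\n", c.names[i]); /* printing happens outside the walk */
            printed++;
        }
        total = c.index;
        start += c.count;
    } while (start < total);

    return printed;
}
```

Because the list can change between passes, a thread created or deleted ahead of `start` shifts the indices, which is the miss-or-duplicate risk noted above.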