smp: atomic_t global_lock will never be cleared when a thread oopses while global_lock is set #35202
Hi @IRISZZW, thanks for your message and the debugging. I think the trigger is the test case test_fatal_on_smp() from this commit, fa2724b.
Excuse me, @IRISZZW, could you please try this again on your hsdk board? Thank you so much!
This doesn't sound like a bug. The irq_lock()/irq_unlock() APIs are kernel APIs (they're also legacy things that aren't intended to be used in new code, FWIW). A kernel thread that has a fatal error is a system failure, that's not supposed to be a recoverable situation. Note that basically any API that has some kind of resource allocation/release structure is going to leak like this. It's true that the behavior of the locking under SMP is different from uniprocessors, because the latter will end up "forgetting" the lock state on the next context switch. I'm just not sure why that's a bug. Broadly:
Just to clarify: none of our other "lock-like" APIs have the behavior desired here. If you terminate an SMP thread that holds a spinlock, it stays locked forever. If you terminate a thread that holds a k_mutex, nothing releases the mutex. If you terminate a thread expected to k_sem_give() a semaphore, the give will never happen. Ditto sys_sem, sys_mutex, k_futex, etc. [1] The reason this used to work is that the lowest-level irq_lock() always worked (on most architectures) just by setting a single bit of processor state indicating "interrupts are masked", which isn't enough to survive a context switch. So it could get "released" by switching to a new thread. That was effectively a bug in the old API. [1] Notable corollary: we have way too many locking abstractions.
Update the testcase test_fatal_on_smp(): refine it and correct some inappropriate usage, such as an unnecessary irq_lock(). This prevents the error from propagating to later-executing testcases. Fixes zephyrproject-rtos#35200 Fixes zephyrproject-rtos#35202 Signed-off-by: Enjia Mai <[email protected]>
Describe the bug
When CONFIG_SMP is enabled and we call irq_lock(), it calls z_smp_global_lock() and sets global_lock:

zephyr/kernel/smp.c
Lines 16 to 28 in 8e69daf

When we call irq_unlock(), it calls z_smp_global_unlock() and clears global_lock:

zephyr/kernel/smp.c
Lines 30 to 41 in 8e69daf
When a thread calls irq_lock() and then oopses, global_lock will never be cleared. I found this bug while debugging issue #35200.

In test case test_fatal_on_smp, thread entry_oops first calls irq_lock(), then oopses:

zephyr/tests/kernel/smp/src/main.c
Lines 652 to 662 in 8e69daf
Then in test case test_smp_release_global_lock_irq, thread t2_mutex_lock_with_irq calls irq_lock() again:

zephyr/tests/kernel/smp/src/main.c
Lines 758 to 788 in 8e69daf

Because global_lock has already been set, irq_lock() spins forever.

Impact
#35200
Environment (please complete the following information):