Occasional Spinlocks on zephyr 2.4.0 (ASSERTION FAIL [z_spin_lock_valid(l)] @ WEST_TOPDIR/zephyr/include/spinlock.h:92) #30074
Perhaps @anangl can shed some light on the best way to go about debugging this.
The assertion fails when the […]
I can't tell if this is related, but I have spent the past two weeks debugging why the philosophers sample application crashes on my RISC-V SoC with Zephyr 2.4.0, and I have several clues pointing at the spin lock. Sometimes the crash triggers the assert. Also seen: […]
Same issue on a custom board, ARMv7-A (sic), master branch. I might be wrong on this one, but here's a sketch of the problem as I see it. Below is the relevant code, edited for brevity. log_core.c:905
The locking function:
Next, in the validation code:
Who initializes thread_cpu? Not the user, not the locking procedure...
@remy-luisant Good find. Sounds like you're onto something. I would love to help debug, since this sounds related to what I've been debugging for three weeks now. One problem is that I can't really check whether thread_cpu is valid, because I have no idea what value it is supposed to have. From the code it looks like the two LSBs are a CPU number (does Zephyr only support four CPUs?) and perhaps the rest is a memory location (which would then require 32-bit alignment), but I'm very unsure about all of this.
I imagine that people who need an RTOS that runs on multiple CPUs at GHz speeds and with tens of MB of RAM would not go with Zephyr. From their homepage: "The Zephyr Project strives to deliver the best-in-class RTOS for connected resource-constrained devices, built to be secure and safe." Apparently I'm not exactly the target audience, but well... Git points to this edit as the point of inclusion of the line I suspect to be problematic. The relevant commit was tested on SMP, possibly not on uniprocessors. I may well be wrong about this, of course. Jim, what do you think about this? @cwshu
Hi @remy-luisant, […] Thus, I think the thread_cpu of k_spinlock should be initialized to zero.
I think there is a bug in […]
Hi @olofk, […]
@cwshu
Requiring initialization does not seem to be the common practice for spinlock usage in Zephyr, unless there are many, many, many more bugs around. Furthermore, and I might be wrong on this one, let's assume that the spinlock is indeed initialized to 0. In that case, any code taking a lock on CPU 0 would trip a recursive-spinlock error with the current code.
Actually, a minor correction here: the validation code DOES check for the case of 0, so I am indeed happily wrong about that part. My apologies. The comment about common usage being on uninitialized locks stands.
@remy-luisant
Also, I find that #20993 has the same problem, and @andyross comments there that "Spinlocks are valid when zero-initialized."
@cwshu I do personally have a problem with that implicit way of initializing variables through the use of […]. More worryingly, I do not see any documentation on the project website, so it appears that this aspect of spinlocks is still not documented. I see no mention in the source either, including in spinlock.h. I consider the matter resolved for my needs, thank you.
While playing around I now get crashes immediately after startup:
As mentioned in the first post, I have external interrupts configured. They continue to fire even when the MCU restarts. If I interpret the assertion above correctly, Zephyr tries to configure a thread while it is in an ISR. I have now disabled all my thread creation, but it still happens. I guess Zephyr itself has some of its own (hidden) threads.
@caco3 Do you have OpenOCD running? It has a pretty great way of showing you all the threads that Zephyr has. At a minimum, there is the idle thread; the logger and the shell also have their own threads. I would also suggest first initializing the spinlocks that are failing, since that is the very first error you encounter; once you start getting errors, the system might already be damaged. While I am not familiar with Cortex-M, you can add a "bkpt" instruction to asserts to put a Cortex-A7 CPU into a debug state when an assert is hit and debug it from there. I imagine Cortex-M might be similar. I would also advise against debugging software solely by editing code and looking at the output of print statements; a proper debugging setup is much better for finding issues.
@caco3 @olofj @remy-luisant is this still an issue on your respective boards and in your environments?
@carlescufi Sorry, I forgot about this. I believe my root cause was an issue with the memory controller that occasionally dropped writes. Fixed in hardware.
My root cause was the interrupt controller feeding garbage to the GIC, which was then mishandled by Zephyr. A separate issue was filed for that one. I do believe there still might be an issue with the logging system not initializing a lock, but I have no resources available to give you a firm answer.
I refactored my initialization code. This made the crashes which occurred immediately after startup go away.
@caco3 do you have any spinlocks defined in your own code? If so, are they automatic variables or globals?
@carlescufi No, I have never used spinlocks. I haven't seen such crashes in a while. We improved our code and upgraded to the latest Zephyr releases. Further testing will show. For me, we can close the issue, and I would reopen it in case I see new spinlock crashes.
This issue has been marked as stale because it has been open for (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed; otherwise this issue will automatically be closed in 14 days. Note that you can always re-open a closed issue at any time.
We migrated to Zephyr […]
Describe the bug
We have an application based on an nRF52840. We see the z_spin_lock_valid assertion fail every few hours:
ASSERTION FAIL [z_spin_lock_valid(l)] @ WEST_TOPDIR/zephyr/include/spinlock.h:92
I haven't found much information about the reasons for such an issue.
The code is used for a measurement device. We run an external ADC at 10 kSamples/s and read its data through SPI DMA transfers. After 30 samples (=> 333 Hz) we trigger an interrupt handler which copies the data into another buffer. Then we release a semaphore. A kernel thread is waiting for this semaphore so it can grab the data from the buffer.
Code snippets:
Log:
What is the best way to trace this further down?
Environment:
2.4.0