tests/kernel/timer/timer_api/test_timeout_abs still fails on multiple platforms #32839
on nucleo_f091rc:
on nucleo_l073rz:
on nrf9160dk_nrf9160:
I set the flash latency for the stm32g0, and then the test passed on the nucleo_g071 board.
On Nordic platforms, this is caused by non-secure execution speed; see this comment.
For the nucleo_g071rb, the flash latency is set to 2 wait states and the power range is 1, which are the correct values when running the tests at HCLK = 64 MHz. This means the flash latency should not be changed to 1.
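For reference, the wait-state requirement can be written as a lookup on HCLK and the voltage range. The thresholds below are an assumption based on our reading of the flash access table in the STM32G0 reference manual (RM0444): range 1 allows 0 WS up to 24 MHz, 1 WS up to 48 MHz and 2 WS up to 64 MHz; range 2 allows 0 WS up to 8 MHz and 1 WS up to 16 MHz. Check them against the manual before relying on them. The function name is hypothetical and is not a Zephyr or STM32Cube API.

```c
#include <stdint.h>

/* Hypothetical helper: minimum flash wait states for an STM32G0 at a given
 * HCLK and voltage range. Thresholds are assumed from RM0444; verify them.
 * Returns -1 if the frequency is not allowed in that voltage range. */
int stm32g0_min_flash_ws(uint32_t hclk_hz, int vos_range)
{
    if (vos_range == 1) {
        if (hclk_hz <= 24000000u) return 0;
        if (hclk_hz <= 48000000u) return 1;
        if (hclk_hz <= 64000000u) return 2;
    } else if (vos_range == 2) {
        if (hclk_hz <= 8000000u) return 0;
        if (hclk_hz <= 16000000u) return 1;
    }
    return -1; /* out of range for this voltage scaling setting */
}
```

With the nucleo_g071rb settings from this thread, `stm32g0_min_flash_ws(64000000u, 1)` returns 2, in line with the 2 wait states described above. At the reduced 32 MHz sysclock from the original report it returns 1.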
Coming back to this. I had a confirmation of this failure on intel_adsp_cavs15, but that turned out to be a bug in my tree. I remember seeing the failure on my only STM board (disco_l475_iot1), but now that I've recovered it from what turned out to be a bum USB cable, it no longer happens for me on master (I guess it's possible that the same bad tracing hack I had in the ADSP tree was polluting that test? I hate phantom bugs). The nRF TEE issue seems well characterized in a separate bug (and I think the solution there needs to be "don't run tickless if sys_clock_elapsed() cannot be made fast"). The remaining STM failures are addressed by a platform configuration change in #34307 (though it's not clear whether that fixes the root cause or is a workaround). Is there another platform I'm missing? I think we can close this?
@andyross I still see this issue on nrf9160dk. However, the test is flaky. It doesn't show up in the final results because, most of the time, a single retry is enough for it to pass. @carlescufi the behavior for secure and non-secure is slightly different. The secure build fails as described in this issue. The non-secure build is terminated by a timeout.
@PerMac But you're describing the TEE issue, right? That arch API requires a privilege elevation that takes longer than a 32kHz tick? That problem is fundamental: it happens to glitch this test, but you're going to have timing performance issues everywhere. The idea of tickless is that you replace regular interrupts with just "checking the current time", so the kernel expects that to be fast. If "checking" requires a trap, you might as well just handle the interrupt to begin with. Basically: disable CONFIG_TICKLESS_KERNEL in that configuration; it's not helping you. @FRASTM Can you explain the flash latency thing? What is it changing that works around the failure? It doesn't sound timer related at all.
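To put a number on "longer than a 32kHz tick": at 32768 ticks/s one tick lasts about 30.5 µs, so a time query that costs more than that will see the clock advance by at least one tick while it runs. The arithmetic sketch below makes this concrete; the function names are hypothetical, and the 50 µs trap cost in the usage note is an illustrative figure, not a measurement from nRF9160 hardware.

```c
#include <stdint.h>

/* Length of one kernel tick in nanoseconds (truncated to an integer). */
uint64_t tick_period_ns(uint32_t ticks_per_sec)
{
    return 1000000000ull / ticks_per_sec;
}

/* Whole ticks that elapse while a time query of the given cost runs.
 * A nonzero result means the value the query returns can already be
 * stale by roughly that many ticks by the time the caller uses it. */
uint64_t ticks_elapsed_during(uint64_t query_cost_ns, uint32_t ticks_per_sec)
{
    return (query_cost_ns * ticks_per_sec) / 1000000000ull;
}
```

For example, `tick_period_ns(32768)` is 30517, and a hypothetical 50 µs secure-call round trip gives `ticks_elapsed_during(50000, 32768) == 1`, while a 10 µs call gives 0.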
@andyross I think these are two separate issues. The issue from Carles' comment is for nrf9160ns, and the result is different: there is no output from the test and the test is terminated. Here (in this issue) I am referring to the failure on nrf9160 (secure, not non-secure), and the failure is the same as the one reported in this issue.
OK, then it sounds like the nRF9 and STM are showing the same issue with remaining time being misreported, but at least the secure mode stuff is well characterized? Can you explain what that flash wait state patch is doing? I still don't understand what it has to do with the timer hardware; if anything, it sounds like it's just changing execution speed? Which presumably just hides the problem, the same way other hardware doesn't show the issue? Are these devices simple off-the-shelf consumer boards? Can you give me a Digi-Key/Mouser link or whatever where I can get a nucleo_g071rb and/or nrf9160ns?
The workaround (#34307) on the stm32g0 programs 1 wait state for flash reads instead of 2 (the correct value).
@andyross https://www.digikey.com/en/products/detail/stmicroelectronics/NUCLEO-G071RB/9739925
OK, the STM board arrived yesterday evening and I can reproduce this. Should have something to say in a bit.
Heh, OK, that was quick. I didn't fix it myself. On a lark I tried @simonguinot's patch in #35062, which addresses a clock skew due to unaccounted CPU time in sys_clock_set_timeout(). With that, this test is rock solid. It failed almost every time for me on mainline, but succeeded 24+ times in a row with the fix, until I got bored of pressing the reset button. It actually makes some sense that it would show up here, as Cortex-M0 platforms have had performance challenges in the timing code in the past (software divide makes conversions a ton more expensive when using non-power-of-two tick rates). @FRASTM @PerMac if you could validate that on your failing boards, it would be helpful. I actually have a -1 on that pull request because of aesthetic complaints, but I think now I'm going to have to remove it.
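The point about software divide can be illustrated with the cycles-to-ticks conversion a timer driver performs. When the number of hardware cycles per tick is a power of two, the conversion is a single right shift; otherwise it needs a real integer division, and ARMv6-M cores such as the Cortex-M0/M0+ have no hardware divide instruction, so the compiler calls a runtime library routine instead. This is a simplified sketch with hypothetical names, not Zephyr's actual conversion code.

```c
#include <stdbool.h>
#include <stdint.h>

static bool is_pow2(uint32_t x)
{
    return x != 0u && (x & (x - 1u)) == 0u;
}

/* Index of the highest set bit; for a power of two this is log2(x). */
static unsigned int log2_u32(uint32_t x)
{
    unsigned int n = 0u;
    while (x >>= 1) {
        n++;
    }
    return n;
}

/* Convert hardware timer cycles to kernel ticks, rounding down.
 * Power-of-two ratio: a shift. Otherwise: an integer division, which on
 * ARMv6-M (no divide instruction) becomes a software divide routine call. */
uint32_t cycles_to_ticks(uint32_t cycles, uint32_t cycles_per_tick)
{
    if (is_pow2(cycles_per_tick)) {
        return cycles >> log2_u32(cycles_per_tick);
    }
    return cycles / cycles_per_tick;
}
```

For instance, with a 64 MHz cycle counter and an assumed 10000 ticks/s configuration, cycles_per_tick is 6400, which is not a power of two, so `cycles_to_ticks(64000, 6400)` (which returns 10) takes the division path. With 8192 cycles per tick the same call is a right shift by 13. A real driver would compute the shift once at init rather than on every call.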
I still see the test failing once in a while on nrf9160 (in both scenarios: with and without tickless) when testing with that PR :( Maybe it occurs less often, but it's hard to say since I didn't do a statistically meaningful number of runs.
For the nucleo_g071rb, I confirm that #35062 fixes the problem.
@PerMac any updates?
If #35062 helps others, I won't block this PR. Like I said, this test still fails once in a while with the fix, but it is not a big deal. We have to investigate the issue further on our side; maybe it is something with our timer. Then this issue can be closed and I can create a new one. @nordic-krch ^^
Can you open a new issue for the nRF platforms?
Agreed, it's better to open a new issue for nRF since it is really a different driver.
Descoped nRF platforms from this issue. Issue for nRF platforms: #35509
After the merge of #32683, some stm32 boards still fail tests/kernel/timer/timer_api,
especially the nucleo_g071rb board (nucleo_f091rc and nucleo_l073rz also fail)
--> sysclock of 64MHz from HSI
(test PASSED on nucleo_l152re)
To Reproduce
Steps to reproduce the behavior:
Logs and console output
on nucleo_g071rb
Environment
Additional context
When the sysclock is reduced to 32 MHz from HSI, test_timeout_abs passes, but the next test fails.