Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests/kernel/timer/timer_api/test_timeout_abs still fails on multiple platforms #32839

Closed
FRASTM opened this issue Mar 3, 2021 · 24 comments
Closed
Assignees
Labels
area: Tests Issues related to a particular existing or missing test bug The issue is a bug, or the PR is fixing a bug priority: high High impact/importance bug
Milestone

Comments

@FRASTM
Copy link
Collaborator

FRASTM commented Mar 3, 2021

After the merge of #32683, some stm32 boards still fail in the tests/kernel/timer/timer_api
especially the nucleo_g071rb board (nucleo_f091rc, l073rz also fail)
--> sysclock of 64MHz from HSI
(test PASSED on nucleo_l152re)

To Reproduce
Steps to reproduce the behavior:

  1. west build -p auto -b nucleo_g071rb tests/kernel/timer/timer_api/
  2. west flash
  3. See error

Logs and console output
on nucleo_g071rb

===================================================================             
START - test_timeout_abs                                                        
                                                                                
Assertion failed at WEST_TOPDIR/zephyr/tests/kernel/timer/timer_api/src/main.c:723: test_timeout_abs: (t0 + rem_ticks == exp_ticks is false)       
Wrong remaining: now 33369 rem 99966632 expires 100000000 (1)         
 FAIL - test_timeout_abs                                                        
===================================================================             
START - test_sleep_abs                                                          
 PASS - test_sleep_abs                                                          
===================================================================             

Environment (please complete the following information):

Additional context
When reducing the sysclock to 32MHz from HSI, the test_timeout_abs passed, but the next fails:

START - test_timeout_abs                                                        
 PASS - test_timeout_abs                                                        
===================================================================             
START - test_sleep_abs                                                          
                                                                                
    Assertion failed at WEST_TOPDIR/zephyr/tests/kernel/timer/timer_api/src/main.c:750: test_sleep_abs: (late >= 0 && late < k_us_to_ticks_ceil32(100))
expected wakeup at 34775, got 34776 (late 1)                                    
 FAIL - test_sleep_abs                                                          
===================================================================             

@FRASTM FRASTM added bug The issue is a bug, or the PR is fixing a bug platform: STM32 ST Micro STM32 area: Tests Issues related to a particular existing or missing test labels Mar 3, 2021
@FRASTM
Copy link
Collaborator Author

FRASTM commented Mar 3, 2021

on nucleo_f091rc:

===================================================================                                                                    
START - test_timeout_abs                                                                                                               
                                                                                                                                       
    Assertion failed at WEST_TOPDIR/zephyr/tests/kernel/timer/timer_api/src/main.c:723: test_timeout_abs: (t0 + rem_ticks == exp_ticks)
Wrong remaining: now 34097 rem 99965904 expires 100000000 (1)                                                                          
 FAIL - test_timeout_abs                                                                                                               
===================================================================                                                                    
START - test_sleep_abs                                                                                                                 
 PASS - test_sleep_abs                                                                                                                 
===================================================================                                                                    

@FRASTM
Copy link
Collaborator Author

FRASTM commented Mar 3, 2021

on nucleo_l073rz :

===================================================================             
START - test_timeout_abs                                                        
 PASS - test_timeout_abs                                                        
===================================================================             
START - test_sleep_abs                                                          
                                                                                
    Assertion failed at WEST_TOPDIR/zephyr/tests/kernel/timer/timer_api/src/mai)
expected wakeup at 35215, got 35216 (late 1)                                    
 FAIL - test_sleep_abs                                                          
===================================================================   

@nashif nashif added the priority: high High impact/importance bug label Mar 6, 2021
@nashif nashif changed the title tests/kernel/timer/timer_api/test_timeout_abs still fails on stm32 boards tests/kernel/timer/timer_api/test_timeout_abs still fails on multiple platforms Mar 17, 2021
@PerMac
Copy link
Member

PerMac commented Mar 24, 2021

on nrf9160dk_nrf9160:

START - test_timeout_abs
Assertion failed at WEST_TOPDIR/zephyr/tests/kernel/timer/timer_api/src/main.c:763: test_timeout_abs: (t0 + rem_ticks == exp_ticks is false)
Wrong remaining: now 131730 rem 327548271 expires 327680000 (1)
FAIL - test_timeout_abs in 0.20 seconds

@FRASTM FRASTM removed the platform: STM32 ST Micro STM32 label Apr 6, 2021
@galak
Copy link
Collaborator

galak commented Apr 13, 2021

@andyross / @FRASTM any updates on this?

@FRASTM
Copy link
Collaborator Author

FRASTM commented Apr 14, 2021

I set the flash latency for the stm32g0 and then is the test passed with nucleo_g071 board.
in the drivers/clock_control/clock_stm32_ll_common.c, at the end of the stm32_clock_control_init function :

 #if defined (CONFIG_SOC_SERIES_STM32G0X)
	LL_FLASH_SetLatency(LL_FLASH_LATENCY_1);
#endif /* CONFIG_SOC_SERIES_STM32G0X */

@carlescufi
Copy link
Member

On Nordic platforms, this is caused by non-secure execution speed, see this comment

@FRASTM
Copy link
Collaborator Author

FRASTM commented Apr 26, 2021

For the nucleo_g071rb, the flash latency is set to 2 wait states and power range is 1, which are correct values when running the tests at HCLK = 64MHz. Meaning that the flash latency should not be changed to 1.

@andyross
Copy link
Contributor

andyross commented May 3, 2021

Coming back to this. I had a confirmation of this failure on intel_adsp_cavs15, but that turned out to have been a bug in my tree. I remember seeing the failure on my only STM board (disco_l475_iot1), but having just recovered that from what turns out to have been a bum USB cable, it's not happening for me anymore on master (it's not impossible that the same bad tracing hack I had in the ADSP tree was polluting that test, I guess? I hate phantom bugs)...

The nRF TEE issue seems well-characterized in a separate bug (and I think the solution there needs to be "don't run tickless if sys_clock_elapsed() cannot be made fast").

And the remaining STM failures are addressed by a platform configuration in #34307 (not clear: is that a root cause or workaround?).

Is there another platform I'm missing? I think we can close this?

@PerMac
Copy link
Member

PerMac commented May 4, 2021

@andyross I still see this issue for nrf9160dk. However, the test is flaky. It doesn't end in the final results as most of the time a single repeat is enough. @carlescufi the behavior for secure and non-secure is slightly different. The secure one causes a failure described in this issue. The non-secure is terminated by a timeout.

@FRASTM
Copy link
Collaborator Author

FRASTM commented May 4, 2021

I confirm, still failing on stm32 nucleo_g071rb board on master 37de0c4.
( In #34307 we have a workaround, but not consistent because the flash latency should not be increased)

@andyross
Copy link
Contributor

andyross commented May 4, 2021

@PerMac But you're describing the TEE issue, right? That arch API requires a privilege elevation that takes longer than a 32kHz tick? That problem is fundamental, it happens to glitch this test but you're going to have performance issues in timing everywhere. The idea of tickless is that you replace regular interrupts with just "checking the current time", so the kernel expects that to be fast. If "checking" requires a trap, you might as well just handle the interrupt to begin with. Basically: disable CONFIG_TICKLESS_KERNEL in that configuration, it's not helping you.

@FRASTM Can you explain the flash latency thing? What is that changing that works around the failure? It doesn't sound timer related at all.

@PerMac
Copy link
Member

PerMac commented May 5, 2021

@andyross I think these are two separate issues. The issue from Carles comment is for nrf9160ns and the result is different, there is no output from the test and the test is terminated. Here (in this issue) I refer to the failure at nrf9160 (not non-secure), and the failure is as the one reported: START - test_timeout_abs Assertion failed at WEST_TOPDIR/zephyr/tests/kernel/timer/timer_api/src/main.c:763: test_timeout_abs: (t0 + rem_ticks == exp_ticks is false) Wrong remaining: now 131730 rem 327548271 expires 327680000 (1) FAIL - test_timeout_abs in 0.20 seconds
And this test fails in both configurations: with and without tickless

@andyross
Copy link
Contributor

andyross commented May 5, 2021

OK, then that sounds like the nRF9 and STM are showing the same issue with remaining time being misreported, but the secure mode stuff at least is well characterized?

Can you explain what that flash wait state patch is doing? I still don't understand what that has to do within the timer hardware, it sounds like if anything it's just addressing execution speed? Which presumably is just hiding the problem in the same way that other hardware doesn't show the issue?

Are these devices simple consumer things? Can you give me a digikey/mouser link or whatever where I can get a nucleo_g071rb and/or nrf9160ns?

@FRASTM
Copy link
Collaborator Author

FRASTM commented May 6, 2021

The WA (#34307) on the stm32g0 is programming 1 wait-state on the flash read, instead of 2 (= the right value).
The nb of wait states to be inserted depends on the clock (HCLK) of flash memory. HCLCK is given by the AHB prescaler applied on the systemclock. In this case (nucleo_g071rb) a value of 2 wait states is recommended for HCLK running at 64MHz.
However 2 wait-states makes the test fail (just llike one of the read operations is too slow) and 1 wait-state makes the test pass.

@erwango
Copy link
Member

erwango commented May 10, 2021

Are these devices simple consumer things? Can you give me a digikey/mouser link or whatever where I can get a nucleo_g071rb and/or nrf9160ns?

@andyross https://www.digikey.com/en/products/detail/stmicroelectronics/NUCLEO-G071RB/9739925

@galak galak added this to the v2.6.0 milestone May 11, 2021
@andyross
Copy link
Contributor

OK, the STM board arrived yesterday evening and I have this reproducing. Should have something to say in a bit.

@andyross
Copy link
Contributor

andyross commented May 13, 2021

Heh, OK that was quick. I didn't fix it. On a lark I tried @simonguinot's patch in #35062 which addresses a clock skew due to unaccounted CPU time in sys_clock_set_timeout(). And with that this test is rock solid. It failed almost every time for me on mainline, but succeeded 24+ times in a row for me with the fix, until I got bored of pressing the reset button.

It actually makes some sense that it would show up here, as Cortex M0 platforms have had performance challenges in the timing code (software divide makes conversions a ton more expensive when using non-power-of-two tick rates) in the past.

@FRASTM @PerMac if you could validate that on your failing boards, it would be helpful. I actually have a -1 on that pull request because of aesthetic complaints, but I think now I'm going to have to remove that.

@PerMac
Copy link
Member

PerMac commented May 17, 2021

I still see the test failing once in a while on nrf9160 (both scenarios: w,w/o tickless) while being checked at that PR :( maybe it occurs less often, hard to say since I didn't run any statistically meaningful set.

START - test_timeout_abs
Assertion failed at WEST_TOPDIR/zephyr/tests/kernel/timer/timer_api/src/main.c:777: test_timeout_abs: (t0 + rem_ticks == exp_ticks is false)
Wrong remaining: now 135407 rem 327544594 expires 327680000 (1)
FAIL - test_timeout_abs in 0.20 seconds

@FRASTM
Copy link
Collaborator Author

FRASTM commented May 20, 2021

For the nucleo_g071rb, I confirm that #35062 fixes the pb.

@galak
Copy link
Collaborator

galak commented May 20, 2021

@PerMac any updates?

@PerMac
Copy link
Member

PerMac commented May 20, 2021

If #35062 helps for others I won't block this PR. Like I said, this test still fails once in a while with the fix, but it is not a big deal. We have to further investigate the issue on our side, maybe it is something with our timer. Then this issue can be closed and I can create a new one. @nordic-krch ^^

@galak
Copy link
Collaborator

galak commented May 20, 2021

If #35062 helps for others I won't block this PR. Like I said, this test still fails once in a while with the fix, but it is not a big deal. We have to further investigate the issue on our side, maybe it is something with our timer. Then this issue can be closed and I can create a new one. @nordic-krch ^^

Can you open a new issue for the NRF platforms.

@ioannisg
Copy link
Member

Agree, better to open a new issues for nRF since it is really a different driver.

@PerMac
Copy link
Member

PerMac commented May 20, 2021

Descoped nrf platforms from here. Issue for nrf platforms: #35509

@galak galak closed this as completed May 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Tests Issues related to a particular existing or missing test bug The issue is a bug, or the PR is fixing a bug priority: high High impact/importance bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants