samples: tests: watchdog: samples/subsys/task_wdt breaks nrf platforms performance #33509

Closed
PerMac opened this issue Mar 19, 2021 · 5 comments · Fixed by #35484
Labels: area: Tests · bug · platform: nRF · priority: low

@PerMac (Member) commented Mar 19, 2021

Describe the bug
The test for samples/subsys/task_wdt leaves nRF boards (definitely the nRF9160, and I think the nRF5340 as well) in a 'corrupted' state. All subsequent tests fail on that board because the watchdog sample keeps running in a loop and prevents the 'corrupted' board from operating properly. The board then requires a manual erase.

To Reproduce
Steps to reproduce the behavior:

  1. have nrf9160dk connected to /dev/ttyACM0
  2. go to your zephyr dir
  3. run scripts/twister -T samples/subsys/task_wdt/ -p nrf9160dk_nrf9160 --device-testing --device-serial /dev/ttyACM0 --jobs 1 -v --inline-logs
  4. the test passes but corrupts all subsequent tests
  5. run scripts/twister -T samples/hello_world/ -p nrf9160dk_nrf9160 --device-testing --device-serial /dev/ttyACM0 --jobs 1 -v --inline-logs
  6. see the test fail, printing watchdog output

Expected behavior
The watchdog sample should not break the board's operation.

Impact
Very annoying, as it makes subsequent tests fail in on-target CI.

Environment:

  • OS: Ubuntu 18.04
  • Toolchain: zephyr-sdk 0.12.2
  • Commit SHA or version used: zephyr-v2.5.0-1160-gb8b8e8cd6b1a
@PerMac PerMac added the bug The issue is a bug, or the PR is fixing a bug label Mar 19, 2021
@PerMac PerMac added platform: nRF Nordic nRFx area: Tests Issues related to a particular existing or missing test labels Mar 19, 2021
@carlescufi carlescufi assigned martinjaeger and anangl and unassigned ioannisg Mar 23, 2021
@carlescufi carlescufi added the priority: low Low impact/importance bug label Mar 23, 2021
@martinjaeger (Member)

Thanks for reporting. I was trying to reproduce this on adafruit_feather_nrf52840, which is the only nRF board I have. Unfortunately, twister doesn't work with the built-in UF2 bootloader, and programming via SWD doesn't work with my STM32 tools.

Do you have any recommendation for a board that's easy to get up and running and less expensive than the nRF9160 DK?

@martinjaeger (Member)

Quick status update on this.

I can now reproduce the issue with the nRF52840-DK. There are two different issues:

  1. The device does not reset properly if a new app is flashed after the task_wdt sample. Pressing the hardware reset button, or adding "--reset-after-load" to the J-Link runner, makes the new app start properly, but that's of course not a permanent fix.

  2. Sometimes (approx. every second time) the new app cannot be flashed at all. It fails with:

Downloading file [blinky.hex]...
T-bit of XPSR is 0 but should be 1. Changed to 1.
****** Error: Timeout while preparing target, RAMCode did not respond in time. (PC = 0x00003D25, CPSR = 0xE000ED00, LR = 0x41000003)!
Failed to prepare RAMCode using RAM
Unspecified error -1

If we set CONFIG_TASK_WDT_HW_FALLBACK=n so that the hardware watchdog is no longer used, issue 2 disappears. This is strange, as the hardware watchdog driver initialization is the same as in samples/drivers/watchdog, where subsequent flashing and resetting work properly.

So I'm still trying to figure out the root cause, and I need to look into the implementation details of the nRF watchdog hardware and its reset behavior. If you have any hints, let me know.
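As a rough illustration of what CONFIG_TASK_WDT_HW_FALLBACK changes, here is a hypothetical host-side model in C. It assumes the task watchdog acts as a software supervisor that checks each task channel and, when the fallback is enabled, also feeds a hardware watchdog as a backstop in case the supervisor itself stops running. The struct, its fields and `simulate()` are made up for this sketch; it is not the Zephyr task_wdt implementation or API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model (not Zephyr code) of a task watchdog with an optional
 * hardware fallback. All times are in milliseconds. */

#define NUM_CHANNELS 2

struct task_wdt_model {
    unsigned reload_ms[NUM_CHANNELS]; /* per-channel timeout */
    unsigned last_feed[NUM_CHANNELS]; /* last time each task fed its channel */
    bool hw_fallback;                 /* stands in for CONFIG_TASK_WDT_HW_FALLBACK */
    unsigned hw_timeout_ms;           /* hardware watchdog timeout */
    unsigned hw_last_feed;            /* last time the supervisor fed the hardware */
};

/* Steps time from 1 ms to end_ms. Every task feeds its channel every
 * task_period_ms; the software supervisor runs while now < supervisor_stops_at.
 * Returns the time of the first reset, or 0 if no reset happens. */
static unsigned simulate(struct task_wdt_model *m, unsigned task_period_ms,
                         unsigned supervisor_stops_at, unsigned end_ms)
{
    assert(task_period_ms > 0);

    for (unsigned now = 1; now <= end_ms; now++) {
        for (int i = 0; i < NUM_CHANNELS; i++) {
            if (now % task_period_ms == 0) {
                m->last_feed[i] = now; /* task i is alive and feeds */
            }
        }

        if (now < supervisor_stops_at) {
            /* Software check: reset if any channel missed its deadline. */
            for (int i = 0; i < NUM_CHANNELS; i++) {
                if (now - m->last_feed[i] > m->reload_ms[i]) {
                    return now;
                }
            }
            m->hw_last_feed = now; /* all channels healthy: feed the hardware */
        }

        /* The hardware watchdog counts independently of the supervisor. */
        if (m->hw_fallback && now - m->hw_last_feed > m->hw_timeout_ms) {
            return now;
        }
    }
    return 0;
}
```

For example, with two channels that each have a 500 ms timeout and are fed every 50 ms, a 100 ms hardware timeout and a supervisor that stops at 1000 ms, `simulate()` returns 1100 with the fallback enabled and 0 (no reset) with it disabled. In this model, turning the fallback off means no hardware timer runs at all, which fits the observation that issue 2 disappears, at the cost of a stalled supervisor going unnoticed.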

@PerMac (Member, Author) commented May 17, 2021

I think I actually had the same issue with samples/drivers/watchdog, but I cannot find it reported anywhere. I will check its current status (I think it was not fixed).

@martinjaeger (Member)

I did some further investigation, and I think this is not a task watchdog issue but a problem with the hardware watchdog driver.

Without --reset-after-load, even very simple apps like blinky and hello world don't start without pressing the reset button, so this seems to be a general issue with the J-Link runner (or is it intended?).

The inability to flash the device only occurs if the hardware watchdog timeout is set to roughly 100 ms or less. That's why the watchdog driver sample, with its longer timeout, works fine. If we reduce the timeout to 100 ms in the following line, we get the same issue with samples/drivers/watchdog:

#define WDT_MAX_WINDOW 1000U

@PerMac, can you confirm that's what you were experiencing before?
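A small host-side model in C can show why the timeout value matters. It assumes, without this being established in the thread, that the debugger halts the core for some time while preparing to flash, that the application cannot feed the watchdog during that halt, and that the watchdog keeps counting. The function names and all numbers (feed period, halt duration) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model (not nRF or J-Link behavior): the application feeds the
 * watchdog every feed_period_ms, and a debugger halts the core for halt_ms,
 * starting phase_ms after the last feed. Without a pause-on-halt option the
 * watchdog keeps counting during the halt. Times are in milliseconds. */
static bool wdt_fires_during_halt(unsigned timeout_ms, unsigned phase_ms,
                                  unsigned halt_ms)
{
    /* Fires if the time since the last feed exceeds the timeout before the
     * halt ends. */
    return phase_ms + halt_ms > timeout_ms;
}

/* Counts how many halt start phases (1 ms steps within one feed period)
 * would end in a watchdog reset. */
static unsigned failing_phases(unsigned timeout_ms, unsigned feed_period_ms,
                               unsigned halt_ms)
{
    unsigned failures = 0;

    assert(feed_period_ms > 0); /* a zero period would mean no feeds at all */

    for (unsigned phase = 0; phase < feed_period_ms; phase++) {
        if (wdt_fires_during_halt(timeout_ms, phase, halt_ms)) {
            failures++;
        }
    }
    return failures;
}
```

With a made-up 75 ms halt and a 50 ms feed period, `failing_phases(100, 50, 75)` returns 24 (out of 50 possible start phases), while `failing_phases(1000, 50, 75)` returns 0. Whether the reset hits then depends on where in the feed cycle the halt begins, which would be one possible explanation for a failure that shows up only some of the time.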

I don't have a solution for the watchdog driver yet. The above-mentioned PR #35475 fixes a different issue I discovered during testing.

We could disable the hardware watchdog for Nordic chips in this sample as a preliminary workaround so that twister runs are no longer interrupted. Do you think that makes sense, or should we keep this open until we find a fix for the hardware watchdog driver?

martinjaeger added a commit to martinjaeger/zephyr that referenced this issue May 20, 2021
Enable the option to pause the watchdog if the MCU is halted by a
debugger.

This fixes an issue with some Nordic MCUs (see zephyrproject-rtos#33509) where the board
could not be flashed anymore if a short watchdog timeout (<100 ms) was
used.

Signed-off-by: Martin Jäger <[email protected]>
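The fix enables the watchdog's pause-while-halted behavior. In Zephyr's watchdog driver API this likely corresponds to passing WDT_OPT_PAUSE_HALTED_BY_DBG to wdt_setup(), but the commit message does not name the flag, so treat that mapping as an assumption. The sketch below is a hypothetical host-side C model, not driver code, of what the option changes: while the core is halted, the watchdog counter is frozen instead of running on.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model (not driver code) of a "pause while halted by a
 * debugger" watchdog option. Times are in milliseconds. The core is halted
 * for halt_ms starting after halt_start_ms; while running, the application
 * feeds the watchdog every feed_period_ms. Returns the time of the watchdog
 * reset, or 0 if it never fires within total_ms. */
static unsigned simulate_halt(unsigned timeout_ms, unsigned feed_period_ms,
                              unsigned halt_start_ms, unsigned halt_ms,
                              bool pause_when_halted, unsigned total_ms)
{
    unsigned counter = 0; /* ms since the last feed, as seen by the watchdog */

    assert(feed_period_ms > 0);

    for (unsigned now = 1; now <= total_ms; now++) {
        bool halted = now > halt_start_ms && now <= halt_start_ms + halt_ms;

        if (!halted && now % feed_period_ms == 0) {
            counter = 0; /* the running application feeds the watchdog */
            continue;
        }
        if (halted && pause_when_halted) {
            continue; /* counter frozen while the debugger holds the core */
        }
        if (++counter > timeout_ms) {
            return now; /* watchdog reset */
        }
    }
    return 0;
}
```

With a 100 ms timeout, a 50 ms feed period and a 75 ms halt starting 40 ms after the last feed, `simulate_halt(100, 50, 240, 75, false, 1000)` returns 301 (a reset during the halt), the same call with `pause_when_halted` set to true returns 0, and a 1000 ms timeout without the pause also returns 0, matching the observation that only short timeouts were affected.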
martinjaeger added a commit to martinjaeger/zephyr that referenced this issue May 20, 2021
Enable the option to pause the fallback hardware watchdog if the MCU is
halted by a debugger.

This fixes issue zephyrproject-rtos#33509 where some boards with Nordic MCUs could not be
flashed anymore after using the task watchdog sample.

Signed-off-by: Martin Jäger <[email protected]>
@martinjaeger (Member)

OK, I think I have found a fix now. Please ignore my previous comment regarding how to proceed, and have a look at the PR to see if it provides a suitable solution.

galak pushed a commit that referenced this issue May 21, 2021
Enable the option to pause the watchdog if the MCU is halted by a
debugger.
galak pushed a commit that referenced this issue May 21, 2021
Enable the option to pause the fallback hardware watchdog if the MCU is
halted by a debugger.
tejlmand pushed a commit to tejlmand/zephyr that referenced this issue Jun 16, 2021
Enable the option to pause the watchdog if the MCU is halted by a
debugger.

cherry-picked from 784e684

Signed-off-by: Maciej Perkowski [email protected]
tejlmand pushed a commit to tejlmand/zephyr that referenced this issue Jun 16, 2021
Enable the option to pause the fallback hardware watchdog if the MCU is
halted by a debugger.

cherry-picked from: a46a36a

Signed-off-by: Maciej Perkowski [email protected]
PavelVPV pushed a commit to PavelVPV/zephyr that referenced this issue Jun 23, 2021
Enable the option to pause the watchdog if the MCU is halted by a
debugger.

cherry-picked from 784e684

Signed-off-by: Maciej Perkowski [email protected]
PavelVPV pushed a commit to PavelVPV/zephyr that referenced this issue Jun 23, 2021
Enable the option to pause the fallback hardware watchdog if the MCU is
halted by a debugger.

cherry-picked from: a46a36a

Signed-off-by: Maciej Perkowski [email protected]

5 participants