MPU fault with STM32L452 #30943

Closed
qin-zou opened this issue Dec 22, 2020 · 18 comments
Labels: bug, platform: STM32, priority: low

qin-zou commented Dec 22, 2020

I have a custom application based on the webusb (loopback) sample app.
I am seeing this MPU fault quite often.

[00:00:00.007,000] <err> os: ***** MPU FAULT *****
[00:00:00.007,000] <err> os:   Instruction Access Violation
[00:00:00.007,000] <err> os: r0/a1:  0x00000000  r1/a2:  0x00000002  r2/a3:  0x1fff3f3b
[00:00:00.007,000] <err> os: r3/a4:  0xedf7ff20 r12/ip:  0x200041c0 r14/lr:  0x0800ab23
[00:00:00.007,000] <err> os:  xpsr:  0x60000053
[00:00:00.007,000] <err> os: r4/v1:  0x1fff3ef1  r5/v2:  0x00000000  r6/v3:  0x00000002
[00:00:00.007,000] <err> os: r7/v4:  0x20000a4c  r8/v5:  0x20004280  r9/v6:  0x20004c00
[00:00:00.007,000] <err> os: r10/v7: 0x08006a71  r11/v8: 0x00000001    psp:  0x20004180
[00:00:00.007,000] <err> os: Faulting instruction address (r15/pc): 0xedf7ff20
[00:00:00.007,000] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
[00:00:00.007,000] <err> os: Fault during interrupt handling

[00:00:00.007,000] <err> os: Current thread: 0x200005b8 (unknown)
[00:00:00.199,000] <err> os: Halting system

Not sure what address 0xedf7ff20 is.
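
For reference, the register dump can be read against the architectural Cortex-M memory map. Below is a rough host-side sketch in C that classifies an address by region and unpacks the xPSR value. The helper names are made up for illustration and are not Zephyr APIs, and only the architectural regions are modeled, not the STM32L452 layout inside them.

```c
#include <stdint.h>

/* Architectural ARMv7-M address map: eight 512 MB blocks selected by
 * the top three address bits. Device-specific details are not modeled. */
const char *cortexm_region(uint32_t addr)
{
    switch (addr >> 29) {
    case 0:  return "Code";
    case 1:  return "SRAM";
    case 2:  return "Peripheral";
    case 3:
    case 4:  return "External RAM";
    case 5:
    case 6:  return "External device";
    default: return "System";   /* 0xE0000000 and up, never executable */
    }
}

/* IPSR is xPSR[8:0]: the number of the exception being handled. */
unsigned xpsr_exception(uint32_t xpsr)
{
    return xpsr & 0x1FFu;
}

/* External interrupts start at exception 16; -1 means a core exception
 * or thread mode. */
int xpsr_irq(uint32_t xpsr)
{
    int exc = (int)xpsr_exception(xpsr);
    return exc >= 16 ? exc - 16 : -1;
}

/* EPSR.T is xPSR[24]; it is expected to be 1 while running Thumb code. */
int xpsr_thumb(uint32_t xpsr)
{
    return (int)((xpsr >> 24) & 1u);
}
```

With the values from the log, `cortexm_region(0xedf7ff20)` gives "System", `xpsr_exception(0x60000053)` gives 83 (external IRQ 67), and `xpsr_thumb(0x60000053)` gives 0. The System region is execute-never, which matches the reported Instruction Access Violation. r3 holds the same value as the faulting PC, which would be consistent with a branch through a corrupted function pointer, though that is an inference from the dump. Which peripheral IRQ 67 belongs to has to be looked up in the device header.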

Here is the decode for the LR register:

$ /cb/toolchains/gnu-arm-embedded/gcc-arm-none-eabi-9-2020-q2-update/bin/arm-none-eabi-addr2line -f -e uc/stm32_mcu_app/build/zephyr/zephyr.elf 0x0800ab23
signal_poll_event
/net/qin-dev/srv/nfs/qin-data/ws/platform.latest/uc/zephyr-os/zephyr/kernel/poll.c:390

380 static int signal_poll_event(struct k_poll_event *event, uint32_t state)
381 {
382         struct _poller *poller = event->poller;
383         int retcode = 0;
384
385         if (poller) {
386                 if (poller->cb != NULL) {
387                         retcode = poller->cb(event, state);
388                 }
389
390                 poller->is_polling = false;
391
392                 if (retcode < 0) {
393                         return retcode;
394                 }
395         }
396
397         set_event_ready(event, state);
398         return retcode;
399 }
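
LR resolves to the line right after the `poller->cb(event, state)` call, and r3 matches the faulting PC. One plausible reading (an inference from the dump, not something confirmed here) is that `poller->cb` held garbage when it was called. A minimal host-side sketch of a sanity check for such a pointer value follows. It assumes the application code lives in on-chip flash at 0x08000000; the 512 KB size and the helper names are assumptions for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLASH_BASE_ADDR  0x08000000u        /* load address passed to dfu-util */
#define FLASH_SIZE_BYTES (512u * 1024u)     /* assumption: adjust to the actual part */

/* A Thumb return address or function pointer carries bit 0 set;
 * the instruction address is the value with bit 0 cleared. */
uint32_t thumb_target(uint32_t value)
{
    return value & ~1u;
}

/* True if the value has the Thumb bit set and points into the assumed
 * flash range. This is a plausibility check, not proof of validity. */
bool looks_like_flash_code(uint32_t fn)
{
    uint32_t addr = thumb_target(fn);
    bool thumb = (fn & 1u) != 0u;

    return thumb && addr >= FLASH_BASE_ADDR &&
           (addr - FLASH_BASE_ADDR) < FLASH_SIZE_BYTES;
}
```

With the values from the dump, LR 0x0800ab23 and r10 0x08006a71 pass this check, while 0xedf7ff20 fails on both counts. A check like this is only a debugging aid (for example inside an assert placed before the call); it does not explain how the pointer got corrupted.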

Sometimes, adding an extra log message causes this fault to occur.
Adding a delay may also change the behavior.
Do you have any idea what could be causing this issue?

Thanks,
Qin

@qin-zou qin-zou added the bug The issue is a bug, or the PR is fixing a bug label Dec 22, 2020
@FRASTM FRASTM added the platform: STM32 ST Micro STM32 label Dec 22, 2020
Member

erwango commented Jan 4, 2021

@qin-zou can you try increasing the stack sizes (CONFIG_MAIN_STACK_SIZE, CONFIG_IDLE_STACK_SIZE, CONFIG_ISR_STACK_SIZE)?

@nashif nashif added the priority: medium Medium impact/importance bug label Jan 4, 2021
Author

qin-zou commented Jan 4, 2021

@erwango I tried increasing those stack sizes, and it didn't seem to help. One more thing I noticed recently is that I see this issue when I update the firmware using dfu-util with the :leave option:
sudo dfu-util -a 0 -i 0 -s 0x08000000:leave -D [upload file]
Once I reset the chip, the MPU fault no longer shows up.

Do you think it has something to do with the way I update my image?

Member

erwango commented Jan 6, 2021

@qin-zou Can you reproduce the issue with the in-tree webusb sample?

Author

qin-zou commented Jan 6, 2021

@erwango So far, I have not been able to reproduce the issue with the webusb sample.

Author

qin-zou commented Jan 6, 2021

Also, with my own app, when I enable CONFIG_ASSERT=y, I get the following error:

ASSERTION FAIL [z_spin_lock_valid(l)] @ ZEPHYR_BASE/include/spinlock.h:92
        Recursive spinlock 0x20001320

This happens when I update using dfu-util with the :leave option.
If I update using st-flash, I don't see the issue.
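
For context, the assertion comes from the spinlock validation that is active when assertions are enabled. A simplified host-side model of the idea is sketched below (hypothetical names, single CPU, not the actual Zephyr code): the lock records its holder, and taking it again while that record still names the current holder is reported as recursive.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a spinlock with owner tracking. */
struct model_spinlock {
    uintptr_t owner;   /* 0 when free, otherwise a non-zero holder tag */
};

/* Mirrors the spirit of the validity check: a lock already marked as
 * held by this holder must not be taken again. */
bool model_lock_valid(const struct model_spinlock *l, uintptr_t self)
{
    return l->owner != self;
}

/* Returns false where the real kernel would raise the assertion. */
bool model_lock(struct model_spinlock *l, uintptr_t self)
{
    if (!model_lock_valid(l, self)) {
        return false;
    }
    l->owner = self;
    return true;
}

/* Releasing the lock clears the recorded holder. */
void model_unlock(struct model_spinlock *l)
{
    l->owner = 0;
}
```

In this model the check trips either on a genuinely nested lock or when the owner field holds a stale or corrupted value that happens to match, so the assertion alone does not tell the two cases apart.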

Member

erwango commented Jan 7, 2021

Sometimes, adding an extra log message causes this fault to occur.

In your app, are you doing some processing in a callback?
It seems the fault occurs during IRQ handling:

[00:00:00.007,000] <err> os: Fault during interrupt handling

So it's possible that too much processing is being done in interrupt context.
I'd suggest increasing CONFIG_ISR_STACK_SIZE even further (to 8192, for instance) or minimizing the work done in ISR context.

Member

ioannisg commented Jan 7, 2021

In your app, are you doing some processing in a callback?
Seems that fault handling happens during IRQ treatment:

To me this looks like memory corruption (e.g. a null pointer) rather than an ISR stack overflow. But sure, try doubling the ISR stack size. :)

Author

qin-zou commented Jan 7, 2021

I tried increasing CONFIG_ISR_STACK_SIZE to 8192, and it still doesn't work.
My callback function has not been called yet in this case.

I tried 3 different ways to update my image:

  1. Updating using st-flash, this works ok
  2. Getting into DFU mode, then updating using: sudo dfu-util -a 0 -i 0 -s 0x08000000 -D [upload file], and reset board, this works as well
  3. Getting into DFU mode, then updating using: sudo dfu-util -a 0 -i 0 -s 0x08000000:leave -D [upload file], only this one fails, which is really weird

Member

erwango commented Jan 7, 2021

Updating using st-flash, this works ok
Getting into DFU mode, then updating using: sudo dfu-util -a 0 -i 0 -s 0x08000000 -D [upload file], and reset board, this works as well
Getting into DFU mode, then updating using: sudo dfu-util -a 0 -i 0 -s 0x08000000:leave -D [upload file], only this one fails, which is really weird

The difference I see is that dfu-util with the :leave option may not perform a hardware reset, so you might be hitting an issue with a variable that is not correctly initialized.

Member

erwango commented Jan 12, 2021

@erwango So far, I am not able to reproduce the issue with webusb sample.

Setting priority to low, as this happens only with an out-of-tree app.

@qin-zou have you checked for potentially uninitialized variables, which would explain the difference in behavior when a hardware reset doesn't happen?

@erwango erwango added priority: low Low impact/importance bug and removed priority: medium Medium impact/importance bug labels Jan 12, 2021
Author

qin-zou commented Jan 12, 2021

@erwango I believe I have initialized all of my heap variables, and it didn't make a difference. I am also using the STM32 HAL API to read the differential ADC (since it's not supported by Zephyr) while using Zephyr's ADC initialization, and I am wondering if this could also cause such an issue. I am trying to move the ADC init part to the HAL API and disable the ADC in Zephyr to see if that helps.

Member

erwango commented Feb 3, 2021

@qin-zou Have you been able to make any progress on this?

Author

qin-zou commented Feb 3, 2021

@erwango Sorry, I haven't made any progress on this issue. I still see it when using dfu-util with the :leave option to update my board. Right now, I am just resetting the board to work around it.

Member

erwango commented Mar 8, 2021

@qin-zou can you try again using #31481?

Author

qin-zou commented Mar 8, 2021

@erwango I'll try it out in the next day or two and get back to you.

Author

qin-zou commented Mar 10, 2021

@erwango So far, I have not been able to reproduce the issue with my current image. Once I hit the same MPU fault, I'll try the patch from #31481.

Member

erwango commented Mar 10, 2021

@qin-zou If you cannot reproduce the issue, it would be nice to close it. It's still possible to re-open it later, and closing helps us keep the issue database at a reasonable size.

Author

qin-zou commented Mar 10, 2021

@erwango Sure, I'll close it. I will reopen it when I can reproduce the issue again.

@qin-zou qin-zou closed this as completed Mar 10, 2021