
fe310: xtimer hardfault #13109

Closed
fjmolinas opened this issue Jan 13, 2020 · 9 comments · Fixed by #13182
Labels: `Area: timers`, `Platform: RISC-V`, `Type: bug`

Comments

@fjmolinas
Contributor

Description

#9530 introduced a hardfault when using xtimer on fe310. I had seen it while testing but had attributed it to issues I was having with my setup.

> Ran into some issues with `hifive1b`: it currently hardfaults, which is not the case in master... investigating

> `BUILD_IN_DOCKER=1 BOARD=hifive1b make -C tests/xtimer_usleep clean all flash term`
> 
> 
> Bench Clock Reset Complete
> 2020-01-07 11:29:44,335 # 
> 2020-01-07 11:29:44,345 # ATE0-->ATE0
> 2020-01-07 11:29:44,362 # OK
> 2020-01-07 11:29:44,519 # AT+BLEINIT=0-->OK
> 2020-01-07 11:29:44,675 # AT+CWMODE=0-->OK
> 2020-01-07 11:29:44,675 # 
> 2020-01-07 11:29:46,795 # Help: Press s to start test, r to print it is ready
> s
> 2020-01-07 11:29:50,685 # START
> 2020-01-07 11:29:50,691 # main(): This is RIOT! (Version: 2020.01-devel-1179-gab4007-pr-9530)
> 2020-01-07 11:29:50,695 # Running test 5 times with 7 distinct sleep times
> 2020-01-07 11:29:50,706 # Unhandled trap:
> 2020-01-07 11:29:50,708 #   mcause: 0x00000001
> 2020-01-07 11:29:50,710 #   mepc:   0x200106e4
> 2020-01-07 11:29:50,711 #   mtval:  0x200106e4
> 2020-01-07 11:29:50,713 # *** RIOT kernel panic:
> 2020-01-07 11:29:50,715 # Unhandled trap
> 2020-01-07 11:29:50,715 # 
> 2020-01-07 11:29:50,716 # *** halted.
> 2020-01-07 11:29:50,716 # 

Originally posted by @fjmolinas in #9530 (comment)

This board was not connected during my final round of testing, so it was not covered and the bug slipped by.

Although hifive1b runs on a 32 kHz timer, I don't think this is the cause: frdm-kw41z also uses a 32 kHz timer and does not show the problem.

Steps to reproduce the issue

Flash any application using xtimer on hifive1b.

Expected results

No crash.

Actual results

Hardfaults:

2020-01-07 11:29:50,706 # Unhandled trap:
2020-01-07 11:29:50,708 #   mcause: 0x00000001
2020-01-07 11:29:50,710 #   mepc:   0x200106e4
2020-01-07 11:29:50,711 #   mtval:  0x200106e4
2020-01-07 11:29:50,713 # *** RIOT kernel panic:
2020-01-07 11:29:50,715 # Unhandled trap
2020-01-07 11:29:50,715 # 
2020-01-07 11:29:50,716 # *** halted.
2020-01-07 11:29:50,716 # 

Versions


Operating System Environment
-----------------------------
         Operating System: "Ubuntu" "18.04.2 LTS (Bionic Beaver)"
                   Kernel: Linux 5.0.0-37-generic x86_64 x86_64
             System shell: /bin/dash (probably dash)
             make's shell: /bin/dash (probably dash)

Installed compiler toolchains
-----------------------------
               native gcc: gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
        arm-none-eabi-gcc: arm-none-eabi-gcc (GNU Tools for Arm Embedded Processors 8-2018-q4-major) 8.2.1 20181213 (release) [gcc-8-branch revision 267074]
                  avr-gcc: avr-gcc (GCC) 5.4.0
         mips-mti-elf-gcc: missing
               msp430-gcc: msp430-gcc (GCC) 4.6.3 20120301 (mspgcc LTS 20120406 unpatched)
     riscv-none-embed-gcc: riscv-none-embed-gcc (GNU MCU Eclipse RISC-V Embedded GCC, 64-bit) 8.2.0
     xtensa-esp32-elf-gcc: missing
   xtensa-esp8266-elf-gcc: xtensa-esp8266-elf-gcc (crosstool-NG crosstool-ng-1.22.0-80-g6c4433a5) 5.2.0
                    clang: missing

Installed compiler libs
-----------------------
     arm-none-eabi-newlib: "3.0.0"
      mips-mti-elf-newlib: missing
  riscv-none-embed-newlib: "3.0.0"
  xtensa-esp32-elf-newlib: missing
xtensa-esp8266-elf-newlib: "2.2.0"
                 avr-libc: "2.0.0" ("20150208")

Installed development tools
---------------------------
                   ccache: ccache version 3.4.1
                    cmake: cmake version 3.14.0-rc3
                 cppcheck: Cppcheck 1.82
                  doxygen: 1.8.16
                      git: git version 2.24.0
                     make: GNU Make 4.1
                  openocd: Open On-Chip Debugger 0.10.0+dev-00703-g92bb76a4-dirty (2019-07-19-14:27)
                   python: Python 3.6.8
                  python2: Python 2.7.15+
                  python3: Python 3.6.8
                   flake8: 3.7.7 (mccabe: 0.6.1, pycodestyle: 2.5.0, pyflakes: 2.1.1) CPython 3.6.8 on Linux
               coccinelle: missing
fjmolinas added the `Type: bug`, `Area: timers` and `Platform: RISC-V` labels on Jan 13, 2020
@fjmolinas
Contributor Author

@MichelRottleuthner @Hyungsin any ideas?

@MichelRottleuthner
Contributor

No idea off the top of my head. I'll try to see if I can get access to such a board to reproduce it.

@fjmolinas
Contributor Author

BTW, I do not think this is an issue with xtimer but with this architecture.

@MichelRottleuthner
Contributor

Yeah, but we should still find it, and if xtimer can reliably trigger it, that's kind of good ^^

@MichelRottleuthner
Contributor

> I'll try to see if I can get access to such a board to reproduce it.

unfortunately looks like we don't have one here :/

@kaspar030
Contributor

kaspar030 commented Jan 13, 2020

> unfortunately looks like we don't have one here :/

I can provide SSH access to a hifive1b connected to a Raspberry Pi, if that helps.

@aabadie
Contributor

aabadie commented Jan 13, 2020

Using `make debug`, the origin of the failure is:

`call handle_trap`

According to the FE310-G002 reference manual, `mcause: 0x00000001` means "Instruction access fault" (table 23, page 42).
But unfortunately I have no idea how to fix this issue.

@kaspar030
Contributor

I've tried to close in on the issue. This minimal application also triggers the bug:

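`main.c`:

```c
#include <stdio.h>

#include "irq.h"
#include "xtimer.h"

/* Timer callback: print a dot and re-arm the same timer for 1 s. */
static void cb(void *arg)
{
    puts(".");
    xtimer_set(arg, 1000000U);
}

/* The timer passes a pointer to itself as the callback argument. */
static xtimer_t t = { .callback=cb, .arg=&t };

int main(void)
{
    /* Call the callback once directly to arm the first timeout. */
    cb(&t);

    while(1) {}

    return 0;
}
```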

Makefile:

```makefile
include ../Makefile.tests_common

USEMODULE += xtimer

include $(RIOTBASE)/Makefile.include
```

I've tried to mimic the pure periph_timer usage using variations of this:

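```c
#include <stdio.h>

#include "irq.h"
#include "periph/timer.h"

/* Timer callback: print a dot and re-arm channel 0 one second
 * (32768 ticks at 32768 Hz) after the current counter value. */
static void cb(void *arg, int chan)
{
    (void)arg;
    (void)chan;
    puts(".");
    unsigned state = irq_disable();
    uint32_t now = timer_read(0);
    timer_set_absolute(0, 0, now + 32768);
    irq_restore(state);
}

int main(void)
{
    timer_init(0, 32768, cb, NULL);
    /* Initial alarm roughly half the counter range ahead. */
    uint32_t now = timer_read(0);
    timer_set_absolute(0, 0, now + (0xffffffff >> 1));

    /* Call the callback once directly to arm the first timeout. */
    cb(NULL, 0);

    while(1) {}

    return 0;
}
```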

... but that works fine.

Unfortunately, debugging with JLink on the hifive1b is not perfect. It does look, though, as if the crash happens right after the trap return (right after `mret` in cpu/fe310/intr.S:114).

My RISC-V-fu is not the best; any ideas?

@kaspar030
Contributor

This is timing-dependent. I tried to figure out why it does not crash when I stop at this point in the debugger (right after `mret` in cpu/fe310/intr.S:114).

Turns out this patch prevents the crash:

```diff
$ git diff /home/kaspar/src/riot/cpu/fe310/irq_arch.c
diff --git a/cpu/fe310/irq_arch.c b/cpu/fe310/irq_arch.c
index b4eceb741d..639040b7ce 100644
--- a/cpu/fe310/irq_arch.c
+++ b/cpu/fe310/irq_arch.c
@@ -195,4 +195,6 @@ void handle_trap(unsigned int mcause, unsigned int mepc, unsigned int mtval)
 
     /* ISR done - no more changes to thread states */
     fe310_in_isr = 0;
+    volatile uint32_t foo = 32000000;
+    while(foo--);
 }
```

The value 32,000,000 works for the one-second timer in the application above. It does not work for longer timeouts, but it does for shorter ones. If I double the timeout, I have to roughly double the 32 million as well.

So it seems that, when using xtimer, the timer interrupt machinery fails unless the timer triggers before xtimer's ISR returns. Weird.
