2 hard faults (on ground + in air) with PX4FW v1.7.4 #9348

Closed
philipoe opened this issue Apr 21, 2018 · 9 comments

@philipoe
Contributor

philipoe commented Apr 21, 2018

Bug Report

We (ASL/ETHZ) have recently had two hard faults that caused Pixhawk to reboot. One of them unfortunately happened in air with a 3.2m wingspan fixed-wing, one happened during on-ground testing.

Setup

  • Hardware: mRo Pixhawk (px4fmu-v3 / 2MB flash)
  • Firmware: Basically PX4FW v1.7.4, i.e. master-branch from 5 days ago (71170dc) but with some slight modifications mainly to adapt the mixers to our platform and one custom sensor driver. The estimator is EKF2, the flight controller is the stock controller.

Hard fault 1: In-air

We had done three short MANUAL flights before that. In this flight we first flew manual for about three minutes and then switched to stabilized mode. The hard fault occurred 1 minute after switching to stabilized mode. The pilot continued to fly in PX4IO's (important!) manual override mode and, as he barely noticed the issue, actually went back into stabilized mode for some time before landing the plane following our instructions. The ulogger log of part 1 (see below) stops in air. We could not find anything unusual in the logs before the log stops. Pixhawk rebooted in air and continued logging as usual (part 2, see below).

Hard fault 2: On-ground

This happened during in-lab testing. We performed extensive pre-flight checks, including rebooting FMU (by pressing the reset button) and checking that IO's manual override still works. It did, but loading the mixer after that provoked reboot fails (I assume this is normal because a mixer is already present). We then let the plane just sit there for a couple of minutes, and suddenly Pixhawk rebooted. This was a hard fault, too. We unfortunately assumed this was BECAUSE we had provoked the in-air restart by pushing the FMU reset button (and, after testing for another 8 hours in the lab without any hard fault, assumed everything was OK). I don't have the ulog files yet, but I have:

  • The hardfault log from the SD card (it looks very similar to the one we saw in flight):
    fault_2000_01_01_18_53_35.log
  • The console output. Note again that this log starts after we provoked an FMU reset via pushing its reset button, that's why it is an in-air restart.
    consoleLog.txt

I might be able to provide the firmware .elf file to allow debugging the hard fault as described in https://dev.px4.io/en/debug/gdb_debugging.html#debugging-hard-faults-in-nuttx if required.

I guess it makes sense to add this to #9271 @RomanBapst @LorenzMeier ?

@dagar
Member

dagar commented Apr 21, 2018

The hardfault occurred in the high priority work queue (HPWORK).

Things that would likely be running in HPWORK on your setup:

  • differential pressure sensor drivers
  • the px4fmu driver (handles AUX output mixing)
  • magnetometer drivers
  • land detector
  • anything custom running in HPWORK?

Are you using the AUX outputs on the pixhawk?

Providing the elf that corresponds with the inflight hardfault (https://review.px4.io/plot_app?log=3b702655-6052-46e6-85e2-fd012d27c25f) might help.

@philipoe
Contributor Author

philipoe commented Apr 21, 2018

> differential pressure sensor drivers

Stock, i.e. SDP3x

> the px4fmu driver (handles AUX output mixing)

No, we are not using AUX channels.

> magnetometer drivers

Stock ADIS16448 driver.

> land detector

Yes, we're running that.

> anything custom running in HPWORK?

Yes, our battery monitoring driver. But that is obviously quite a low-level driver. We'll double-check the code.

I should also mention that after having this hard fault on the ground, we continued testing and the system ran for 8+ hours without an issue.

@LorenzMeier
Member

Potentially related (although we didn't get a hardfault log): #9260

@LorenzMeier
Member

@philipoe Could you please provide the ELF? That's really important at this stage.

@philipoe
Contributor Author

philipoe commented Apr 21, 2018

@dagar
Member

dagar commented Apr 21, 2018

I explored both hardfault logs a bit, but didn't find anything definitive.

In both cases the program counter is 0x52525252 (bogus), as well as several other registers, and various locations in the stack.

HPWORK looks to be in your custom code (/home/philipp/src/px4/Firmware/build/px4fmu-v3_asl/../../src/drivers/bat_mon/bat_mon.cpp:354), and one of the last recognizable things I came across in the stack was the end of an i2c transfer.
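
A value such as 0x52525252 is four copies of the same byte (0x52, ASCII 'R'), which is one reason it reads as bogus: it looks like a fill pattern or character data rather than an address. The following is a minimal Python sketch of how one might flag such words when scanning a stack dump. The helper name `repeated_byte` and the sample word list are hypothetical; only 0x52525252 comes from the hardfault logs discussed here.

```python
def repeated_byte(word):
    """Return the byte value if all four bytes of a 32-bit word are identical, else None."""
    data = (word & 0xFFFFFFFF).to_bytes(4, "little")
    return data[0] if len(set(data)) == 1 else None


if __name__ == "__main__":
    # Hypothetical excerpt of stack words; only 0x52525252 is taken from the thread.
    sample_words = [0x52525252, 0x20001234, 0x52525252]
    for w in sample_words:
        b = repeated_byte(w)
        if b is not None:
            # Show the byte and its ASCII character to hint at the source of the overwrite.
            print(f"0x{w:08x}: repeated byte 0x{b:02x} ({chr(b)!r})")
```

Many such words in the registers and on the stack would be consistent with a buffer overrun, but the sketch only detects the pattern and does not identify what wrote it.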

@philipoe
Contributor Author

Thanks a lot! @ASM3 Any idea? Could you look into this?

@Antiheavy
Contributor

probably not related, but here is a link to some other hard fault issues just in case: #8913

@LorenzMeier
Member

The hardfault log you sent me indicates a bad PC and corruption on the user stack. Looking up the stack for possible calling code, I see 0x0801d9ff in sem_timeout, possibly called by 0x080f5ef5 in Bq78350::~Bq78350().
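
The kind of stack walk described above can be sketched roughly as follows, in Python. The assumptions are loud ones: flash is taken to start at 0x08000000 (the usual STM32 main flash base) and to span the 2 MB stated in the setup, and an odd value inside that range is treated as a candidate Thumb return address whose bit 0 should be cleared before symbol lookup. The function name `classify` is hypothetical and this is not the procedure actually used in this thread.

```python
FLASH_BASE = 0x08000000        # assumed STM32 main flash base address
FLASH_SIZE = 2 * 1024 * 1024   # 2 MB flash, per the hardware description in this issue


def classify(word):
    """Roughly classify a 32-bit stack word from a hardfault dump."""
    in_flash = FLASH_BASE <= word < FLASH_BASE + FLASH_SIZE
    thumb_bit = bool(word & 1)
    if in_flash and thumb_bit:
        # Saved return addresses carry the Thumb bit; clear it for symbol lookup.
        return f"plausible Thumb code address, look up 0x{word & ~1:08x}"
    if in_flash:
        return "in flash with bit 0 clear: possibly a stacked PC or a data pointer"
    return "outside flash: not a code address"


if __name__ == "__main__":
    # The three values below are the ones quoted in this thread.
    for w in (0x0801D9FF, 0x080F5EF5, 0x52525252):
        print(f"0x{w:08x}: {classify(w)}")
```

A candidate address can then be resolved against the matching ELF, for example with `arm-none-eabi-addr2line -f -C -e <elf> <address>`. A word found this way is only a hint, since stale values from earlier calls also remain on the stack.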

@philipoe This is a custom, non-contributed driver and there is no way for us to tell if that's at fault nor can we debug it. If you want support for that peripheral and debugging in situations like this one you would need to make the driver and hardware available (= commercially available or as open hardware). That is a general rule for upstream debugging.

I'm closing the issue with a tentative conclusion that the fault is due to user-changed code and not an actual PX4 issue.
