Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove lengthy interrupt disables #524

Closed
terrillmoore opened this issue Jan 11, 2020 · 7 comments · Fixed by #571
Closed

Remove lengthy interrupt disables #524

terrillmoore opened this issue Jan 11, 2020 · 7 comments · Fixed by #571
Assignees
Labels

Comments

@terrillmoore
Copy link
Member

terrillmoore commented Jan 11, 2020

On some BSPs "interrupts disabled" stop the clock from advancing properly. hal_waitUntil() hangs in os_init() because time is not advancing. See #523 and #521.

Various BSPs handle time differently when interrupts are disabled, and start up with different interrupt enabled/disabled state. The MCCI BSPs all have the property that time advances even when interrupts are disabled. But many BSPs do not.

The LMIC needs to be reworked to remove all the interrupt enable/disable logic that's presently there, as I am certain that there's no need for extended interrupt disables. Thanks @aparcar and @JackGruber for finding the root cause.

@czuvich
Copy link

czuvich commented Apr 15, 2020

@altishchenko Just curious if are taking over this repository? This particular issue breaks quite a few boards, and I think @terrillmoore was going to work on a fix. I've got a handful of issues I've reported that are related to this issue as well that prevent using OTAA and ADR. I'm happy to work with you for a fix.

@altishchenko
Copy link

@czuvich Clay, I didn't really think about that, I am new to this community and, I am afraid, I won't be able to give a repository of such popularity required amount of attention in the long run. I don't know why @terrillmoore is not with us for some time now, probably busy on another more important project.
It just happens so that we are integrating this library as a replacement for the extremely heavyweight STM/Semtech protocol implementation we use at the moment and code economy of 15K+ is not something you can neglect in an MCU.
I am keen on fixing as many issues as I stumble upon, including bugs and architectural deficiencies, and I am more than happy to share my solutions. If you like, we can switch over to e-mail in order not to bloat the problem reporting cases with discussions (drop me a note at [email protected], if you would).

As for the list of most important problems, I will summarize them as questions, because for some of them I don't have good convincing (me) reasons:

  1. Why enabling interrupts (and I can't disable them for other reasons) prevents OTAA joins at SF7/DR5, but works pretty good with SF12? - SF7 works ok with interrupts disabled as confirmed by @jpmeijers in Library gets stuck in OPMODE 888 #547.
  2. Why hal_waitUntil() function uses absolute time, as 90% of calls to it follow the same pattern of os_getTime() + delta? And what hal_waitUntil() does is - it calculates delta back and delays on it - two unnecessary calls to os_getTime() and delta arithmetic.
    2* Is there a situation (really) that hal_waitUntil() gets time in the past to wait for?
    2** What are the particular cases, where hal_waitUntil() does its intended function - waits until a determined moment in time and can they be converted to deltas? (4 places, all of them TX/RX window positioning).
  3. Is there a better way to provide generic, platform neutral delay based on CPU frequency, like many critical functions do, instead of interrupt-counter dependable RTE functions? (I have one for our project, but it is not at all generic and pretty much Cortex based).
  4. Where in the code disable/enable interrupts are really warranted?
  5. Can the race condition in os_runloop_once() be really solved by a simple flag?
  6. Does the function executed in os_runloop_once() require interrupts to be enabled?
  7. Does the code really really need microsecond precision for the whole of the library or may be changing the scale to 1ms and keeping microseconds only where they are definitely required will improve the complexity? (see 2)
  8. What are the low power possibilities out there in Xduino world, so the library code can be tailored to suite them? - I have low power with sleep and stop/standby, but my implementation is very "project"-specific. Providing generic solution to HAL API requires generic knowledge of the devices used mostly, which I lack.

These questions are biased towards our project and do not reflect 90% of standard use cases. Like, we don't need packaged functionality to schedule arbitrary timed jobs, we have it in another place, so the jobs that run in os_runloop_once() are only the ones scheduled by the library itself.

For the whole, I have it running pretty stable now - nearly a week without reset or rejoin, low power works fine and does not affect the data exchange, ADR works, OTAA works, race lockups or resets didn't happen for a good while, it does not now affect performance of the other project components. But again, all of this is very project specific, I just don't have these Arduino boards to try out the examples :)
Let's talk by e-mail!

Cheers,
Alex

@czuvich
Copy link

czuvich commented Apr 16, 2020

@altishchenko Wow this is incredible feedback! You've put in way more effort than I have... I ended up punting to ABP without ADR since my use case is fairly simple and being prototyped. In terms of response to those questions, I really only have a little... and yes I'd love to connect with you via email. I'll be sending it from [email protected].

What are the low power possibilities out there in Xduino world,

LMIC from what I could tell does a good job keeping the radio asleep when not in use (I read 1.6uA on my custom board using nRF52840).

To add onto that, I would like to see an example that saves off all of the necessary information which allows you to power cycle the mCU without doing a rejoin (perhaps this falls under low power possibilities you mentioned earlier). I've scoured the forums for a good solution, but it just gets bounced around. My mCU power cycles after a deep sleep wake-up which is another reason why I use ABP. Anyways... I'll shoot an email to you and maybe we can at a minimum collaborate our findings. Again.. thanks for the very detailed explanations!

@JackGruber
Copy link

Hi @czuvich
Send my ESP32 to DeepSleep and wake up without rejoin.
https://jackgruber.github.io/2020-04-13-ESP32-DeepSleep-and-LoraWAN-OTAA-join/
its not perfect, but works for me.

@czuvich
Copy link

czuvich commented Apr 17, 2020

@JackGruber Thank you! This is perfect.

@altishchenko
Copy link

@JackGruber Nice article. As for the note at the end of your article concerning micros() - this is one my concerns too, which I outlined in question 7 above. The MCU I work with has real time calendar module, (and ESP32 has one too), but its precision is milliseconds. The same story applies to Zero line of Arduinos, which are based on SAMD21. Fortunately for my case (STM32) and for the Zero line (SAMD21), SLEEPDEEP does not perform Reset on exit and RAM contents is preserved in power down, comparing to ESP32 which loses CPU context unless you are running from high speed RTC RAM. Duty cycles are definitely millisecond/seconds precision, so they can be managed by RTC calendar clock base. The problem stems from the fact that the same clock base in the library is used to calculate receive window timing and keep track of regular jobs lengthy jobs. This is my main concern at the moment as I am trying to decouple TOA and RX window calculation from usual house keeping which can be RTC-millisecond based.

@altishchenko
Copy link

@JackGruber Forgot to mention. One of the quick solutions I used was to have a flag that gets set in os_setCallback/os_setTimedCallback pair and gets reset in os_runloop_once() in place of hal_sleep(), it is taken into account when going low power decision is made. TX duty cycles are managed outside LMIC in my case, as soon as I succeed with LMIC+RTC, I will let LMIC manage them.

terrillmoore added a commit that referenced this issue May 12, 2020
The primary purpose of this PR is to fix #524, allowing the LMIC to run without disabling interrupts at all, and without requiring any changes to underlying BSPs. When configured for interrupts, interrupts simply cause the current time stamp to be captured. The secondary ISR is run as part of os_runloop_once(). This should also fix #503, and address #528, #558, #548, and others.

In addition, since we're updating the radio driver, I addressed #524.

In testing, I discovered a subtle bug with one of our applications that uses LMIC_sendAlive() -- there was a path when sending piggybacked MAC data with a poll that would result in trying to take the port 0 path instead. This caused problems with ChirpStack and TTN, at least. (This is #570.)

Finally, updated to use Arduino IDE 1.8.12 in test.

Version of library changes to v3.1.0.20.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants