
Benchmarking Zephyr vs. RIOT-OS #32875

Closed
luizvilla opened this issue Mar 4, 2021 · 21 comments

Comments

@luizvilla

Hello everyone,

I'm Luiz Villa, a researcher at the University of Toulouse. I'm working with software-defined power converters, and my team is looking into using an RTOS to manage real-time microcontrollers.
We ran a benchmark between Zephyr and RIOT OS that we would like to share with you.
Spoiler: Zephyr performed poorly.
Here's the summary:

[benchmark summary image]
Our test code is available at https://gitlab.laas.fr/owntech/zephyr/-/tree/test_adc_g4

Our target: Nucleo-G474RE

Our main conclusions:

  • Zephyr has an overhead of almost 27us compared with 10us from RIOT
  • Zephyr consumes 34% of CPU cycles compared with 14% from RIOT
  • Zephyr has a period of 12us when we flood the shell compared with 1us from RIOT
  • Zephyr has a minimum thread period with an active shell of 38us compared with 40us from RIOT

Our questions for the community:

  • Does anyone have the same issues?
  • Have you managed to bring the execution time of Zephyr below this threshold?
  • Are we missing something on our code?

Originally posted by @luizvilla in #32870

@gmarull
Member

gmarull commented Mar 4, 2021

Interesting results, thanks for sharing @luizvilla!

Note that in the ADC example, I'd use a hardware timer to trigger the ADC in a real application (this is possible on STM32). Regarding footprint sizes, it would be nice to compare ROM reports in both cases to see the big difference, in particular for the ADC and shell. Have you published any paper with some more insights?

@gmarull
Member

gmarull commented Mar 4, 2021

@luizvilla I've cloned your project and seen a few things that can be disabled or improved: use the minimal libc (if the application allows it), disable some shell features, reduce some stack sizes, etc. Without testing on a real board, I quickly got down to ~37K ROM and ~9K RAM with a few tweaks, so I'm sure there is room for improving the footprint at least.

@erwango
Member

erwango commented Mar 4, 2021

@luizvilla As mentioned by @gmarull, the first thing to do would be to understand the configuration used for the Zephyr target. I guess this can be minimized in various ways, but the point is to stay on par with the RIOT configuration.

For instance, on test_adc_g4, we can disable LOG and I2C, tweak stack sizes, use the minimal libc, and quickly divide the footprint by 2.
Though, by doing this we may remove features provided by the RIOT firmware.
Can you elaborate on the configuration options you selected (basically all the 'y' features in prj.conf)?

EDIT: Got down to 17K/5K on test_adc_g4 by deactivating LOG, SHELL and a few others...
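For illustration, the kind of prj.conf tweaks discussed here might look like the following (a sketch only; the option names come from Zephyr's Kconfig, and the exact set depends on the Zephyr version and on what the application actually needs):

```ini
# Footprint-reduction sketch for prj.conf (verify each option
# against your Zephyr tree before relying on it)
CONFIG_MINIMAL_LIBC=y        # smaller libc instead of newlib
CONFIG_LOG=n                 # drop the logging subsystem
CONFIG_I2C=n                 # driver not needed by this test
CONFIG_SHELL=n               # for the no-shell measurement variant
CONFIG_MAIN_STACK_SIZE=1024  # trim thread stacks to measured needs
```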

@carlescufi
Member

Can you elaborate on the configuration options you selected (basically all the 'y' features in prj.conf)?

You can see them here. As @gmarull pointed out, the set selected here is certainly not optimized in any way.
https://gitlab.laas.fr/owntech/zephyr/-/blob/test_adc_g4/zephyr/prj.conf

@erwango
Member

erwango commented Mar 4, 2021

You can see them here. As @gmarull pointed out, the set selected here is certainly not optimized in any way.

I've seen that. I just want to know whether this configuration was made on purpose, to be on par with the RIOT configuration.

@erwango
Member

erwango commented Mar 4, 2021

@luizvilla Also, the project documentation for getting things built is great. Though, I haven't found info on how the latency measurements were made. Can you elaborate on this part?

@carlescufi
Member

I noticed that you are defining the trigger thread's priority to 6. Have you tried making this a high-prio cooperative thread instead?
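As a sketch of that suggestion (all names here are illustrative, and the zephyr/ include path assumes a recent tree), a cooperative trigger thread can be declared with a negative priority via K_PRIO_COOP:

```c
/* Sketch: a high-priority cooperative trigger thread in Zephyr.
 * Cooperative (negative-priority) threads are not preempted by other
 * threads, e.g. the shell, until they block or yield.
 */
#include <zephyr/kernel.h>

#define TRIGGER_STACK_SIZE 512

static void trigger_entry(void *p1, void *p2, void *p3)
{
    while (1) {
        /* ... start the ADC conversion here ... */
        k_usleep(100);   /* blocking sleep: yields the CPU until expiry */
    }
}

/* K_PRIO_COOP(0) is the highest cooperative priority. */
K_THREAD_DEFINE(trigger_tid, TRIGGER_STACK_SIZE, trigger_entry,
                NULL, NULL, NULL, K_PRIO_COOP(0), 0, 0);
```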

@luizvilla
Author

luizvilla commented Mar 4, 2021

Interesting results, thanks for sharing @luizvilla!

Note that in the ADC example, I'd use a hardware timer to trigger the ADC in a real application (this is possible on STM32). Regarding footprint sizes, it would be nice to compare ROM reports in both cases to see the big difference, in particular for the ADC and shell. Have you published any paper with some more insights?

Thank you for your quick reply!
So far we have not published this. We came to you to make sure we are not forgetting something.

You can see them here. As @gmarull pointed out, the set selected here is certainly not optimized in any way.

I've seen that. I just want to know whether this configuration was made on purpose, to be on par with the RIOT configuration.

The idea was to create a benchmark between both systems, so we did not try to optimize the code, but rather to test equivalent functionality and measure performance for these functionalities.

@luizvilla Also, the project documentation for getting things built is great. Though, I haven't found info on how the latency measurements were made. Can you elaborate on this part?

We toggle a pin and measure the time between toggles with different ranges of sleep from 0us to 10us.
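That measurement loop might look roughly like this in Zephyr (a sketch only; the port, pin, and function name are hypothetical, and the period is measured externally):

```c
/* Sketch of the pin-toggle measurement: toggle a GPIO around a
 * variable-length sleep and measure the resulting period with a scope
 * or logic analyzer. Port/pin values are placeholders.
 */
#include <zephyr/kernel.h>
#include <zephyr/drivers/gpio.h>

void measure_loop(const struct device *port, gpio_pin_t pin, uint32_t sleep_us)
{
    while (1) {
        gpio_pin_toggle(port, pin);
        if (sleep_us > 0U) {
            k_usleep(sleep_us);   /* swept from 0 us to 10 us in the test */
        }
    }
}
```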

I noticed that you are defining the trigger thread's priority to 6. Have you tried making this a high-prio cooperative thread instead?

Not really. We wanted to test whether the threads performed similarly under comparable conditions.

We are proofreading an internal report and will share it with you as soon as we can.

@carlescufi
Member

The idea was to create a benchmark between both systems, so we did not try to optimize the code, but rather to test equivalent functionality and measure performance for these functionalities.

Sure, it's not about optimizing the code, but rather about defaults. If you want an apples-to-apples comparison, you should enable the same features on both RTOSes, in my opinion. If it turns out that one of the two enables more functionality by default than the other, then you should probably disable it in order to arrive at a meaningful comparison, unless you want to compare "default configurations".

I suggest you use west build -t rom_report to get a better idea of where your flash usage is coming from. The same goes for RAM with west build -t ram_report.

@gmarull
Member

gmarull commented Mar 4, 2021

Agree with @carlescufi, a meaningful comparison should enable a comparable feature set on both sides.

@luizvilla
Author

Thanks, everyone, for your valuable input.
Just to clarify, there are two main issues at hand:

  1. The footprint - Making sure that the same set of features is active on both RTOSes, so as to have a clear comparison metric for their footprint. I'll get back to you on this as soon as I talk to our software engineer.
  2. The performance - The systems had only their ADC acquisitions active, within a single repeating thread with a variable sleep length. This test was conducted with and without the shell. This is, I think, independent of the system's footprint, and the results are comparable.

We'll get back to you with more details. Thank you already for your responsiveness and interest.

@jharris-intel
Contributor

Also, note the documentation for k_usleep:

/*
 * This function is unlikely to work as expected without kernel tuning.
 * In particular, because the lower bound on the duration of a sleep is
 * the duration of a tick, @option{CONFIG_SYS_CLOCK_TICKS_PER_SEC} must be
 * adjusted to achieve the resolution desired. The implications of doing
 * this must be understood before attempting to use k_usleep(). Use with
 * caution.
 */

Zephyr's k_usleep ~always does a "true" sleep, whereas it appears from the documentation (I'm not familiar with RIOT-OS) that xtimer_usleep will busy-wait for small periods.

For small periods xtimer_usleep would then be more comparable to k_busy_wait.

(Also, is Zephyr running in tickless mode? Is RIOT-OS? Are you running Zephyr with CONFIG_USERSPACE? Is the Zephyr thread preemptive? Is the RIOT-OS thread preemptive?)
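Most of those questions can be answered from build/zephyr/.config; a minimal sketch of the relevant options (names from Zephyr's Kconfig; whether a thread is preemptive or cooperative is set per thread by the sign of its priority, not in Kconfig):

```ini
# Options relevant to the questions above (check build/zephyr/.config)
CONFIG_TICKLESS_KERNEL=y   # tickless: timer interrupts only on demand
CONFIG_USERSPACE=n         # userspace adds syscall/MPU overhead when enabled
```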

@andyross
Contributor

andyross commented Mar 5, 2021

Yeah, looking at the code here, it seems like indeed the main loop is just asking for too much: https://gitlab.laas.fr/owntech/zephyr/-/blob/test_adc_g4/src/adc_g4/adc.c#L180

Zephyr's sleep calls are blocking primitives, they will register a timer interrupt for that duration in the future, pend the current thread, and then context switch to another thread (likely the idle thread in this app) to do other work while it waits. Trying to do that at > 30 kHz on a CPU like this isn't going to work well, your app is likely spending most of its time in context switch and interrupt handling. (It's also trying to do that more than twice as often as under RIOT).

Similarly, I see you're asking for a tick rate of 1 MHz. That's not impossible to do as long as you configure 64 bit ticks, but in some sense that's false precision (on a 170 MHz CPU, it's going to take longer than 1us just to enter and exit the interrupt handler).

Can you post the build/zephyr/.config file from your application? There is no doubt a ton of tuning we could suggest.
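For context, the tick-rate settings discussed above would appear in the configuration as something like this (a sketch; verify option names against your Zephyr version):

```ini
# 1 MHz ticks as used in the test; very high tick rates need 64-bit timeouts
CONFIG_SYS_CLOCK_TICKS_PER_SEC=1000000
CONFIG_TIMEOUT_64BIT=y
```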

@andyross
Contributor

andyross commented Mar 5, 2021

OK, having studied the source to the test (and a tiny bit of digging in RIOT-OS), some more notes:

  1. As mentioned, the resulting .config file from your build would be helpful.

  2. Are you sure your performance deltas aren't just measurement error? At least according to the source at the HEAD commit, you're waking the RIOT-OS thread up every 30us, but the Zephyr thread every 13us. So Zephyr is being asked to do 130% more work?

  3. As mentioned by Carles upthread, the usage pattern isn't really right for what Zephyr expects. You have your ADC polling thread configured at a low priority, so it's always going to be preempted by the shell thread. If you want reliable polling in timers like this, you want the timers set at a high priority so the OS services the hardware and not the interactive/bursty shell thread.

  4. This explains why "flooding the shell" (if I understand your phrasing correctly) locks out the ADC handling. Zephyr has a real time scheduler, that is correct behavior. You told the OS explicitly not to run the ADC thread if the shell had work to do!

  5. (That does imply that the shell can't handle a 115kbps UART stream without saturating a 170 MHz Cortex M, which does indeed seem like a performance bug. But I guess it depends on what commands you're trying to run with the flood of bytes.)

  6. Crude sleep polling like this frankly isn't what we optimize for. If you really have hard real time polling requirements like ADC sampling and you want to run them off of the CPU SysTick timer (and not a hardware trigger, or external counter device, etc...) then the Right Thing in Zephyr is to use a k_timer, which allows you to hook the timer ISR directly with a repeating interrupt that is hardened against things like delayed interrupts and won't glitch or skip except when the interrupt gets masked out or preempted.

  7. Deliberately cranking the sleep timer past the interrupt handling rate is REALLY not the right thing to do, on either OS. It looks like you arrived at the values you have not because that's the frequency you want but because those are the smallest values you could set before something broke. This isn't likely going to be testing what you think it's testing; you're probing the edge cases and not the desired operating regime. Honestly: I think Zephyr comports itself fairly well here, given that you can give it more pathological input than RIOT-OS and still get useful behavior.

Broadly, I guess what I would hope to see here would be (given the ADC-sampling app), a comparison of the maximum sampling rate achievable under each OS under various IPC paradigms (sleeping, timer hooks, busy-waiting, etc...), and a statistical measure of each OS's variance (i.e. how regular are the sampling times).

But the even shorter summary is that you really want to be using k_timer for this use case. That's what it's designed for.
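A k_timer-based version of the sampling loop might look like this (a sketch; names are illustrative, and real expiry handlers should stay short):

```c
/* Sketch: periodic ADC sampling driven by a k_timer. The expiry
 * function runs from the system timer interrupt at a fixed period,
 * independent of thread scheduling.
 */
#include <zephyr/kernel.h>

static void sample_expiry(struct k_timer *timer)
{
    /* Start the ADC conversion here; hand results to a thread via a
     * semaphore, message queue, or k_work item. Keep ISR work short. */
}

K_TIMER_DEFINE(sample_timer, sample_expiry, NULL);

void start_sampling(void)
{
    /* First expiry after 100 us, then repeating every 100 us. */
    k_timer_start(&sample_timer, K_USEC(100), K_USEC(100));
}
```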

@andyross
Contributor

andyross commented Mar 5, 2021

I will say though, that in RIOT-OS's defense their ARMv7 context switch code is a masterpiece of simplicity when compared with ours. It saves to the stack! It's local, no PendSV! No CONFIG_SWAP_NONATOMIC! It would make an excellent starting point for an arch_switch() if anyone wanted to try. I don't know that it's necessarily faster than our context switch code, but it's definitely cleaner.

@luizvilla
Author

Thank you, everyone, for your feedback! This has been super useful!
We'll rework our application, simplify our Zephyr configuration, and rethink our scheduling.
We'll get back to you if we have some more interesting results.
Thanks again!

@rahav

rahav commented Jun 25, 2021

Thanks @luizvilla! This is very interesting. Did you consider Contiki?

@jhaand

jhaand commented Dec 31, 2021

I would suggest testing ADC reads using DMA to check the performance and ease of programming: sampling a block at 48 kHz and then transferring it to the host while keeping up with all the other work. How many hoops do you have to jump through to get this going with either operating system?

That would look at a real use case, while only using the shell before and after the measurements. You could also look at power management/drain for these use cases.

@Fladdan
Contributor

Fladdan commented Oct 10, 2022

Thanks @luizvilla for sharing this topic!

Could you share the code on GitHub? It seems the LAAS GitLab is no longer open to public access.
I'm curious to run tests with an STM32 and Zephyr, and your benchmark could be a good starting point.

@Skyrov01

Skyrov01 commented Jan 4, 2024

Hi there, @luizvilla! This was an interesting topic to read. By any chance, could you make the code public or maybe share it somewhere else? I would really appreciate it. Thanks!

@ndrs-pst
Contributor

Even though this discussion has been closed for more than 3 years, I really enjoyed the content 😄.
For what it’s worth, at least we know that the OwnTech Foundation decided to use Zephyr in this: https://github.com/owntech-foundation/Core. 🫡
