-
Notifications
You must be signed in to change notification settings - Fork 6.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
drivers: systick: fix cycle count in sys_clock_set_timeout() #35062
Conversation
Mitigates #35073. |
b3da425
to
0c6f6fa
Compare
0c6f6fa
to
fe8461f
Compare
With this patch the sys_clock_set_timeout function counts the cycles elapsed while computing the systick timer's new load (tickless mode). This cycles are then added to the total cycle count instead of being lost. This patch mitigates uptime drifting in tickless mode (especially when high frequency timers are registered). Signed-off-by: Simon Guinot <[email protected]>
fe8461f
to
50962dd
Compare
I fixed few typos in the commit message. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks a little wrong to me. Clock interrupts should always be scheduled to expire at tick boundaries. This patch has the effect of correcting the cycle_count but does nothing to re-align the tick stride. So you'll have a better approximation to real time in k_cycle_get_32() but the timing of the ticks delivered to timeout clients won't be corrected and they'll get out of stride with each other, no?
Also: if we're going to dive in and add complexity to a simple(-ish) driver to hyperoptimize, let's get it right and not do a half measure: Don't add a new load of the counter to read val1, elapsed() has already done that. Find a way to export the counter value done there, as close to the top of the function as possible.
Also, this seems needlessly complicated. Fundamentally what the math wants to do is set a timer (after some modular fixup work) for "+/- N cycles offset from the expiration of the last-scheduled interrupt". So really what we want for this problem is I think a structure that looks more like:
int32_t last_load_offset = /* complicated modular trickery here... */;
bool ok = true;
do {
/* Read/set in one minimal expression */
SysTick->LOAD = last_load_offset + SysTick->VAL;
ok = ???; /* check that the computed value didn't wrap, etc... */
} while(!ok);
This shrinks that window down to just the two register accesses and an addition, plus some after-the-fact cleanup which doesn't affect latency. For extra credit, you can implement that line in assembly and include a platform-specifiable cycle-count fixup to recover even those few cycles.
Soft -1. I'm the original author of this driver but not the maintainer or really even an ARM person at all. It doesn't look logically incorrect, so I'm OK with an override if someone else has strong feelings.
Hi @andyross, Thanks for reviewing this patch and thinking about the issue. Before looking into it, there is something I want to say: the drifting is not small. The driver is bleeding cycles. I discovered this issue while working on the synchronization of LED effects between several devices. The running application is quite simple and basically uses a couple of timers at 100 Hz to update some LEDs. Also an accurate external oscillator (24 MHz frequency with less than 150 PPM errors) is embedded on this devices. The drifting (against a reliable master clock) I observed is 30 ms per 10 seconds. So it is not small and it is definitively not good for a RTOS. One way or another, we need to fix this drifting :)
Yes, this patch only fixes the cycle count and then the system uptime. This "lost" cycles are not taken into account while configuring the timer. So the next timer interrupt will be late of this cycles. But the "lost" cycles will be either announced (if there are enough to make a tick) or most likely reported through elapsed(). This means they are counted in "now", which makes the registration of new timeouts more accurate. So the "lost" cycles are not adding up one interrupt after the other and although the tick alignment is not perfect, it is working. I agree it would be better to have a more accurate LOAD when programming the interrupt, but I have been unable to find out a way...
In a first version of this patch, I was retrieving "val2" used in elapsed() this way:
As this code relies on the internal behavior of elapsed() (i.e. the implicit update of overflow_cyc), I was not happy with it and I moved to the current version. But it was more accurate.
Are you suggesting to only configure LOAD in sys_clock_set_timeout() ? Without resetting the counter to LOAD (by writting 0 into val) ? If that's the case then I don't think it can work. Or I am not understanding correctly your proposal. Our problem is that the counter automatically wrapped to the last programmed LOAD value. So when we enter in sys_clock_set_timeout(), we have an already running/decrementing counter from a LOAD value which is not relevant for the timeout we want to program. And in some situations (timeout < LOAD), we need to shrink it. We can't let the counter go to 0 but we need to reset the counter to a new load. And to compute this new load value, we must take care of the elapsed cycles, of the tick alignment, of the min and max values, etc, which is pretty much what the current code does. While there is surely some room to optimize the current code, I don't think we can make it a single line. Please fix me if I got your proposal wrong :)
As I said earlier, we have to fix it. I am ready to try to solve the problem differently. So let's see where we can go. But at the end we have to come up with something. |
Think of it like this: Imagine a hypothetical device with a 10 kHz tick rate and a 10 MHz cycle clock. So there are 1000 cycles in a tick. We know that the last timeout was set for a known tick expiration. Call it "tick 432" (I know the driver doesn't see absolute tick numbers, but this makes the explanation clearer). Now you get a new sys_clock_set_timeout() call that wants to set the next timeout for two ticks later than that instead, at tick 434. So, the time that we now want the timer to expire is going to be 2000 ticks after the time the current timer is going to expire. How long is it until the current timer expires? It's just the VAL register, by definition! So in our example, this is all we need to do: SysTick->LOAD = SysTick->VAL + 2000; And that's almost perfect, with the two register accesses separated by no more than 2-3 instructions[1]. This is the essence of the scheme that we want, I think. But there are edge cases:
So in practice, the code ends up more complicated than the 1-liner above. But becuase there was no precision loss in that negotiation, you can just fix things up and repeat until you get a setting that works. [1] And as mentioned, by writing this in assembly and hand-counting cycles, you can correct for that loss deterministically. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it turns out that this fixes #32839
So while I still mildly hate the technique, aesthetic concerns have to take a back seat to bug fixes on real hardware. We can come back later and improve the code at our leisure I guess.
@simonguinot Verified your PR on stm32g4: With you PR
Without your PR
Test Code: /*
* Copyright (c) 2020 Alexander Wachter
*
* SPDX-License-Identifier: Apache-2.0
*/
#include <zephyr.h>
#include <soc.h>
#include <device.h>
#include <devicetree.h>
#include <irq.h>
#include <stm32_ll_rcc.h>
#include <stm32_ll_tim.h>
#include <drivers/clock_control/stm32_clock_control.h>
#include <drivers/clock_control.h>
#include <logging/log.h>
LOG_MODULE_REGISTER(main);
static uint32_t timer_cnt;
void timer_isr(void* unused)
{
if(LL_TIM_IsActiveFlag_UPDATE(TIM2)==SET) //Overflow interrupt
{
const uint32_t uptime = k_uptime_get_32();
LOG_INF("Timer ISR number %d Uptime: %d",timer_cnt, uptime);
timer_cnt += 1;
}
LL_TIM_ClearFlag_UPDATE(TIM2); //Clear interrupt flag
}
void sys_timer_isr(struct k_timer* unused)
{
const uint32_t uptime = k_uptime_get_32();
uint32_t time = (uint32_t)(((uint64_t)timer_cnt * 1000000ULL + (uint64_t)LL_TIM_GetCounter(TIM2)) / 1000ULL);
LOG_INF("Sys Timer ISR. Timer: %d Uptime: %d", time ,uptime);
}
void main(void)
{
const uint32_t tim_clk = DT_PROP(DT_NODELABEL(rcc), clock_frequency);
LL_TIM_InitTypeDef init = {
.Prescaler = __LL_TIM_CALC_PSC(tim_clk, 1000000U),
.CounterMode = LL_TIM_COUNTERMODE_UP,
.Autoreload = 1000000U,
.ClockDivision = LL_TIM_CLOCKDIVISION_DIV1
};
const struct stm32_pclken pclken = {
.enr = DT_CLOCKS_CELL(DT_NODELABEL(timers2), bits),
.bus = DT_CLOCKS_CELL(DT_NODELABEL(timers2), bus),
};
const struct device *clock = DEVICE_DT_GET(STM32_CLOCK_CONTROL_NODE);
struct k_timer timer;
int ret;
ret = clock_control_on(clock, (clock_control_subsys_t *)&pclken);
if (ret < 0) {
LOG_ERR("Could not initialize clock [%d]", ret);
return;
}
if (LL_TIM_Init(TIM2, &init) != SUCCESS) {
LOG_ERR("Timer Init failed");
}
IRQ_CONNECT(DT_IRQN(DT_NODELABEL(timers2)),
DT_IRQ(DT_NODELABEL(timers2), priority),
timer_isr, NULL, 0);
LL_TIM_EnableIT_UPDATE(TIM2);
LL_TIM_EnableARRPreload(TIM2);
LL_TIM_EnableCounter(TIM2);
LOG_INF("Init done");
irq_enable(DT_IRQN(DT_NODELABEL(timers2)));
k_timer_init(&timer, sys_timer_isr, NULL);
k_timer_start(&timer, K_NO_WAIT, K_MSEC(1000));
while(1) {
k_msleep(10);
}
} |
I think it's not possible to fix finally. |
OK, I get this part. But what's happening if we need to set a timeout earlier, let's say "tick 431" instead of "tick 432" in your example ? Is that an impossible case ? |
@alexanderwachter Thanks for testing. Note that the uptime drifting is more important when the system is running high frequency periodic timers (> 100 Hz). |
2000 was computed as "(434 - 432) * 1000". You can easily do "(431 - 432) * 1000" to get an offset of "-1000" too. :) You're right that my pseudocode was wrong because I forgot that a write to LOAD doesn't reset VAL and you need to do that with a separate instruction (again, SysTick is just not a good interface vs. things like HPET, RISC-V or TSC-deadline which are comparators against 64 bit free-running counters that you don't have to race around to reset). But negative offsets work just fine. They don't work always (because of rollover and minimum interrupt delay problems), but that's OK because having set the counter (with zero precision loss) you can then take your time and see what you've done and reset the counter a second time while e.g. correcting a 24-bit rollover or pushing the expiration out by a tick, etc... |
Hi, I wrote an almost drift free version. I doubt I still miss some edge cases, because I removed a lot of code from the original driver ;-)
|
Here are the logs I get with your reworked driver by using the application program I attached to #35073:
So after 10 minutes running a 1000 Hz periodic "no op" timer the drifting is around 1 second (as with this PR), against 4.5 seconds with the original driver. But note that the uptime reported from the 10 second periodic timer doesn't look "stable", while it is with the original driver. Are you going to try to implement the proposal of @andyross ? IMHO it is worth trying. |
Got it. You lost me in a first place because of the missing line :) I was thinking you figured a way to avoid a reset... If it is possible (and I think it is) to compute a valid offset (regarding the min an max counter values) without reading the counter (VAL), then your method is the cleanest way to go. Moreover I like the idea of recovering the last few cycles via a Kconfig "platform" option, or even by counting them at run time (with some calibration performed during the initialization). |
* don't need to worry about this case. | ||
*/ | ||
if (val1 < val2) { | ||
cycle_count += (val1 + (last_load_ - val2)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for being very late here. I'm analyzing code of the SysTick driver and can't get a few things. Thanks in advance if you have chance to answer.
This line assumes a wrap is possible during the calculation. It means the ISR (sys_clock_isr
) is called.
The elapsed
function (called by ISR) detects there was a wrap and adds last_load
to the overflow_cyc
variable, which is added to the cycle_count
. Doesn't it mean the the last_load
is added to the cycle_count
two times? The first time in the ISR and the second time in sys_clock_set_timeout
function - first part last_load - (pending - overflow_cyc)
and the second part here, with val1
variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @semihalf-niedzwiecki-dawid,
sys_clock_isr()
and elapsed()
won't detect the overflow because writing 0 in the SysTick Current Value Register from sys_clock_set_timeout()
clears the COUNTFLAG
status bit to 0:
SysTick->VAL = 0; /* resets timer to last_load */
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the response :-)
But what happens if the overflow occurs between val1 = SysTick->VAL;
and val2 = SysTick->VAL;
? Isn't last_load
added to overflow_cyc
and val2 > val1
so the cycle_count += (val1 + (last_load_ - val2));
is executed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it is not. sys_clock_set_timeout()
computes the new timer load with interrupts disabled. See the call to k_spin_lock()
. So in case of overflow sys_clock_isr()
will be called right after sys_clock_set_timeout()
returned and will deal with the updated timer values and a COUNTFLAG
status bit set to 0. There is a comment explaining that in sys_clock_isr()
.
At the time I wrote this patch I tested this code a lot. Are you experimenting an issue ? Note that this driver is still leaking few cycles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the time I wrote this patch I tested this code a lot. Are you experimenting an issue ?
No, I'm just trying to fully understand the code, I haven't encountered the described case.
No it is not. sys_clock_set_timeout() computes the new timer load with interrupts disabled. See the call to k_spin_lock(). So in case of overflow sys_clock_isr() will be called right after sys_clock_set_timeout() returned and will deal with the updated timer values and a COUNTFLAG status bit set to 0.
Thanks a lot for the explanation. I thought for some reason that the SysTick IRQ is not disabled by the arch_irq_lock
call (as SVC), it was my mistake. Now everything is clear.
Note that this driver is still leaking few cycles.
I guess ticks between val2 = SysTick->VAL;
and SysTick->VAL = 0;
are missed. It looks like the run time measurements and correcting cycle_count
would make sense here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess ticks between
val2 = SysTick->VAL;
andSysTick->VAL = 0;
are missed. It looks like the run time measurements and correctingcycle_count
would make sense here.
Yes. There are also a few ticks lost by hardware when reloading and resetting the timer. They all add up and make a drift which can become "not negligible" especially when high frequency timers are used by the application.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks for the explanation again.
With this patch the sys_clock_set_timeout function counts the cycles
elapsed while computing the systick timer's new load (tickless mode).
This cycles are then added to the total cycle count instead of being
lost.
This patch mitigates uptime drifting in tickless mode (especially when
high frequency timers are registered).
Signed-off-by: Simon Guinot [email protected]