
k_sleep duration is off by 1 tick #32499

Closed
micwurm opened this issue Feb 19, 2021 · 8 comments
Labels: area: Kernel, bug, priority: medium

micwurm commented Feb 19, 2021

I did some tests for a project that required precise timing and I noticed that k_sleep sleeps one tick too long when using a relative timeout, and one tick too short when using an absolute timeout.
Test code:

	targetTicks = k_uptime_ticks() + 2;
	while (k_uptime_ticks() < targetTicks);
	// we are at the beginning of a tick period now
	GPIO_PinOutToggle(PIN_DEBUG);
	targetTicks += 4;
	retVal = k_sleep(K_TICKS(4));	// case 1
	//retVal = k_sleep(K_TIMEOUT_ABS_TICKS(targetTicks));	// case 2
	GPIO_PinOutToggle(PIN_DEBUG);
	endTicks = k_uptime_ticks();

Observed behavior:

k_sleep(K_TICKS(4));
endTicks == targetTicks + 1
retVal == 0

k_sleep(K_TIMEOUT_ABS_TICKS(targetTicks));
endTicks == targetTicks - 1
retVal == -1717986927

Looking at the debug pin with a scope confirms this: I see a 520us pulse in case 1 and a 320us pulse in case 2. (At the 10kHz tick rate described below, 4 ticks is 400us, so these pulses correspond to roughly 5 and 3 ticks plus overhead.)

The problem seems to come from the following line in z_add_timeout():

	k_ticks_t ticks = timeout.ticks + 1;

My setup is an efm32gg_stk3701a board (Cortex-M4) with 10kHz ticks, a 50MHz clock, no time slicing, and 64-bit timeouts enabled.

nashif added the bug label Feb 19, 2021
jharris-intel (Contributor) commented Feb 19, 2021

k_ticks_t ticks = timeout.ticks + 1;

This may be because of the following. Say you have a 1kHz tick clock:

If you start waiting for 1 tick immediately before a tick boundary, you will wait ~0ms, plus RTOS overhead.
If you start waiting for 1 tick immediately after a tick boundary, you will wait ~1ms, plus RTOS overhead.

Or rather:

Without +1:
Actual wait time is between ticks-1 and ticks, plus RTOS overhead.

With +1:
Actual wait time is between ticks and ticks+1, plus RTOS overhead.

Generally (not always), it's better to wait too long than too short. (And with RTOS overhead there's no guarantee that you won't oversleep anyway.)
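A tiny standalone model of that rounding (illustrative only; wakeup_us() and everything around it are made-up names, not kernel API). The +1 trades a bounded oversleep for a guarantee against undersleep:

	/* Toy model: wakeups only happen on tick boundaries, so the real
	 * wait for a relative timeout depends on where in the tick period
	 * the call lands. pad is the kernel's +1 (0 = without, 1 = with). */
	#include <stdio.h>

	static unsigned int wakeup_us(unsigned int now_us, unsigned int tick_us,
				      unsigned int req_ticks, unsigned int pad)
	{
		unsigned int cur_tick = now_us / tick_us;

		return (cur_tick + req_ticks + pad) * tick_us;
	}

	int main(void)
	{
		/* 1kHz clock (1000us per tick), request 1 tick. */
		printf("just after a boundary,  no +1:   %u us\n",
		       wakeup_us(1, 1000, 1, 0) - 1);     /* 999 */
		printf("just before a boundary, no +1:   %u us\n",
		       wakeup_us(999, 1000, 1, 0) - 999); /* 1: undersleep! */
		printf("just before a boundary, with +1: %u us\n",
		       wakeup_us(999, 1000, 1, 1) - 999); /* 1001: >= request */
		return 0;
	}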

andyross (Contributor) commented:
This looks correct. The documented behavior of a relative timeout is that it sleeps for at least that long. So in your relative example you wait for tick alignment, sleep for four ticks, and then receive your interrupt at the next tick that arrives after that, which is five ticks after the tick that arrived immediately before the sleep call. We can't wake you up in exactly four ticks without either:

  1. Sometimes breaking the promise that you'll never sleep for less than four ticks.
  2. Delivering a timeout interrupt on something other than a tick boundary

The absolute case gets this right, though. If you call k_uptime_ticks() immediately after that sleep returns, you should find that it produces exactly the uptime you asked for. In fact the ability to reason about questions like this more precisely (without worrying about things like aligned tick interrupts or round-up padding) is precisely why absolute timeouts were introduced to the API.
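For example, the canonical use is a drift-free periodic loop. A sketch (do_work() and PERIOD_TICKS are placeholders, and K_TIMEOUT_ABS_TICKS requires CONFIG_TIMEOUT_64BIT):

	#include <kernel.h> /* <zephyr/kernel.h> on newer trees */

	#define PERIOD_TICKS 4

	void do_work(void); /* placeholder for the periodic workload */

	void periodic_thread(void)
	{
		int64_t next = k_uptime_ticks() + PERIOD_TICKS;

		while (1) {
			do_work();

			/* Sleep until the absolute tick `next`. Overruns do
			 * not accumulate, because the deadline rather than a
			 * relative delay is what advances each iteration. */
			k_sleep(K_TIMEOUT_ABS_TICKS(next));
			next += PERIOD_TICKS;
		}
	}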

micwurm (Author) commented Feb 19, 2021

I understand the relative timeout case.

But in my test with the absolute timeout the system wakes up 1 tick BEFORE the specified time (endTicks == targetTicks - 1).

I think the problem is that at the point where the +1 happens, the absolute timeout is still in its negative representation: Z_TICK_ABS(t) encodes an absolute deadline t as K_TICKS_FOREVER - 1 - t, i.e. -2 - t, so adding 1 to the encoded value yields the encoding of t - 1, shortening the timeout by one tick.

I think what you want is actually this:

	k_ticks_t ticks = timeout.ticks;

	if (IS_ENABLED(CONFIG_TIMEOUT_64BIT) && Z_TICK_ABS(ticks) >= 0) {
		ticks = Z_TICK_ABS(ticks) - (curr_tick + elapsed());
	} else {
		ticks++;
	}

With this the absolute case wakes up at the precise tick, and the relative case still adds an extra tick.

andyross (Contributor) commented:
Ah, OK. Indeed, I buy that; the early addition is clearly wrong. Do you want to submit a fix, or should I? Ideally we also want to add a test case that exhibits the early wakeup.
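A minimal sketch of such a test (assuming the ztest framework; hypothetical, not necessarily the test that eventually merged):

	#include <ztest.h>

	void test_sleep_abs_no_early_wakeup(void)
	{
		int64_t target = k_uptime_ticks() + 4;

		k_sleep(K_TIMEOUT_ABS_TICKS(target));

		/* With the buggy +1, we wake one tick early and this fails. */
		zassert_true(k_uptime_ticks() >= target,
			     "woke before the absolute deadline");
	}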

micwurm (Author) commented Feb 19, 2021

Feel free to submit this.

There is actually a second, related issue: the computation of the expected wakeup time in z_tick_sleep() doesn't handle the case where the timeout is absolute. As a result, the return value of k_sleep() is garbage (retVal == -1717986927 in the original post).
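The shape of the fix would be to derive the expected wakeup tick from the caller's original timeout rather than the already-adjusted relative count. A sketch (expected_wakeup_ticks, ticks, and z_tick_get_32() follow the kernel source of this era; details may differ from the eventual patch):

	/* Sketch only: make the "ticks remaining" return value of
	 * k_sleep() meaningful for absolute timeouts too. */
	if (IS_ENABLED(CONFIG_TIMEOUT_64BIT) && Z_TICK_ABS(timeout.ticks) >= 0) {
		/* Absolute: the deadline is encoded in the timeout itself. */
		expected_wakeup_ticks = Z_TICK_ABS(timeout.ticks);
	} else {
		/* Relative: the deadline is the adjusted count from "now". */
		expected_wakeup_ticks = ticks + z_tick_get_32();
	}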

andyross pushed a commit to andyross/zephyr that referenced this issue Feb 19, 2021
The computation was using the already-adjusted input value that
assumed relative timeouts and not the actual argument the user passed.
Absolute timeouts were consistently waking up one tick early.

Fixes zephyrproject-rtos#32499

Signed-off-by: Andy Ross <[email protected]>
andyross (Contributor) commented:
Fix is up at #32505.

Please open a second bug against sleep; this one is going to get closed automatically when that merges.

micwurm (Author) commented Feb 19, 2021

See #32506.

micwurm (Author) commented Feb 20, 2021

I had the pleasure of writing my own timer driver, and in the process I got another look at this issue. Unless I am mistaken, there is still a window where absolute timeouts can be off by one tick even after this fix. It is pretty unlikely and would be hard to observe in practice.

  1. For absolute timeouts there are two calls to elapsed() in z_add_timeout() that should ideally cancel out. First, they are redundant and could be removed for performance reasons. Second, they can introduce an error if a tick boundary falls exactly between the two calls: the second call returns a value one higher, and the sleep period ends up one tick too long.

  2. The same thing can happen if a tick boundary falls between the call to elapsed() in next_timeout() and the execution of z_clock_set_timeout() in the timer driver: the driver interprets the delay as relative to the current tick value, which is one higher than it was when the delay was computed in next_timeout().

Item 1 can be fixed (probably) with this modification:

	k_ticks_t ticks = timeout.ticks;

	if (IS_ENABLED(CONFIG_TIMEOUT_64BIT) && Z_TICK_ABS(ticks) >= 0) {
		ticks = MAX(1, Z_TICK_ABS(ticks) - curr_tick);
	} else {
		ticks += elapsed() + 1;	/* MAX not required because ticks >= 0 before this */
	}

	__ASSERT(!sys_dnode_is_linked(&to->node), "");
	to->fn = fn;

	LOCKED(&timeout_lock) {
		struct _timeout *t;

		to->dticks = ticks;

Item 2: I don't see an easy fix for this. Maybe the timer driver could, in z_clock_set_timeout(), treat the delay as relative to the tick at which z_clock_elapsed() was last called, instead of the current tick.
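A sketch of that idea with hypothetical driver-local names (last_announced_cycles, CYC_PER_TICK and set_compare() are made up here, not quoted from any in-tree driver):

	/* Cycle count of the last tick announced to the kernel; updated
	 * wherever the driver calls z_clock_announce(). */
	static uint32_t last_announced_cycles;

	void z_clock_set_timeout(int32_t ticks, bool idle)
	{
		ARG_UNUSED(idle);

		/* Anchor the deadline to the last announced tick rather than
		 * to "now", so a tick boundary between next_timeout() and
		 * this call cannot shift the wakeup by one tick. */
		uint32_t deadline = last_announced_cycles +
				    (uint32_t)ticks * CYC_PER_TICK;

		set_compare(deadline); /* hypothetical hardware accessor */
	}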

nashif added the priority: medium label Feb 20, 2021
nashif closed this as completed in 252764a Feb 22, 2021