
k_sleep duration is off by 1 tick #32499

Closed
micwurm opened this issue Feb 19, 2021 · 8 comments
Labels: area: Kernel, bug, priority: medium

micwurm commented Feb 19, 2021

I did some tests for a project that required precise timing and I noticed that k_sleep sleeps one tick too long when using a relative timeout, and one tick too short when using an absolute timeout.
Test code:

	targetTicks = k_uptime_ticks() + 2;
	while (k_uptime_ticks() < targetTicks);
	// we are at the beginning of a tick period now
	GPIO_PinOutToggle(PIN_DEBUG);
	targetTicks += 4;
	retVal = k_sleep(K_TICKS(4));	// case 1
	//retVal = k_sleep(K_TIMEOUT_ABS_TICKS(targetTicks));	// case 2
	GPIO_PinOutToggle(PIN_DEBUG);
	endTicks = k_uptime_ticks();

Observed behavior:

k_sleep(K_TICKS(4));
endTicks == targetTicks + 1
retVal == 0

k_sleep(K_TIMEOUT_ABS_TICKS(targetTicks));
endTicks == targetTicks - 1
retVal == -1717986927

Looking at the debug pin with a scope confirms this: I see a 520us pulse in case 1 and a 320us pulse in case 2. (At the 10kHz tick rate described below, 4 ticks is 400us, so these pulses correspond to roughly 5 and 3 ticks plus overhead.)

The problem seems to come from the following line in z_add_timeout():

	k_ticks_t ticks = timeout.ticks + 1;

My setup is an efm32gg_stk3701a board (Cortex-M4) with 10kHz ticks, a 50MHz clock, no time slicing, and 64-bit timeouts enabled.

nashif added the bug label Feb 19, 2021
jharris-intel (Contributor) commented Feb 19, 2021

k_ticks_t ticks = timeout.ticks + 1;

This may be because of the following. Say you have a 1kHz tick clock:

If you start waiting for 1 tick immediately before a tick boundary, you will wait ~0ms, plus RTOS overhead.
If you start waiting for 1 tick immediately after a tick boundary, you will wait ~1ms, plus RTOS overhead.

Or rather:

Without +1:
Actual wait time is between ticks-1 and ticks, plus RTOS overhead.

With +1:
Actual wait time is between ticks and ticks+1, plus RTOS overhead.

Generally (not always), it's better to wait too long than too short. (And with RTOS overhead there's no guarantee that you won't oversleep anyway.)
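A tiny standalone model of that rounding (illustrative only; wakeup_us() and everything around it are made-up names, not kernel API). The +1 trades a bounded oversleep for a guarantee against undersleep:

	/* Toy model: wakeups only happen on tick boundaries, so the real
	 * wait for a relative timeout depends on where in the tick period
	 * the call lands. pad is the kernel's +1 (0 = without, 1 = with). */
	#include <stdio.h>

	static unsigned int wakeup_us(unsigned int now_us, unsigned int tick_us,
				      unsigned int req_ticks, unsigned int pad)
	{
		unsigned int cur_tick = now_us / tick_us;

		return (cur_tick + req_ticks + pad) * tick_us;
	}

	int main(void)
	{
		/* 1kHz clock (1000us per tick), request 1 tick. */
		printf("just after a boundary,  no +1:   %u us\n",
		       wakeup_us(1, 1000, 1, 0) - 1);     /* 999 */
		printf("just before a boundary, no +1:   %u us\n",
		       wakeup_us(999, 1000, 1, 0) - 999); /* 1: undersleep! */
		printf("just before a boundary, with +1: %u us\n",
		       wakeup_us(999, 1000, 1, 1) - 999); /* 1001: >= request */
		return 0;
	}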

andyross (Contributor) commented:
This looks correct. The documented behavior of a relative timeout is that it sleeps for at least that long. So in your relative example you wait for tick alignment, sleep for four ticks, and then receive your interrupt at the next tick that arrives after that, which is five ticks after the tick that arrived immediately before the sleep call. We can't wake you up in exactly four ticks without either:

  1. Sometimes breaking the promise that you'll never sleep for less than four ticks.
  2. Delivering a timeout interrupt on something other than a tick boundary

The absolute case gets this right, though. If you call k_uptime_ticks() immediately after that sleep returns, you should find that it produces exactly the uptime you asked for. In fact the ability to reason about questions like this more precisely (without worrying about things like aligned tick interrupts or round-up padding) is precisely why absolute timeouts were introduced to the API.
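For example, the canonical use is a drift-free periodic loop. A sketch (do_work() and PERIOD_TICKS are placeholders, and K_TIMEOUT_ABS_TICKS requires CONFIG_TIMEOUT_64BIT):

	#include <kernel.h> /* <zephyr/kernel.h> on newer trees */

	#define PERIOD_TICKS 4

	void do_work(void); /* placeholder for the periodic workload */

	void periodic_thread(void)
	{
		int64_t next = k_uptime_ticks() + PERIOD_TICKS;

		while (1) {
			do_work();

			/* Sleep until the absolute tick `next`. Overruns do
			 * not accumulate, because the deadline rather than a
			 * relative delay is what advances each iteration. */
			k_sleep(K_TIMEOUT_ABS_TICKS(next));
			next += PERIOD_TICKS;
		}
	}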

micwurm (Author) commented Feb 19, 2021

I understand the relative timeout case.

But in my test with the absolute timeout the system wakes up 1 tick BEFORE the specified time (endTicks == targetTicks - 1).

I think the problem is that at the point where the +1 happens, the absolute timeout is still in its negative representation: Z_TICK_ABS(t) encodes an absolute deadline t as K_TICKS_FOREVER - 1 - t, i.e. -2 - t, so adding 1 to the encoded value yields the encoding of t - 1, shortening the timeout by one tick.

I think what you want is actually this:

	k_ticks_t ticks = timeout.ticks;

	if (IS_ENABLED(CONFIG_TIMEOUT_64BIT) && Z_TICK_ABS(ticks) >= 0) {
		ticks = Z_TICK_ABS(ticks) - (curr_tick + elapsed());
	} else {
		ticks++;
	}

With this the absolute case wakes up at the precise tick, and the relative case still adds an extra tick.

andyross (Contributor) commented:
Ah, OK. Indeed, I buy that; the early addition is clearly wrong. Do you want to submit a fix, or should I? Ideally we also want to add a test case that exhibits the early wakeup.
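A minimal sketch of such a test (assuming the ztest framework; hypothetical, not necessarily the test that eventually merged):

	#include <ztest.h>

	void test_sleep_abs_no_early_wakeup(void)
	{
		int64_t target = k_uptime_ticks() + 4;

		k_sleep(K_TIMEOUT_ABS_TICKS(target));

		/* With the buggy +1, we wake one tick early and this fails. */
		zassert_true(k_uptime_ticks() >= target,
			     "woke before the absolute deadline");
	}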

micwurm (Author) commented Feb 19, 2021

Feel free to submit this.

There is actually a second, related issue: the computation of the expected wakeup time in z_tick_sleep() doesn't handle the case where the timeout is absolute. As a result, the return value of k_sleep() is garbage (retVal == -1717986927 in the original post).
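The shape of the fix would be to derive the expected wakeup tick from the caller's original timeout rather than the already-adjusted relative count. A sketch (expected_wakeup_ticks, ticks, and z_tick_get_32() follow the kernel source of this era; details may differ from the eventual patch):

	/* Sketch only: make the "ticks remaining" return value of
	 * k_sleep() meaningful for absolute timeouts too. */
	if (IS_ENABLED(CONFIG_TIMEOUT_64BIT) && Z_TICK_ABS(timeout.ticks) >= 0) {
		/* Absolute: the deadline is encoded in the timeout itself. */
		expected_wakeup_ticks = Z_TICK_ABS(timeout.ticks);
	} else {
		/* Relative: the deadline is the adjusted count from "now". */
		expected_wakeup_ticks = ticks + z_tick_get_32();
	}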

andyross pushed a commit to andyross/zephyr that referenced this issue Feb 19, 2021
The computation was using the already-adjusted input value that
assumed relative timeouts and not the actual argument the user passed.
Absolute timeouts were consistently waking up one tick early.

Fixes zephyrproject-rtos#32499

Signed-off-by: Andy Ross <[email protected]>
andyross (Contributor) commented:
Fix is up at #32505.

Please open a second bug against sleep; this one is going to get closed automatically when that merges.

micwurm (Author) commented Feb 19, 2021

See #32506.

micwurm (Author) commented Feb 20, 2021

I had the pleasure of writing my own timer driver, and in the process I got another look at this issue. Unless I am mistaken, there is still a window where absolute timeouts can be off by one tick even after this fix. It is pretty unlikely and would be hard to observe in practice.

  1. For absolute timeouts there are two calls to elapsed() in z_add_timeout() that should ideally cancel out. First, they are redundant and could be removed for performance reasons. Second, they can introduce an error if a tick boundary falls exactly between the two calls: the second call returns a value one higher, and the sleep period ends up one tick too long.

  2. The same thing can happen if a tick boundary falls between the call to elapsed() in next_timeout() and the execution of z_clock_set_timeout() in the timer driver: the driver interprets the delay as relative to the current tick value, which is one higher than it was when the delay was computed in next_timeout().

Item 1 can be fixed (probably) with this modification:

	k_ticks_t ticks = timeout.ticks;

	if (IS_ENABLED(CONFIG_TIMEOUT_64BIT) && Z_TICK_ABS(ticks) >= 0) {
		ticks = MAX(1, Z_TICK_ABS(ticks) - curr_tick);
	} else {
		ticks += elapsed() + 1;	/* MAX not required because ticks >= 0 before this */
	}

	__ASSERT(!sys_dnode_is_linked(&to->node), "");
	to->fn = fn;

	LOCKED(&timeout_lock) {
		struct _timeout *t;

		to->dticks = ticks;

Item 2: I don't see an easy fix for this. Maybe the timer driver could, in z_clock_set_timeout(), treat the delay as relative to the tick at which z_clock_elapsed() was last called, instead of the current tick.
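A sketch of that idea with hypothetical driver-local names (last_announced_cycles, CYC_PER_TICK and set_compare() are made up here, not quoted from any in-tree driver):

	/* Cycle count of the last tick announced to the kernel; updated
	 * wherever the driver calls z_clock_announce(). */
	static uint32_t last_announced_cycles;

	void z_clock_set_timeout(int32_t ticks, bool idle)
	{
		ARG_UNUSED(idle);

		/* Anchor the deadline to the last announced tick rather than
		 * to "now", so a tick boundary between next_timeout() and
		 * this call cannot shift the wakeup by one tick. */
		uint32_t deadline = last_announced_cycles +
				    (uint32_t)ticks * CYC_PER_TICK;

		set_compare(deadline); /* hypothetical hardware accessor */
	}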

nashif added the priority: medium label Feb 20, 2021
nashif closed this as completed in 252764a Feb 22, 2021