Updating Zephyr to fix Work Queue Problems #33104
NB: This comment will be updated as things change.

The following PRs provide examples of updating the work API to avoid the problematic idioms identified above.

Other related PRs:

Changes to the original description:

- 2021-03-08
- 2021-03-17
- 2021-03-31
- 2021-04-16
Could you please elaborate on this? I don't see how you can do this in the general case without forcing dynamic allocation of work queue items. In particular: how can you know whether a work queue item is safe to reinitialize? Can you provide a simple example of what this looks like for a function that wants to have no more than, say, four work queue items outstanding at a time, erroring out any additional requests over that limit?
The majority of uses of work queue items in Zephyr are dedicated to a single handler function, with operation-specific state provided by embedding the work item in a container that holds that state, and the container statically allocated. There may be some cases where a work item is re-initialized after it's already been submitted, but those have tended to be errors, or unnecessary. If it were necessary, the solution would be to protect submitting the item with a lock, and within the lock confirm that the item is not busy (not delayed, queued, or running with or without an in-progress cancellation), which is the state in which it can, if necessary, be reconfigured.

Not conveniently, no, because I don't understand the question or use case. Work items aren't pooled, and a work queue can hold as many work items as are submitted to it: the links are part of each work item.
It's not the work queue that I'm concerned about here, it's the work queue items, and in particular the data associated with the work queue items. The current approach appears to solve half of the problem. Again, coming back to the example:

- I wish to have no more than 4 outstanding work items.
- When I receive a request, how can I determine when it is safe to reinitialize a work queue item from my pool (in particular, the additional data associated with a work queue item, like `data` below)?

If I weren't concerned about race conditions / cancellation / etc., this would be simple (but then again, so would the k_work API!):

```c
struct non_atomic_struct {
	uint8_t a;
	uint64_t b;
};

struct foo_work {
	struct k_work work;
	struct non_atomic_struct data;
};

void deferred_foo(struct k_work *work)
{
	struct foo_work *data = CONTAINER_OF(work, struct foo_work, work);

	do_something_with(data->data);
}

static struct foo_work foo_work[4];

void init(void)
{
	for (size_t i = 0; i < 4; i++) {
		k_work_init(&foo_work[i].work, &deferred_foo);
	}
}

bool try_run_foo(struct non_atomic_struct new_data)
{
	for (size_t i = 0; i < 4; i++) {
		if (!k_work_is_pending(&foo_work[i].work)) {
			foo_work[i].data = new_data;
			k_work_submit(&foo_work[i].work);
			return true;
		}
	}
	return false;
}
```

I can't do the "submit and see if it works" approach because of the cancellation corner cases. I can e.g. wrap this entire thing in a mutex, but at that point the work queue API becomes largely useless here: I'm wrapping the work queue API in another work queue API that still has to spend a bunch of effort making sure that cancellation and such work, in which case it's easier and faster to directly reimplement the work queue API and avoid using the k_work API at all... which is rather unfortunate. Note that e.g. the usb subsystem does something very similar to the above.
Good notes. I did not check the documentation, but some kind of BKM (Best Known Methods) write-up about usage would be nice to have in the documentation.
OK, so let me go back to this original comment and leave the general case of a pool of work items aside. Lots of Zephyr uses of delayable work items are doing things like:
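The snippet that followed did not survive the formatting here; a minimal sketch of the idiom being criticized is below. The `k_delayed_work` stubs are stand-ins so the sketch builds outside the tree, not the real kernel implementation; the shape of `racy_update()` is the point.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the legacy Zephyr API so this sketch compiles on its
 * own; the real definitions live in the kernel headers. */
struct k_delayed_work {
	bool queued;   /* submitted but handler not started */
	bool running;  /* handler currently executing */
};

/* Cancel can only dequeue an item whose handler hasn't started; a
 * handler that is already running keeps running to completion. */
static int k_delayed_work_cancel(struct k_delayed_work *work)
{
	if (work->queued) {
		work->queued = false;
		return 0;
	}
	return work->running ? -1 : 0;
}

static int k_delayed_work_submit(struct k_delayed_work *work)
{
	work->queued = true;
	return 0;
}

static struct k_delayed_work work;
static int shared_state;

/* The common (racy) idiom: cancel, touch shared state, maybe resubmit.
 * If the handler is already running, shared_state is accessed from two
 * contexts with no lock, and the ignored cancel failure is the only
 * clue that this happened. */
void racy_update(bool resubmit)
{
	(void)k_delayed_work_cancel(&work);
	shared_state++;            /* may race with the running handler */
	if (resubmit) {
		k_delayed_work_submit(&work);
	}
}
```

The unchecked `k_delayed_work_cancel()` return value is exactly where the race hides: the discussion below is about what to do instead.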
But the cancel does not guarantee that whatever data you operate on in stuff() is not being acted on by a work item, because the work item can still be running. So it's wrong. You can't just avoid a lock by asynchronously canceling the work item and assuming there's no longer a risk of multiple access. What I'd rather see instead is:
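The example that followed was lost in formatting; a compilable sketch of the lock-protected pattern being recommended might look like the following. The spinlock and submit stand-ins are illustrative stubs (real code would use `k_spinlock_t` and `k_work_submit()`); the point is that the submit side never cancels or inspects the work item, and both sides touch the shared state only under the lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins so the sketch builds outside the tree. */
struct k_spinlock { bool locked; };
static void lock(struct k_spinlock *l)   { assert(!l->locked); l->locked = true; }
static void unlock(struct k_spinlock *l) { l->locked = false; }

static struct k_spinlock stuff_lock;
static int pending_events;   /* the "stuff" the handler operates on */
static int handled_events;
static bool work_submitted;  /* stands in for the work item being pending */

/* Submit side: update state under the lock and submit unconditionally;
 * no cancel, no peeking at the work item's state. */
void on_event(void)
{
	lock(&stuff_lock);
	pending_events++;
	work_submitted = true;   /* k_work_submit(&work) in real code */
	unlock(&stuff_lock);
}

/* Handler: take the lock and consume whatever state is there. A
 * spurious wakeup that finds nothing to do is harmless. */
void work_handler(void)
{
	lock(&stuff_lock);
	handled_events += pending_events;
	pending_events = 0;
	work_submitted = false;
	unlock(&stuff_lock);
}
```

Because the handler drains whatever it finds, multiple submissions coalescing into one handler invocation is correct by construction.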
I can't assess for anybody whether this is easier than doing the above locking and also implementing your own work thread; I think it is, but that's up to you. All I want is for people to stop using the racy idioms. There are going to be some cases where you still don't need a lock because of the way items are submitted in a specific system, but they may not be as common as I'd hoped. In fact that's why #33109 (comment) is giving me a headache, and the rework to make it OK is going to be significantly more complex. When I have a handle on that I'll update the best known practices.
Yeah... You can wrap the entire thing in a lock, sometimes. And now you're wrapping a locking API in your own locking API, which has unfortunate implications for performance, especially on SMP, and especially as k_work is often used in things that are relatively performance-sensitive (e.g. offload from ISRs).

I am also somewhat confused about what kind of lock is intended. (Spinlock? Or normal lock? If spinlock, ow. If normal lock, what happens when you're in an ISR, which is one of the suggested work queue use cases?)

One possibility for some of these cases might be to have a separate optional reservation API. That is something like:

```c
bool try_run_foo(struct non_atomic_struct new_data)
{
	for (size_t i = 0; i < 4; i++) {
		if (k_work_try_reserve(&foo_work[i].work)) {
			foo_work[i].data = new_data;
			/* fails if work item was cancelled while reserved */
			if (k_work_submit_reserved(&foo_work[i].work)) {
				return true;
			}
		}
	}
	return false;
}
```
"lock" wasn't intended to specify a technology. Please review the API. Cancellation as well as submission/scheduling must be supported from ISRs, so a non-blocking option is required, but there is a blocking API as well.
Probably a spinlock, or state recorded atomically.
I didn't say it did - in fact I said otherwise:
Fundamentally, if you're trying to lock something in a nonblocking context, you have to either spin until the lock is released, or detect that the lock is held and bail out (deferring or dropping the operation).
This is agnostic of the lock implementation you are using. If, as you note, previously you were doing

```
k_delayed_work_cancel();
do stuff;
if (stuff_condition) {
	k_delayed_work_submit();
}
```

in a nonblocking context (which was one of the suggested use cases of k_work... namely ISR offload), and you replace it with a lock/unlock-based replacement based on your suggestion (for reference:

```
take lock on stuff;
inspect stuff;
if (stuff_condition) {
	update stuff;
	submit work();
} else {
	update stuff to mark nop;
	submit work immediately(); // when it's done, everything's clean.
}
release lock on stuff; // so work item can clean up
```

) you'll have some combination of the problems described above.
I am aware that the old version was broken in practice.
Issue #27356 identified a host of race conditions in the Zephyr work queue implementation through release 2.5, and in the way that the API has been used. #28579 (comment) provides a summary of the problems in the original implementation.
The replacement implementation in #29618, which will be in 2.6, eliminated the gaps in being able to precisely determine the state of a work item at the point an operation on it is performed. A summary of work item states is presented in the documentation.
However: fixing the implementation does not eliminate the problems that result from the ways the API is being used.
With the new implementation it is still necessary to take into account that:

- `cancel` can fail to complete immediately

This issue serves as a short description of the remaining problems and how to address them, and as a place to hold pointers to PRs where those problems are addressed:
Conversion Status
The following areas seem to have API that needs to be converted; a rote conversion can be found for each in a commit in #33924, which also contains a coccinelle script that was used to do the conversion. Areas must be assigned to maintainers, who should apply the changes from the corresponding commit in #33924, review and adapt it, and submit it as a standalone PR to help verify that there are no dependencies between changes that would break bisectability. Maintainers should also review the best practices that follow, as well as comments and commit messages from the manual conversion, to mitigate any inherent bugs in the existing use of work items.
Anti-Patterns
Checking whether work is pending before deciding whether to submit/schedule
An instantaneous check of work item state is truly instantaneous: application and subsystem logic that interprets and acts on that state can be incorrect if the animating thread is preempted, an interrupt occurs, or another core is simultaneously operating on work items.
In the original implementation `k_work_pending` could return `false` even though the work item was still in use. Some in-tree code uses the false return as an indication that the work item is available, potentially resetting the structure content while it's being used by another thread (see 4817a41 from #32890).

While this specific failure mode is no longer possible ("not pending" does mean the work item is, at that instant, not in use), the other direction remains: that a work item is still in use does not mean that it's unnecessary to submit it again so that newly available work will be processed. The work item may have all but finished, and without a second submit there will be a potentially unbounded delay in processing new work.
Presence of any of these in a conditional operation before invoking a work queue operation should be reviewed:

- `k_work_pending()`, `k_work_is_pending()`, `k_delayable_work_pending()`
- comparison of `k_delayable_work_remaining()` to zero
- `k_{,delayable_}work_busy_get()`
Please note that merely changing the legacy API in these uses to the new API does not fix the problem. As an example, although 7b8bce8 was merged in #32890, the openthread code may still misbehave, especially in SMP systems or when preemption is involved. See the description of that commit for details.
Unchecked Return Values
The vast majority of uses of `k_delayed_work_cancel()` in Zephyr currently assume that this operation will never fail, and so assume that after requesting a cancel the work item can safely be re-scheduled or reconfigured, or that the work item is guaranteed not to be invoked.

This was never true, and it is even less true as a result of adding support for synchronizing with work items, because a work item will still be pending for a small window after its handler function has returned.
Best Practices
In most cases avoiding the race conditions requires code and data changes. Below are some practices to consider when refactoring.
Don't over optimize
The `k_work_*()` API is designed to be safe when invoked from multiple threads and interrupts. Attempts to externally inspect a work item's state and make decisions based on the result are likely to create new problems.

So when new work comes in, just submit it. Don't attempt to "optimize" by checking whether the work item is already submitted by inspecting snapshot state with `k_work_is_pending()` or `k_work_busy_get()`, or by checking for a non-zero delay from `k_work_delayable_remaining_get()`. Those checks are fragile: a "busy" indication can be obsolete by the time the test returns, and a "not-busy" indication can also be wrong if work is submitted from multiple contexts.

Using the work handler as the standard cleanup path also makes the code clearer: rather than having to deal with cancellation and cleanup at the points where items are submitted, you may be able to have everything done in one place: the work handler itself.
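One way to follow the "just submit it" advice is to keep the pending data in its own queue and let the handler drain whatever it finds. The sketch below uses a trivial ring buffer and a boolean as stand-ins for a `k_msgq` and the work item's pending state (assumptions, not the kernel API), since in real code `k_work_submit()` on an already-pending item is a cheap no-op.

```c
#include <assert.h>
#include <stdbool.h>

#define QUEUE_DEPTH 8

/* Stand-in for a k_msgq/k_queue of pending items. */
static int items[QUEUE_DEPTH];
static int head, tail;
static bool submitted;   /* stands in for the work item being pending */

/* Producer: enqueue and always submit. No pending-state check: if the
 * item is already submitted the extra submit is harmless, and if the
 * handler has all but finished it guarantees another pass. */
void produce(int v)
{
	items[tail++ % QUEUE_DEPTH] = v;
	submitted = true;    /* k_work_submit(&work) in real code */
}

/* Handler: drain everything that is queued, however many submissions
 * coalesced into this one invocation; returns the sum as a stand-in
 * for "processing". */
int drain_handler(void)
{
	int sum = 0;

	while (head != tail) {
		sum += items[head++ % QUEUE_DEPTH];
	}
	submitted = false;
	return sum;
}
```

A handler invocation that finds the queue empty simply does nothing, which is why the unconditional submit is safe.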
A rare case where you could safely use `k_work_is_pending()` is as a check to avoid invoking `k_work_flush()` or `k_work_cancel_sync()`, if you are certain that nothing else might submit the work while you're checking. In some cases it can be more robust to just issue the command and have the state checked in the handler, especially if you need locked access to state anyway.

Document Uses of Racy Idioms
There may be cases where temporary or even permanent code still invokes operations like `k_work_cancel_delayable()` without being able to check or respond to an indication that the cancel failed. It is good practice to document at the point of call what happens in that case, e.g.:

zephyr/subsys/bluetooth/mesh/cfg_srv.c, lines 204 to 207 in 88ae27f
Protecting Associated Data
Sometimes the data a work item must process is naturally thread-safe, for example when it's put into a `k_queue` by some thread and processed in the work thread. Where it is not, external synchronization is required to ensure that neither the work thread nor any other thread is inspecting or manipulating state simultaneously with another thread or interrupt.

Spin locks (`k_spinlock_t`) or thread-aware locks (`k_sem`, `k_mutex`, ...) may be used to ensure data races don't occur. If work is added from an ISR, a spin lock may be required.

If the selected lock mechanism sleeps, then allowing the work thread to sleep will starve other work queue items, which may need to make progress in order to get the lock released. Work handlers should try to take the lock with its no-wait path. For example:
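A compilable sketch of that shape is below. The semaphore and submit operations are stand-in stubs so the sketch builds outside the tree; a real handler would use something like `k_sem_take(&lock, K_NO_WAIT)` and resubmit its own work item on failure.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins so the sketch builds outside the tree. */
static int lock_count = 1;   /* 1 = available, like a binary k_sem */
static bool resubmitted;
static int processed;

/* Models k_sem_take(&lock, K_NO_WAIT) == 0. */
static bool sem_take_no_wait(void)
{
	if (lock_count > 0) {
		lock_count--;
		return true;
	}
	return false;
}

static void sem_give(void)
{
	lock_count++;
}

/* Handler: never sleep waiting for the lock. If it's busy, resubmit
 * ourselves and return, so other queued work can make progress while
 * whoever holds the lock finishes with it. */
void work_handler(void)
{
	if (!sem_take_no_wait()) {
		resubmitted = true;   /* k_work_submit(&work) in real code */
		return;
	}
	processed++;                  /* safe: we hold the lock */
	sem_give();
}
```

The resubmit-on-contention loop is what keeps the work queue thread from blocking; the caveat in the next paragraph is about what happens when that resubmission itself fails.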
If you need to use a lock this way, be sure that your code is robust when resubmission fails because the work item had been cancelled. Usually to make this robust some other thread must be waiting for the work item to complete (e.g. via `k_work_cancel_sync()`) or must periodically inspect its state to confirm that it's no longer pending (as `check_query_active` does in https://github.com/zephyrproject-rtos/zephyr/blob/master/subsys/net/lib/dns/resolve.c).

Note that work items (in isolation) are self-locking, so you don't need to hold an external lock just to re-submit or (re)schedule them. Even if you use external state protected by such a lock to prevent further resubmission, it's safe to do the resubmit as long as you're sure that eventually the item will take its lock and check that state to determine whether it should do nothing.
Use the Right Scheduling Operation
For delayed work there are two common use cases, depending on whether a deadline should be extended when a new event occurs. An example is collecting data that comes in asynchronously, e.g. characters from a UART associated with a keyboard. The desired behavior might be:

- process the data a fixed interval after the first unprocessed event arrives: use `k_work_schedule()`.
- process the data a fixed interval after the most recent event arrives: use `k_work_reschedule()`.
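The difference can be sketched with stand-ins that model only the deadline bookkeeping (the real functions take a `k_timeout_t` and interact with the kernel timeout queue, so this is an illustration of the semantics, not the implementation):

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in modelling only the deadline bookkeeping. */
struct k_work_delayable {
	bool scheduled;
	int deadline;
};

static int now;  /* fake clock, in ticks */

/* Models k_work_schedule(): if the item is already scheduled, leave
 * its deadline alone; the first event fixes the deadline. */
int work_schedule(struct k_work_delayable *dw, int delay)
{
	if (dw->scheduled) {
		return 0;
	}
	dw->scheduled = true;
	dw->deadline = now + delay;
	return 1;
}

/* Models k_work_reschedule(): always replace the deadline; each new
 * event pushes processing further out. */
int work_reschedule(struct k_work_delayable *dw, int delay)
{
	dw->scheduled = true;
	dw->deadline = now + delay;
	return 1;
}
```

So for the keyboard example: `work_schedule()` processes a burst a fixed time after its first character, while `work_reschedule()` waits for the input to go quiet.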
In neither of these cases should the code be looking at `k_work_delayable_remaining_get()` to decide what to do, in part because that's racy and you can end up failing to submit when there's work pending. More information is in the documentation.

Checking that a Work Item is Completed
If you need to be sure that potentially submitted work is not executed, set a flag in state that's exposed to the work handler, and have the handler detect it and do the cleanup. Then synchronize with the work item if necessary, resubmitting it first if needed so that the handler can perform the state cleanup.
Finding the Containing Instance in a Work Handler
Be aware that when using an embedded `struct k_work_delayable` the handler will be invoked with a pointer to the work structure contained in that delayable structure. The internals of that structure are not public API, and it is not permitted to use `CONTAINER_OF` relative to the `struct k_work *work` passed to the handler. The correct way of getting a pointer to a structure that contains a delayable work structure is exemplified by zephyr/tests/kernel/workq/work/src/main.c, line 967 in a865b1b.
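The cited test isn't inlined here; the shape it demonstrates is roughly the following. The `CONTAINER_OF` macro and the `k_work`/`k_work_delayable` layouts below are simplified stand-ins so the sketch builds outside the tree; in real code you'd use the kernel's `k_work_delayable_from_work()` and `CONTAINER_OF` directly.

```c
#include <assert.h>
#include <stddef.h>

#define CONTAINER_OF(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-ins: a delayable work item embeds a plain work item. */
struct k_work { int dummy; };
struct k_work_delayable { struct k_work work; };

/* Recover the delayable wrapper from the k_work pointer the handler is
 * given; this is the step the real k_work_delayable_from_work() does,
 * and the one you must not skip. */
static struct k_work_delayable *k_work_delayable_from_work(struct k_work *w)
{
	return CONTAINER_OF(w, struct k_work_delayable, work);
}

struct my_device {
	int state;
	struct k_work_delayable dwork;
};

static struct my_device dev;

/* Handler side: first go to the delayable structure, then from it to
 * the containing instance. */
struct my_device *handler_get_device(struct k_work *work)
{
	struct k_work_delayable *dw = k_work_delayable_from_work(work);

	return CONTAINER_OF(dw, struct my_device, dwork);
}
```

Applying `CONTAINER_OF(work, struct my_device, dwork)` directly would only be correct by accident of the current layout, which is exactly what the documentation forbids relying on.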