Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

power: device_pm_get_sync not thread-safe #31856

Closed
JordanYates opened this issue Feb 2, 2021 · 4 comments
Closed

power: device_pm_get_sync not thread-safe #31856

JordanYates opened this issue Feb 2, 2021 · 4 comments
Assignees
Labels
area: Power Management bug The issue is a bug, or the PR is fixing a bug priority: low Low impact/importance bug
Milestone

Comments

@JordanYates
Copy link
Collaborator

Describe the bug

When N threads are both requesting the same device to turn on via device_pm_get_sync, it is possible that only one of those threads will ever return from the call. The problem is in the synchronous path of the underlying device_pm_request function.

After early exiting on two conditions (requesting active state with a negative usage count, and requesting sleep state with a usage count > 1), the work function is submitted to the system workqueue. In the synchronous case, the function then blocks on a signal that is owned by the device transitioning between modes. This signal is then raised when the work item completes.

Therefore, when two threads are both turning on the same device (I2C, SPI, etc), both threads end up blocking on the same instance of a struct k_poll_signal. As per the documentation (https://docs.zephyrproject.org/latest/reference/kernel/other/polling.html#using-k-poll-signal-raise), signals can only be waited for by a single thread, so this is undefined behaviour.

Currently this results in one thread exiting while the other is permanently stuck on the k_poll call.

Expected behavior
When two threads call device_pm_get_sync on a device at the same time, both threads should return when the device has finished transitioning.

Impact
Lower priority threads become permanently blocked, necessitating a reboot.

Environment (please complete the following information):

  • Zephyr v2.4

Additional context
From the documentation of struct device_pm, it appears as thought the lock semaphore should be used to mediate access to the signal both threads are waiting on.

@JordanYates JordanYates added the bug The issue is a bug, or the PR is fixing a bug label Feb 2, 2021
@nashif nashif added the priority: low Low impact/importance bug label Feb 2, 2021
@pabigot pabigot assigned pabigot and unassigned ceolin Feb 2, 2021
@pabigot
Copy link
Collaborator

pabigot commented Feb 2, 2021

The current implementation also uses atomics incorrectly, and fails to lock to protect access to many values, including the enable field.

The whole logic in device_pm must be revisited, potentially replaced by use of an on-off manager, which precisely handles transitions between an on and off state regardless of number of requesters and context.

@pabigot pabigot removed their assignment Mar 7, 2021
@nashif nashif added this to the v2.6.0 milestone Mar 11, 2021
@github-actions
Copy link

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

@github-actions github-actions bot added the Stale label May 11, 2021
@ceolin
Copy link
Member

ceolin commented May 11, 2021

#34781

@github-actions github-actions bot removed the Stale label May 12, 2021
@ceolin
Copy link
Member

ceolin commented May 26, 2021

#35646

@ceolin ceolin closed this as completed May 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Power Management bug The issue is a bug, or the PR is fixing a bug priority: low Low impact/importance bug
Projects
None yet
Development

No branches or pull requests

4 participants