-
Notifications
You must be signed in to change notification settings - Fork 6.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bluetooth: mesh: update heartbeat use of k_work API #33148
Conversation
b2fb5d9
to
5b3ebb5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to reviewers: in the absence of a meaningful sample I can't test this; please verify that it works.
@@ -149,17 +152,18 @@ static void hb_publish(struct k_work *work) | |||
|
|||
BT_DBG("hb_pub.count: %u", pub.count); | |||
|
|||
/* Fast exit if disabled or expired */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a subtle behavior change here: for publish where the count had expired publish could detect a lost subnet and change state. Assuming that is not required behavior I've moved the fast-exit for expired/disabled before the subnet check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works. Looks like the publish disabled decision point isn't handled consistently, but I'll address this in a separate PR after this goes in.
@@ -149,17 +152,18 @@ static void hb_publish(struct k_work *work) | |||
|
|||
BT_DBG("hb_pub.count: %u", pub.count); | |||
|
|||
/* Fast exit if disabled or expired */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works. Looks like the publish disabled decision point isn't handled consistently, but I'll address this in a separate PR after this goes in.
subsys/bluetooth/mesh/heartbeat.c
Outdated
@@ -216,7 +220,7 @@ static void pub_disable(void) | |||
pub.ttl = 0U; | |||
pub.period = 0U; | |||
|
|||
k_delayed_work_cancel(&pub_timer); | |||
k_work_reschedule(&pub_timer, K_NO_WAIT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you refrained from using k_work_cancel_delayable
here just to stay consistent with the change on line 259, or is there some reason why cancel shouldn't be used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've found that most uses of k_delayed_work_cancel()
are wrong because they don't check for failure, and if the work item is still running they aren't prepared for what might happen. This is largely due to incomplete handling of thread safety, as may be the case in this implementation (different threads can be accessing the pub
fields simultaneously). See #33104, which is a work in progress and doesn't adequately address the complexities of this yet.
So where "cancel" can be replaced with "set a flag and allow the work item to do its job and then stop" I prefer that.
In this case it probably doesn't add any real value to try to avoid the cancel. I've reverted this for now, so it's more clear that there's something not quite right here that should be revisited when we have a better understanding of the nuances.
if (!k_delayed_work_cancel(&sub_timer)) { | ||
notify_sub_end(); | ||
} | ||
sub.period = 0U; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll need to follow this line up with a PTS run, I think. I couldn't find any places in the test spec where the old behavior was required, but it could be a quirk in the PTS implementation.
Won't let this block this PR though.
cc @michal-narajowski
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran HBP and HBS tests with this PR and they are passing so this looks good to me.
Replace the legacy delayed work API with the new delayable work API. Use a zero period as a flag value to ensure that the work handler is a no-op of the publish operation is disabled. Signed-off-by: Peter Bigot <[email protected]>
Replace the legacy delayed work API with the new delayable work API. Avoid cancelling work and manually notifying when the subscription is disabled; instead allow the work item to do this. Signed-off-by: Peter Bigot <[email protected]>
5b3ebb5
to
8a0f75d
Compare
Replace the legacy delayed work API with the new delayable work API. Avoid race conditions in canceling work items by using a zero period as a flag value to ensure that the work publish handler is a no-op of the publish operation is disabled, and allow subscription expiration to be done by the handler.