Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Backport v2.7-branch] Bluetooth: fix HCI command / data deadlock #58725

Merged
merged 12 commits into from
Jun 6, 2023

Conversation

theob-pro
Copy link
Contributor

@theob-pro theob-pro commented Jun 1, 2023

Fixes #58412.

The following commits from #50476 have been added:

#52760 is also backported with this PR.

Fixes: #57947

@github-actions github-actions bot added area: Bluetooth area: Bluetooth Host area: Tests Issues related to a particular existing or missing test labels Jun 1, 2023
@theob-pro theob-pro force-pushed the backport-52536-to-v2.7-branch branch from ce7423c to 75187ae Compare June 1, 2023 09:03
Copy link
Collaborator

@jori-nordic jori-nordic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comment

@theob-pro theob-pro force-pushed the backport-52536-to-v2.7-branch branch 2 times, most recently from 2f394aa to 5fd0f56 Compare June 2, 2023 10:41
@github-actions github-actions bot added area: API Changes to public APIs area: Bluetooth Mesh labels Jun 2, 2023
@theob-pro theob-pro force-pushed the backport-52536-to-v2.7-branch branch 3 times, most recently from acbe5e0 to 87ada17 Compare June 5, 2023 09:01
jori-nordic and others added 12 commits June 5, 2023 13:11
This test reproduces more-or-less #34600.

It has a central that connects to multiple peripherals, opens one l2cap
CoC channel per connection, and transmits a few SDUs largely exceeding
the MPS of the channel.

In this commit, the test doesn't pass, but when it passes (after the
subsequent commits), error and warning messages are expected from the
stack, as this is not the happy path.

We can later debate on whether these particular error messages should
be downgraded to debug.

Signed-off-by: Jonathan Rico <[email protected]>
(cherry picked from commit 7a6872d)
See code comment

Signed-off-by: Jonathan Rico <[email protected]>
(cherry picked from commit 1c8fe67)
There was an edge-case where we were sending back too much credits, add
a check so we can't do that anymore.

Signed-off-by: Jonathan Rico <[email protected]>
(cherry picked from commit 3c1ca93)
See the code comments.

SDUs might enter a state where they will be blocked forever, as a
workaround, we nudge them when another SDU has been sent.

Signed-off-by: Jonathan Rico <[email protected]>
(cherry picked from commit 8e207fe)
This callback allows use-cases where the SDU is much larger than the
l2cap MPS. The stack will then try to allocate using this callback if
specified, and fall-back on using the buffer's pool (previous
behavior).

This way one can define two buffer pools, one with a very large buffer
size, and one with a buffer size >= MPS, and the stack will allocate
from that instead.

Signed-off-by: Jonathan Rico <[email protected]>
(cherry picked from commit 77e1a9d)
Make the ACL fragmentation asynchronous, freeing the TX thread for
sending commands when the ACL buffers are full.

Signed-off-by: Jonathan Rico <[email protected]>
(cherry picked from commit c3e5fab)
When there are no buffers, it doesn't make sense to repeatedly try to
send the host TX queue.

Signed-off-by: Jonathan Rico <[email protected]>
(cherry picked from commit ef19c64)
Only copy the data from the 'parent' buf into the segment buf if we
successfully acquired the HCI semaphore (ie there are available
controller buffers).

This way we avoid pulling and pushing back the data in the case where
there aren't any controller buffers available.

Signed-off-by: Jonathan Rico <[email protected]>
(cherry picked from commit ca51439)
Do these things:
- use the proper macros for reserving the SDU header
- make every L2CAP PDU fragment into 3 ACL packets
- set tx data and verify rx matches the pattern
- measure segment pool usage

Signed-off-by: Jonathan Rico <[email protected]>
(cherry picked from commit 8bc0946)
This patch reworks how fragments are handled in the net_buf
infrastructure.

In particular, it removes the union around the node and frags members
in the main net_buf structure. This is done so that both can be used at
the same time, at a cost of 4 bytes per net_buf instance.
This implies that the layout of net_buf instances changes whenever
being inserted into a queue (fifo or lifo) or a linked list (slist).

Until now, this is what happened when enqueueing a net_buf with frags
in a queue or linked list:

1.1 Before enqueueing:

 +--------+      +--------+      +--------+
 |#1  node|\     |#2  node|\     |#3  node|\
 |        | \    |        | \    |        | \
 | frags  |------| frags  |------| frags  |------NULL
 +--------+      +--------+      +--------+

net_buf #1 has 2 fragments, net_bufs #2 and #3. Both the node and frags
pointers (they are the same, since they are unioned) point to the next
fragment.

1.2 After enqueueing:

 +--------+     +--------+     +--------+     +--------+     +--------+
 |q/slist |-----|#1  node|-----|#2  node|-----|#3  node|-----|q/slist |
 |node    |     | *flag  | /   | *flag  | /   |        | /   |node    |
 |        |     | frags  |/    | frags  |/    | frags  |/    |        |
 +--------+     +--------+     +--------+     +--------+     +--------+

When enqueing a net_buf (in this case #1) that contains fragments, the
current net_buf implementation actually enqueues all the fragments (in
this case #2 and #3) as actual queue/slist items, since node and frags
are one and the same in memory. This makes the enqueuing operation
expensive and it makes it impossible to atomically dequeue. The `*flag`
notation here means that the `flags` member has been set to
`NET_BUF_FRAGS` in order to be able to reconstruct the frags pointers
when dequeuing.

After this patch, the layout changes considerably:

2.1 Before enqueueing:

 +--------+       +--------+       +--------+
 |#1  node|--NULL |#2  node|--NULL |#3  node|--NULL
 |        |       |        |       |        |
 | frags  |-------| frags  |-------| frags  |------NULL
 +--------+       +--------+       +--------+

This is very similar to 1.1, except that now node and frags are
different pointers, so node is just set to NULL.

2.2 After enqueueing:

 +--------+      +--------+      +--------+
 |q/slist |------|#1  node|------|q/slist |
 |node    |      |        |      |node    |
 |        |      | frags  |      |        |
 +--------+      +--------+      +--------+
                     |           +--------+       +--------+
                     |           |#2  node|--NULL |#3  node|--NULL
                     |           |        |       |        |
                     +-----------| frags  |-------| frags  |------NULL
                                 +--------+       +--------+

When enqueuing net_buf #1, now we only enqueue that very item, instead
of enqueing the frags as well, since now node and frags are separate
pointers. This simplifies the operation and makes it atomic.

Resolves #52718.

Signed-off-by: Carles Cufi <[email protected]>
(cherry picked from commit 3d306c1)
This commit removes illegal use of NET_BUF_FRAG in friend.c, which is an
internal flag.

Now `struct bt_mesh_friend_seg` keeps pointer to a first received
segment of a segmented message. The rest segments are added as fragments
using net_buf API. Friend Queue keeps only head of the fragments.
When one segment (currently head of fragments) is removed from Friend
Queue, the next segment is added to the queue. Head has always 2
references: one when allocated, another one when added as fragments
head.

Signed-off-by: Pavel Vasilyev <[email protected]>
(cherry picked from commit 5d05911)
Increase `NET_BUF_USER_DATA_SIZE` value to 8 if `BT_CONN` is enabled.
This is necessary because one of the backported commits adds one struct
member into `struct tx_meta`, and that will be stored in the buffer
user data.

On main, this Kconfig option has been deprecated, hence why this change
was not present in the original patch set.

Signed-off-by: Théo Battrel <[email protected]>
@cfriedt
Copy link
Member

cfriedt commented Jun 6, 2023

@jhedberg - looks to be ready to me, but this will need your approval. Please take a look.

@cfriedt
Copy link
Member

cfriedt commented Jun 6, 2023

Thanks to everyone for backporting fixes to LTS 👍

@cfriedt cfriedt merged commit f72d8ff into v2.7-branch Jun 6, 2023
@sheindel-llt
Copy link

sheindel-llt commented Jun 7, 2023

Thanks for doing this backport!

@nashif nashif deleted the backport-52536-to-v2.7-branch branch August 15, 2023 17:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: API Changes to public APIs area: Bluetooth Host area: Bluetooth Mesh area: Bluetooth area: Networking area: Tests Issues related to a particular existing or missing test
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[Backport v2.7-branch] Failed to backport #52536 Backport Bluetooth fixes to 2.7 branch
7 participants