Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net_buf issue leads to unwanted elem free #38829

Closed
XavierChapron opened this issue Sep 24, 2021 · 1 comment
Closed

net_buf issue leads to unwanted elem free #38829

XavierChapron opened this issue Sep 24, 2021 · 1 comment
Assignees
Labels
bug The issue is a bug, or the PR is fixing a bug

Comments

@XavierChapron
Copy link
Contributor

XavierChapron commented Sep 24, 2021

Describe the bug
Current net_buf structure shared the first word in a union between sys_snode_t node; and struct net_buf *frags;.
This has been introduced a very long time ago : 553af7c
However, net_buf allocation are handled by a LIFO, and the current documentation of k_lifo_put(), clearly states that the first word of the element must be reserved for the kernel, which is not the case here.

* This routine adds a data item to @a lifo. A LIFO queue data item must be

It appears, that this shared first word works great in many cases as the kernel mostly use it only when the element is in the queue and therefore the element is not allocated at this time.
However, in a edge case (probably when the queue is empty and a new element is added), the last element is modified, even if it is currently in used.
At that moment, the value of node is written, hence the frags value is not NULL anymore, and bad things happens in my use case.

To Reproduce
I've currently see this using mcumgr and smp_bt, however I expect it might be reproducible more easily when the edge case as been localized more accurately.

Impact
On my case, this leads to a mess in the net_buf pool, that leads to bt_conn ref not being properly released, witch means bt_le_adv_resume() is never called in bt_conn_unref().

Environment:

  • Zephyr SDK,
  • nrf52840,
  • zephyr 2.4 but as neither net_buf nor queue implementations s changed since, I really think main is also affected.
@XavierChapron XavierChapron added the bug The issue is a bug, or the PR is fixing a bug label Sep 24, 2021
XavierChapron pushed a commit to XavierChapron/zephyr that referenced this issue Sep 24, 2021
net_buf elements are used in net_pool which are based on lifo
implementation. Yet, net_buf structure doesn't respect lifo requierement
that the first word must be reserved for the lifo kernel implementation.

In most cases, this is fine as this word is mostly accessed when the
element is not allocated, however, this is not always true.
In such situation, node element is written, ehnce frags element value is
not NULL anymore and anything might happen...

This fixes zephyrproject-rtos#38829 issue.

Signed-off-by: Xavier Chapron <[email protected]>
@XavierChapron
Copy link
Contributor Author

This is a eventually a duplicate of #32579.
See #38830 (comment) for more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug The issue is a bug, or the PR is fixing a bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants