Corrupt CBOR payloads in MCUMGR when sending multiple commands together #32579
Suspect that the issue is related to incorrect fifo use for incoming netbuf's: In Suggest correcting these calls and auditing other modules within Zephyr to ensure there are no other instances of For context, see: https://docs.zephyrproject.org/latest/reference/networking/net_buf.html
@jrhees-cae Could you provide more context related to this issue? How to reproduce it? Which Zephyr version did you use, what type of transport protocol was used (BT, UART, SHELL, UDP), etc.? On which side do corrupted CBOR payloads appear (Zephyr or host)? Looking quickly at
According to the docs (and my reading of the buf source code), netbuf references should always use net_buf_put/get in case the netbuf has fragments. This would only be seen when I stacked multiple SMP commands (i.e. multiple netbufs from the pool were in use before any of them were freed).
My understanding of the code is that no net_buf fragments are used. Instead, just a single net_buf is allocated and enqueued to the FIFO at a time. Then each received net_buf is processed separately. So that is why I think
Hmm, interesting. Is there any sort of debugging info I could provide to help understand this issue better? Changing those function calls definitely cleared up the problem (on my setup anyway), but I completely understand that it might be a red herring.
As a start, you could provide the data on which you based this bug report, i.e. a dump of the bytes before putting the net_buf on the FIFO and after getting it from the FIFO. I would like to see those "corrupted" packets and their data when they were "okay" before being put into the FIFO. I would also still like to know which Zephyr version you are using.
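The before/after dump requested above could be produced with a small helper along these lines. This is only a sketch in plain C: `hex_dump` and `payload_unchanged` are hypothetical names introduced for illustration, not part of Zephyr or mcumgr. On a Zephyr target, the logging subsystem's `LOG_HEXDUMP_DBG` macro would likely be the more natural choice.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format `len` bytes as space-separated lowercase hex
 * into `out`. Stops early if `out` is too small. Returns the number of
 * characters written, not counting the terminating NUL. */
static size_t hex_dump(char *out, size_t out_len, const uint8_t *data, size_t len)
{
    size_t used = 0;

    if (out_len == 0) {
        return 0;
    }
    out[0] = '\0';

    for (size_t i = 0; i < len; i++) {
        /* Worst case per byte: separator + two hex digits + NUL. */
        if (out_len - used < 4) {
            break;
        }
        if (i > 0) {
            out[used++] = ' ';
        }
        used += (size_t)snprintf(out + used, out_len - used, "%02x",
                                 (unsigned int)data[i]);
    }
    return used;
}

/* Hypothetical helper: returns 1 if the payload captured before enqueueing
 * matches the payload seen after dequeueing, 0 otherwise. */
static int payload_unchanged(const uint8_t *before, const uint8_t *after, size_t len)
{
    return memcmp(before, after, len) == 0;
}
```

Calling `hex_dump` on the payload once before the buffer is put on the FIFO and once after it is taken off gives two strings that can be logged side by side and compared.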
per
Do you use the mcumgr CLI (https://github.com/apache/mynewt-mcumgr-cli) for the communication? I have not been able to find info on how you communicate with mcumgr within Zephyr.
To exercise the MCUMGR, I have been using the "nRF Connect Device Manager" application for Android. This application works OK as-is, and is based on (a Nordic-themed modified fork of) the mcumgr-android library here: https://github.com/JuulLabs-OSS/mcumgr-android However, the mcumgr-android library was updated with a new feature to speed up firmware uploads by sending multiple upload packets in a row before waiting for the response to the previous write. Attempting to use this new feature causes the behavior I am describing.
I'm running into this issue as well and separately reached the same conclusion that the With the unmodified code (
Replacing
Which I believe matches the expected behavior. The failing case results in the net bufs being re-processed after they've been freed, which naturally leads to all sorts of mayhem. This fails any time multiple requests are received between runs of the processing loop. This means it's pretty easy to repro by just blocking the work handler from running long enough to get two or more requests in. In my case I'm running into it using the mcumgr CLI, and the error occurs while waiting for a flash page erase.
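One plausible mechanism for the "freed, then re-processed" behavior can be sketched with a simplified standalone model. It assumes, as in Zephyr's `struct net_buf` at the time, that the queue link and the fragment pointer share the same storage. All `sim_*` names are hypothetical stand-ins, and this is not the actual Zephyr implementation: it only illustrates why a raw FIFO dequeue can leave a buffer that appears to have the next queued buffer as a fragment.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a network buffer: the queue link and the
 * fragment pointer occupy the same storage (assumption, see lead-in). */
struct sim_buf {
    union {
        struct sim_buf *next;  /* meaningful while the buffer is queued */
        struct sim_buf *frags; /* meaningful once the buffer is dequeued */
    };
    int ref;                   /* 0 means "returned to the pool" */
};

struct sim_fifo {
    struct sim_buf *head;
    struct sim_buf *tail;
};

static void sim_fifo_put(struct sim_fifo *f, struct sim_buf *b)
{
    b->next = NULL;
    if (f->tail) {
        f->tail->next = b;
    } else {
        f->head = b;
    }
    f->tail = b;
}

/* Raw dequeue: unlinks the head but leaves its link pointer untouched,
 * so the caller sees a stale "frags" pointing at the next queued buffer. */
static struct sim_buf *sim_raw_get(struct sim_fifo *f)
{
    struct sim_buf *b = f->head;

    if (b) {
        f->head = b->next;
        if (!f->head) {
            f->tail = NULL;
        }
    }
    return b;
}

/* Buffer-aware dequeue: clears the shared pointer before handing out. */
static struct sim_buf *sim_buf_get(struct sim_fifo *f)
{
    struct sim_buf *b = sim_raw_get(f);

    if (b) {
        b->frags = NULL;
    }
    return b;
}

/* Drop one reference on the buffer and on everything reachable via frags. */
static void sim_buf_unref(struct sim_buf *b)
{
    while (b) {
        struct sim_buf *frags = b->frags;

        assert(b->ref > 0);
        if (--b->ref > 0) {
            return;
        }
        b->frags = NULL;
        b = frags;
    }
}

/* Queue two requests, dequeue and release the first, then report the
 * reference count of the second, which is still sitting in the queue. */
static int sim_second_ref_after_first_release(int use_buf_get)
{
    struct sim_buf a = { .ref = 1 };
    struct sim_buf b = { .ref = 1 };
    struct sim_fifo f = { NULL, NULL };
    struct sim_buf *first;

    sim_fifo_put(&f, &a);
    sim_fifo_put(&f, &b);
    first = use_buf_get ? sim_buf_get(&f) : sim_raw_get(&f);
    sim_buf_unref(first);
    return b.ref;
}
```

In this model the raw path returns 0 for the second buffer (it was released while still queued), whereas the buffer-aware path returns 1. With only one request queued at a time the stale pointer is NULL, which would fit the observation that the problem only shows up when two or more requests arrive between runs of the handler.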
@khutchens Thanks for digging into this more deeply. I haven't had cycles to do it myself.
Thanks for testing! I have looked once again into
It may be worth auditing other modules in Zephyr to see if this change (e.g. where k_fifo... APIs are used with netbufs) needs to be applied elsewhere.
I've taken a pretty quick look at other modules and didn't see issues there. But it would be good if others could confirm.
I scrubbed through the codebase and only found one other case where a There are also many instances of using
@khutchens I came to the same conclusions as you.
Fixes a problem where k_fifo functions were used to get/put data from/into a net buf, which the documentation strictly forbids. Found, reported and solution suggested by jrhees-cae. Fixes: zephyrproject-rtos#32579 Signed-off-by: Dominik Ermel <[email protected]>
When using MCUMGR SMP commands in quick succession, corrupt CBOR payloads are seen by the parser.