net: process_rx_packet() work handler violates requirements of Workqueue Threads implementation #34690

Closed
mnkp opened this issue Apr 29, 2021 · 2 comments · Fixed by #34703
Labels: area: Networking · bug · priority: medium

mnkp (Member) commented Apr 29, 2021

Describe the bug

The Workqueue Threads implementation imposes the following requirement

A pending work item must not be altered until the item has been processed by the workqueue thread. This means a work item must not be re-initialized while it is busy. Furthermore, any additional information the work item’s handler function needs to perform its work must not be altered until the handler function has finished executing.
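To illustrate the rule (a minimal sketch of my own, not code from the Zephyr tree): once a k_work item has been submitted, neither the item itself nor the memory holding it may be touched until its handler has finished.

```c
#include <zephyr.h>
#include <string.h>

static void handler(struct k_work *work)
{
	ARG_UNUSED(work);
	/* ... do the deferred work ... */
}

static struct k_work my_work;

void submit_example(void)
{
	k_work_init(&my_work, handler);
	k_work_submit(&my_work);

	/* Both of these violate the requirement while my_work is still
	 * pending or being executed by the workqueue thread:
	 */
	k_work_init(&my_work, handler);        /* re-initializing a busy item */
	memset(&my_work, 0, sizeof(my_work));  /* overwriting/freeing its memory */
}
```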

This requirement is violated by the process_rx_packet(struct k_work *work) work handler function implemented in subsys/net/ip/net_core.c.

In the RX path, when data is received, e.g. from an Ethernet driver, the packet is submitted to a workqueue in interrupt context and then processed by the process_rx_packet(struct k_work *work) work handler function in thread context. On an error path (when the packet is not passed on to higher layers but needs to be discarded), process_rx_packet() frees the memory allocated for the packet, and with it the memory where the k_work item is stored; i.e. the k_work item is ultimately freed before process_rx_packet() returns. If the same memory block is then obtained and resubmitted by the Ethernet driver, the content of the still-busy k_work item could be overwritten.
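
A simplified sketch of the problematic pattern (my illustration, not the exact net_core.c code; it assumes the k_work member of struct net_pkt is named work, and handle_packet() is a hypothetical stand-in for the real upper-layer hand-off):

```c
#include <zephyr.h>
#include <net/net_pkt.h>

/* Hypothetical stand-in for the real upper-layer processing;
 * <0 means the packet must be discarded (the error path above).
 */
static int handle_packet(struct net_pkt *pkt)
{
	ARG_UNUSED(pkt);
	return -1;  /* stub: always take the error path in this sketch */
}

/* Runs in the RX workqueue thread. */
static void process_rx_packet(struct k_work *work)
{
	struct net_pkt *pkt = CONTAINER_OF(work, struct net_pkt, work);

	if (handle_packet(pkt) < 0) {
		/* Freeing the packet also frees pkt->work, i.e. the k_work
		 * item the workqueue thread is still busy with. If the
		 * Ethernet driver gets the same memory block back and
		 * resubmits it, the busy k_work gets overwritten.
		 */
		net_pkt_unref(pkt);
	}
}
```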

Impact
This is a race condition; it may lead to data loss or internal data corruption in the networking stack.

Environment (please complete the following information):

  • zephyr-v2.5.0
mnkp added the bug and area: Networking labels on Apr 29, 2021
jukkar added the priority: medium label on Apr 29, 2021
jukkar (Member) commented Apr 29, 2021

Thanks @mnkp.

This issue was already discussed in Slack, with the following summary:

  • Because of this workqueue restriction, it is not easy to free the net_pkt that contains the used k_work, as we cannot unref it from the work handler (that would overwrite the k_work, which is still in use)
  • One possible solution could be to use a k_fifo instead of k_work in the RX path, so that net_data_recv() would place the net_pkt (containing the received network data) into a k_fifo instead of calling the workqueue handler (see the sketch after this list)
  • This way we could remove the k_work (at least in the RX path) from net_pkt, saving some bytes
  • The changes would be quite minimal and mostly contained in net_core.c
  • The performance impact of this is unknown atm
  • The TX path has the same issue, so it would need tweaking too.
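
A minimal sketch of this direction (my illustration, not the actual code from PR #34703; rx_queue_pkt(), hand_off_to_stack() and the thread setup are assumed names): the driver side queues the net_pkt into a k_fifo and a dedicated RX thread pulls it out, so net_pkt no longer needs to embed a k_work.

```c
#include <zephyr.h>
#include <net/net_pkt.h>
#include <net/net_if.h>

K_FIFO_DEFINE(rx_fifo);

/* Hypothetical replacement for the k_work_submit() call in the RX path:
 * the driver-facing entry point just queues the packet. Note that
 * k_fifo_put() reserves the first word of the queued item for the
 * kernel's internal linked list, so net_pkt would have to account for
 * that.
 */
void rx_queue_pkt(struct net_if *iface, struct net_pkt *pkt)
{
	ARG_UNUSED(iface);
	k_fifo_put(&rx_fifo, pkt);
}

/* Hypothetical hand-off to the IP stack; <0 means drop the packet. */
static int hand_off_to_stack(struct net_pkt *pkt)
{
	ARG_UNUSED(pkt);
	return -1;  /* stub: always take the error path in this sketch */
}

static void rx_thread(void *p1, void *p2, void *p3)
{
	ARG_UNUSED(p1);
	ARG_UNUSED(p2);
	ARG_UNUSED(p3);

	while (1) {
		struct net_pkt *pkt = k_fifo_get(&rx_fifo, K_FOREVER);

		if (hand_off_to_stack(pkt) < 0) {
			/* Safe to free here: pkt no longer embeds a k_work
			 * owned by a workqueue thread.
			 */
			net_pkt_unref(pkt);
		}
	}
}

K_THREAD_DEFINE(rx_tid, 2048, rx_thread, NULL, NULL, NULL, 7, 0, 0);
```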

I will investigate whether we could still implement this for 2.6.

FYI cc: @rlubos @mniestroj

jukkar (Member) commented Apr 29, 2021

The changes were actually quite easy to do, and we get 10% better performance for both RX and TX. I think it is a good candidate for 2.6.

jukkar added a commit to jukkar/zephyr that referenced this issue May 7, 2021
The k_work handler cannot manipulate the k_work it was invoked
with. This means that it is not easy to clean up the net_pkt,
because it contains the k_work. Because of this, use a k_fifo
instead between the RX thread and the network driver, and between
the application and the TX thread.

An echo-server/client run with IPv4 and UDP gave the following
results:

Using k_work
------------
TX traffic class statistics:
TC  Priority	Sent pkts	bytes	time
[0] BK (1)	21922		5543071	103 us	[0->41->26->34=101 us]
[1] BE (0)	0		0	-
RX traffic class statistics:
TC  Priority	Recv pkts	bytes	time
[0] BK (0)	0		0	-
[1] BE (0)	21925		6039151	97 us	[0->21->16->37->20=94 us]

Using k_fifo
------------
TX traffic class statistics:
TC  Priority	Sent pkts	bytes	time
[0] BK (1)	15079		3811118	94 us	[0->36->23->32=91 us]
[1] BE (0)	0		0	-
RX traffic class statistics:
TC  Priority	Recv pkts	bytes	time
[0] BK (1)	0		0	-
[1] BE (0)	15073		4150947	79 us	[0->17->12->32->14=75 us]

So using k_fifo gives about 10% better performance with the same workload.

Fixes zephyrproject-rtos#34690

Signed-off-by: Jukka Rissanen <[email protected]>
galak pushed a commit that referenced this issue May 7, 2021