
net: Use k_fifo instead of k_work in RX and TX processing #34703

Merged

Conversation


@jukkar jukkar commented Apr 29, 2021

A k_work handler cannot manipulate the k_work item it is running from. This makes it hard to clean up a net_pkt, because the packet embeds a k_work. Because of this, use a k_fifo instead between
the RX thread and the network driver, and between the application and the TX
thread.

An echo-server/client run with IPv4 and UDP gave the following
results:

Using k_work
------------
TX traffic class statistics:
TC  Priority	Sent pkts	bytes	time
[0] BK (1)	21922		5543071	103 us	[0->41->26->34=101 us]
[1] BE (0)	0		0	-
RX traffic class statistics:
TC  Priority	Recv pkts	bytes	time
[0] BK (0)	0		0	-
[1] BE (0)	21925		6039151	97 us	[0->21->16->37->20=94 us]
Using k_fifo
------------
TX traffic class statistics:
TC  Priority	Sent pkts	bytes	time
[0] BK (1)	15079		3811118	94 us	[0->36->23->32=91 us]
[1] BE (0)	0		0	-
RX traffic class statistics:
TC  Priority	Recv pkts	bytes	time
[0] BK (1)	0		0	-
[1] BE (0)	15073		4150947	79 us	[0->17->12->32->14=75 us]

So using k_fifo gives about 10% better performance with the same
workload.

Fixes #34690

Signed-off-by: Jukka Rissanen [email protected]

@github-actions github-actions bot added area: API Changes to public APIs area: Networking labels Apr 29, 2021
@jukkar jukkar added this to the v2.6.0 milestone Apr 29, 2021

@pabigot pabigot left a comment


Using dedicated threads instead of a work queue thread that's only ever going to be asked to run one handler function makes a lot more sense, especially when you don't need delays or cancellation. If you need flush you can probably handle it better with the dedicated solution anyway.

Review thread on subsys/net/ip/net_tc.c (outdated, resolved)

jukkar commented Apr 30, 2021

@alexanderwachter I converted 6locan to not use k_work in net_pkt. Unfortunately I have no way to test the changes, so I am asking whether you would be able to review them and check that they work as expected.


jukkar commented Apr 30, 2021

@mniestroj @tsvehagen You seem to have been the most recent to modify the offloading ESP WiFi driver. The driver uses net_pkt for its internal use, but we want to get rid of the k_work in net_pkt. Unfortunately the driver looked somewhat convoluted, so I was not able to modify it. Any suggestions on how to solve this issue?

@jukkar jukkar requested a review from tsvehagen April 30, 2021 08:56
@tsvehagen

I think the use of k_work in net_pkt was a quite recent change. Not sure if it's easy to revert or if we should solve it some other way. What do you think @mniestroj?


jukkar commented May 5, 2021

Does anyone have time to review this one? It would be nice to get it merged by Friday.


@mnkp mnkp left a comment


The way the commits are split makes the changes easy to review and understand. However, if someone runs git bisect while investigating their ESP WiFi code and then happens to check out

  • net: Use k_fifo instead of k_work in RX and TX processing

they will get a compile time error. Maybe it would be better to squash the two commits

  • drivers: wifi: esp: Temporary fix to allow compilation
  • net: Use k_fifo instead of k_work in RX and TX processing

together?

Review thread on subsys/net/l2/canbus/6locan.c (resolved)

jukkar commented May 5, 2021

Maybe it would be better to squash the two commits

* drivers: wifi: esp: Temporary fix to allow compilation

* net: Use k_fifo instead of k_work in RX and TX processing

together?

Perhaps, or we could move the esp change before the main commit net: Use k_fifo instead of k_work in RX and TX processing. This way there should be no issue with bisect.

I wonder if we should change the name of the function, since _work_handler is typically associated with Workqueue Threads.

I did not want to make too many naming changes. The thread is basically doing "work" here, it just does not happen via k_work. Function naming always leads to interesting discussions, so if there are suggestions for what these changed functions should be called, I am all ears.

@jukkar jukkar force-pushed the bug-34690-k_work-issue-in-net branch from 4c184b7 to 400bc50 Compare May 5, 2021 11:06
Review threads on subsys/net/l2/ppp/ppp_l2.c (outdated, resolved) and subsys/net/l2/ppp/Kconfig (resolved)
Comment on lines +134 to +141
/** TX queue */
struct k_fifo tx_queue;
/** RX error queue */
struct k_fifo rx_err_queue;
/** Queue handler thread */
struct k_thread queue_handler;
/** Queue handler thread stack */
K_KERNEL_STACK_MEMBER(queue_stack, 512);

It looks like "600 bytes * number of CAN contexts" more RAM is used after this change. I am fine with this change as is, but in the future we might want to use a single thread for CAN, same as for PPP.


It looks like "600 bytes * number of CAN contexts" more RAM is used after this change. I am fine with this change as is, but in the future we might want to use a single thread for CAN, same as for PPP.

Yeah, the 6locan driver almost supports multiple contexts/interfaces. It looks like the intent was to support multiple contexts, but some bits and pieces are missing to fully support this. I did not want to start tweaking this driver for now, but it can be done later if really needed. I will leave this to @alexanderwachter to decide.

Comment on lines 96 to 94
#if defined(CONFIG_WIFI_ESP)
/* TODO: The work is to be removed after the esp offloading
* wifi driver is modified to use its own private work struct.
*/
struct k_work esp_work;
#endif

I need to find some time to fix/change ESP driver, it won't happen this week. I wonder if there might be other downstream drivers that use pkt->work. net_pkt_work() API was kind of public, so I wonder if we should follow deprecation policy here for other drivers. So for example instead of introducing esp_work and removing work, add something like:

struct net_pkt {
	union {
#ifdef CONFIG_NET_PKT_WORK
		/** Internal variable that is used when packet is sent
		 * or received.
		 */
		struct k_work work;
#endif
...

with hidden Kconfig option:

config NET_PKT_WORK
	bool

which could be selected by ESP driver and any other downstream drivers or L2 layers.


The k_work is strictly for internal use so we are free to remove it if needed. Any downstream user using it will need to change its code.


Then what makes struct k_work work; different from struct net_pkt_cursor cursor;? They are both described as Internal variable... or Internal buffer..., which means that neither should be used by downstream code, right? But if someone wants to implement a driver like ESP-AT (but for a different chip), then net_pkt_cursor_init() still needs to be used to properly initialize the packet before pushing it to other networking layers. That, however, makes pkt->cursor no longer internal. Well, it all depends on what "internal" really means.

Maybe we should better describe which structure members are reserved by native networking stack? And prevent using them by networking drivers, like ESP-AT does right now with pkt->work?


Maybe we should better describe which structure members are reserved by native networking stack? And prevent using them by networking drivers, like ESP-AT does right now with pkt->work?

Yes, that would be a good thing to do. It is a bit difficult to prevent code from using the "internal" variables, as we need to have net_pkt out in the open. The access functions for the internal variables are not documented, so downstream code should not use these variables anyway. As for your cursor example, the cursor manipulation API is there and documented, so I would not consider it internal. Applications should never access the variables directly anyway.

@mniestroj

@mniestroj @tsvehagen You seem to have been the most recent to modify the offloading ESP WiFi driver. The driver uses net_pkt for its internal use, but we want to get rid of the k_work in net_pkt. Unfortunately the driver looked somewhat convoluted, so I was not able to modify it. Any suggestions on how to solve this issue?

@jukkar I will look into the ESP driver, but not this week. Unfortunately reverting to k_fifo (as was used before) is not an option, due to bugs that using a work object allowed us to solve. It will be major work for the ESP driver to get rid of pkt->work...


jukkar commented May 5, 2021

I will look into the ESP driver, but not this week. Unfortunately reverting to k_fifo (as was used before) is not an option, due to bugs that using a work object allowed us to solve. It will be major work for the ESP driver to get rid of pkt->work...

Yes, I understand. I looked at the driver, but it seemed like quite a lot of work, so I did not try to do it myself. I think we should merge this PR as is, and the ESP changes can come when they are ready.

@jukkar jukkar force-pushed the bug-34690-k_work-issue-in-net branch from 400bc50 to ec308c3 Compare May 5, 2021 14:40
Review threads on subsys/net/ip/net_tc.c (outdated, resolved)
@jukkar jukkar force-pushed the bug-34690-k_work-issue-in-net branch from ec308c3 to 966e987 Compare May 5, 2021 16:29

pfalcon commented May 5, 2021

I agree that this patchset leads to mixed feelings. On one hand, it definitely cleans up the structure and processing - it is quite natural, if we need to decouple IRQ-based and normal processing, to put packets in a queue. On the other hand, it leads to a proliferation of different queues and, most importantly, handler threads which are just duplicates of each other and differ only in which actual handler function they call.

It immediately suggests adding a handler callback to net_pkt, plus a common handler thread which would just call that callback. But that is apparently the reason why a workqueue was used in the first place! It offers exactly that functionality, except it comes with a stipulation to which the net_pkt handling does not adhere.

All in all, I guess it's a good change, but it is worth further optimization to somehow minimize the number of processing threads (and their stacks). (Not giving +1 right away, as I didn't test these changes yet.)

@mniestroj

The k_fifo vs k_work time measurements seem suspicious to me, and I honestly don't believe in a 10% improvement. There is only one step (maybe two with CAN, but I am not sure) in the networking TX packet path where this PR should make a difference. This is the net_tc.c code, when CONFIG_NET_TC_TX_COUNT >= 1. So having results like:

k_work
[0] BK (1)	21922		5543071	103 us	[0->41->26->34=101 us]

k_fifo
[0] BK (1)	15079		3811118	94 us	[0->36->23->32=91 us]

seems wrong. IMO just 26 vs 23 shows the difference between k_work and k_fifo. But then, something had to be wrong with the measurement, as the other numbers should theoretically be equal. Maybe the measurements were not averaged over multiple runs?

I made some measurements with samples/net/sockets/echo_server and qemu_x86 with overlay-e1000.conf. As a result I get:

k_work
Avg TX net_pkt (709) time 41 us [0->19->6->13=38 us]

k_fifo
Avg TX net_pkt (709) time 39 us [0->19->4->13=36 us]

when taking the average from multiple measurements in the TX path. In the case of RX, I see maybe 1 us difference out of a total of 25 us - too little to draw any conclusions.

@mniestroj

I do like the net: Use k_fifo instead of k_work in RX and TX processing commit, as it just changes from workqueue threads to specialized threads consuming a k_fifo. It also comes with some performance improvement.

I don't like the ppp and 6locan changes, however. Most importantly, new threads are added for no good (enough) reason, which results in higher RAM and FLASH usage. I tested PPP with an nRF52-based project, and it seems that RAM usage increases by 1216 bytes and FLASH usage by 276 bytes. I don't like it, as I believe 32 kB of RAM was enough to run PPP with some cellular modem and a simple app; 1216 bytes is a big difference in that case. On the other hand, the PPP change isn't that bad if we take into account that fixing the PPP implementation will (probably) allow us to drop this thread. The reason why it shouldn't be needed is that the following comment:

/* Do not call net_send_data() directly in order to make this
* thread run before the sending happens. If we call the
* net_send_data() from this thread, then in fast link (like
* when running inside QEMU) the reply might arrive before we
* have returned from this function. That is bad because the
* fsm would be in wrong state and the received pkt is dropped.
*/
k_work_init(net_pkt_work(pkt), ppp_pkt_send);
k_work_submit(net_pkt_work(pkt));

is simply wrong. Calling either k_work_submit or k_fifo_put can reschedule threads, so the original problem is still there, just possibly not reproduced as often.


jukkar commented May 6, 2021

The k_fifo vs k_work time measurements seem suspicious to me, and I honestly don't believe in a 10% improvement. There is only one step (maybe two with CAN, but I am not sure) in the networking TX packet path where this PR should make a difference. This is the net_tc.c code, when CONFIG_NET_TC_TX_COUNT >= 1. So having results like:

k_work
[0] BK (1)	21922		5543071	103 us	[0->41->26->34=101 us]

k_fifo
[0] BK (1)	15079		3811118	94 us	[0->36->23->32=91 us]

seems wrong. IMO just 26 vs 23 shows the difference between k_work and k_fifo. But then, something had to be wrong with the measurement, as the other numbers should theoretically be equal. Maybe the measurements were not averaged over multiple runs?

The number of packets is shown there: for k_work it was 21922 and for k_fifo 15079.

I made some measurements with samples/net/sockets/echo_server and qemu_x86 with overlay-e1000.conf. As a result I get:

k_work
Avg TX net_pkt (709) time 41 us [0->19->6->13=38 us]

k_fifo
Avg TX net_pkt (709) time 39 us [0->19->4->13=36 us]

when taking average from multiple measurements in TX path. In case of RX, I see maybe 1us difference from total of 25us - too little to make any conclusions.

You do not have any TC threads in your test (note that the default number of TX threads is now 0), so you will not see any difference between k_work and k_fifo.
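As a hedged illustration (the option name comes from this discussion; the exact sample setup is an assumption), a minimal prj.conf fragment that enables one TX traffic class, so that the k_work/k_fifo hand-off step is actually exercised:

```conf
# Enable one TX traffic class so packets go through the TX queue thread
CONFIG_NET_TC_TX_COUNT=1
```

Without this option the TX path bypasses the queue entirely, which is why no difference between k_work and k_fifo shows up in the measurements.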

@mniestroj

Maybe the measurements were not averaged over multiple runs?

The number of packets is shown there: for k_work it was 21922 and for k_fifo 15079.

Okay. I used around 700 (and sometimes 7000) packets, which is the equivalent of 100 (1000) runs of echo-client. But then I restarted the qemu_x86 machine, retried the whole test multiple times, and noticed that the results differ (easily by several %). Also, running 21922 vs 15079 packets would make a big difference by itself. So I tried to average over those multiple "cold-start" runs, always using the same number of echo-client runs. I also removed all logs, so they don't interfere.

You do not have any TC threads in your test (note that the default number of TX threads is now 0), so you will not see any difference between k_work and k_fifo.

I do have a TX TC thread. I forgot to mention that I specifically added CONFIG_NET_TC_TX_COUNT=1. Without that option there was no difference at all (which is expected), i.e. there was no step taking 6 us in the k_work case and 4 us in the k_fifo case.


@rlubos rlubos left a comment


The generic net changes LGTM.

@jukkar jukkar force-pushed the bug-34690-k_work-issue-in-net branch from 966e987 to ac009f8 Compare May 7, 2021 07:03

jukkar commented May 7, 2021

Fixed merge conflict

@jukkar jukkar force-pushed the bug-34690-k_work-issue-in-net branch from ac009f8 to 0fc857c Compare May 7, 2021 07:50

@tsvehagen tsvehagen left a comment


ESP changes LGTM

@carlescufi

@jukkar please rebase

jukkar added 3 commits May 7, 2021 14:14
The following commits will remove k_work from net_pkt, so convert
the 6locan L2 to use a k_fifo between the application and the TX thread,
and between the driver and the RX error handler.

Signed-off-by: Jukka Rissanen <[email protected]>
The following commits will remove k_work from net_pkt, so convert
the PPP L2 to use a k_fifo when sending PPP data.

Signed-off-by: Jukka Rissanen <[email protected]>
A k_work handler cannot manipulate the k_work item it is running from. This means
that it is hard to clean up a net_pkt, because the packet embeds
a k_work. Because of this, use a k_fifo instead between
the RX thread and the network driver, and between the application and the TX
thread.

An echo-server/client run with IPv4 and UDP gave the following
results:

Using k_work
------------
TX traffic class statistics:
TC  Priority	Sent pkts	bytes	time
[0] BK (1)	21922		5543071	103 us	[0->41->26->34=101 us]
[1] BE (0)	0		0	-
RX traffic class statistics:
TC  Priority	Recv pkts	bytes	time
[0] BK (0)	0		0	-
[1] BE (0)	21925		6039151	97 us	[0->21->16->37->20=94 us]

Using k_fifo
------------
TX traffic class statistics:
TC  Priority	Sent pkts	bytes	time
[0] BK (1)	15079		3811118	94 us	[0->36->23->32=91 us]
[1] BE (0)	0		0	-
RX traffic class statistics:
TC  Priority	Recv pkts	bytes	time
[0] BK (1)	0		0	-
[1] BE (0)	15073		4150947	79 us	[0->17->12->32->14=75 us]

So using k_fifo gives about 10% better performance with the same workload.

Fixes zephyrproject-rtos#34690

Signed-off-by: Jukka Rissanen <[email protected]>
@jukkar jukkar force-pushed the bug-34690-k_work-issue-in-net branch from 0fc857c to 2f435fc Compare May 7, 2021 11:14
@galak galak merged commit bcdc762 into zephyrproject-rtos:master May 7, 2021
@jukkar jukkar deleted the bug-34690-k_work-issue-in-net branch May 7, 2021 14:51

pfalcon commented May 7, 2021

For reference, I tested this (with master at f05ea67) on frdm_k64f with the dumb_http_server and big_http_download samples; both work well. In my testing, big_http_download had higher throughput without this patch, but that is not scientific, as I downloaded different amounts of data each time, and it is affected by the Internet connection anyway.

Trying to test on qemu_x86, I hit #34964 (not related to this patch, present in master too), so didn't look into it further.

Development

Successfully merging this pull request may close these issues.

net: process_rx_packet() work handler violates requirements of Workqueue Threads implementation
9 participants