Second commit for a PR test #2

carlescufi · 2017-05-03T23:19:42Z

Signed-off-by: Carles Cufi <[email protected]>

Case #1: If an ACK is received and our retransmit (i.e. unacked) queue is empty, it is treated as an error. This is incorrect because TCP requires the ACK flag to be set on every packet of an established connection. For example, if we haven't sent anything to the peer but it sends us new data, it will reuse the older ACK number. That doesn't acknowledge anything new on our side, but it's not an error in any way.

Case #2: If the retransmit queue is only partially acknowledged, it is treated as an error. Consider that we have 2 packets in the queue, with sequence numbers (inclusive) 100-199 and 200-399. There's nothing wrong if we receive an ACK with number 200: it just acknowledges the first packet, which we can remove before finishing processing. The second packet remains in the queue to be acknowledged later.

Fixes: zephyrproject-rtos#5504

Signed-off-by: Paul Sokolovsky <[email protected]>
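The two cases above can be sketched as follows. This is a minimal illustration, not the code from the Zephyr TCP stack: `struct seg`, `seq_leq()` and `process_ack()` are hypothetical names, and the retransmit queue is modelled as a plain array. An empty queue and a partially acknowledged queue are both handled without reporting an error.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical unacked segment: covers [seq, seq + len - 1]. */
struct seg {
	uint32_t seq;
	uint32_t len;
};

/* True if a is at or before b in 32-bit sequence space. */
static int seq_leq(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) <= 0;
}

/*
 * Drop every segment at the head of the queue that the ACK number
 * fully covers. Returns the number of segments dropped. An empty
 * queue or a partial acknowledgement simply drops fewer (or zero)
 * segments; neither is an error.
 */
static size_t process_ack(struct seg *q, size_t *count, uint32_t ack)
{
	size_t acked = 0;

	while (acked < *count && seq_leq(q[acked].seq + q[acked].len, ack)) {
		acked++;
	}

	/* Shift the remaining segments to the front of the array. */
	for (size_t i = acked; i < *count; i++) {
		q[i - acked] = q[i];
	}
	*count -= acked;

	return acked;
}
```

With the queue from the example (100-199 and 200-399), an ACK number of 200 drops only the first segment and leaves the second one queued.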

The scheduler exposed two APIs to do the same thing: _add_thread_to_ready_q() was a low level primitive that in most cases was wrapped by _ready_thread(), which also (1) checks that the thread _is_ready() or exits, (2) flags the thread as "started" to handle the case of a thread running for the first time out of a waitq timeout, and (3) signals a logger event.

As it turns out, all existing usage was already checking case #1. Case #2 can be better handled in the timeout resume path instead of on every call. And case #3 was probably wrong to skip anyway (there were paths that could make a thread runnable without logging).

Now _add_thread_to_ready_q() is an internal scheduler API, as it probably always should have been.

This also moves some asserts from the inline _ready_thread() wrapper to the underlying true function for code size reasons, otherwise the extra use of the inline added by this patch blows past code size limits on Quark D2000.

Signed-off-by: Andy Ross <[email protected]>

Currently, the free block bitmap is roughly 4 times larger than it needs to be, wasting memory.

Let's assume maxsz = 128, minsz = 8 and n_max = 40.

Z_MPOOL_LVLS(128, 8) returns 3. The block size for level #0 is 128, the block size for level #1 is 128/4 = 32, and the block size for level #2 is 32/4 = 8. Hence levels 0, 1, and 2 for a total of 3 levels. So far so good.

Now let's look at Z_MPOOL_LBIT_WORDS(). We get:

    Z_MPOOL_LBIT_WORDS_UNCLAMPED(40, 0) = ((40 << 0) + 31) / 32 = 2
    Z_MPOOL_LBIT_WORDS_UNCLAMPED(40, 1) = ((40 << 2) + 31) / 32 = 5
    Z_MPOOL_LBIT_WORDS_UNCLAMPED(40, 2) = ((40 << 4) + 31) / 32 = 20

None of those are < 2 so Z_MPOOL_LBIT_WORDS() takes the results from Z_MPOOL_LBIT_WORDS_UNCLAMPED().

Finally, let's look at _MPOOL_BITS_SIZE(). It sums all possible levels with Z_MPOOL_LBIT_BYTES() which is:

    #define Z_MPOOL_LBIT_BYTES(maxsz, minsz, l, n_max) \
        (Z_MPOOL_LVLS((maxsz), (minsz)) >= (l) ? \
         4 * Z_MPOOL_LBIT_WORDS((n_max), l) : 0)

Or given what we already have:

    Z_MPOOL_LBIT_BYTES(128, 8, 0, 40) = (3 >= 0) ? 4 * 2 : 0 = 8
    Z_MPOOL_LBIT_BYTES(128, 8, 1, 40) = (3 >= 1) ? 4 * 5 : 0 = 20
    Z_MPOOL_LBIT_BYTES(128, 8, 2, 40) = (3 >= 2) ? 4 * 20 : 0 = 80
    Z_MPOOL_LBIT_BYTES(128, 8, 3, 40) = (3 >= 3) ? 4 * ??

Wait... we're missing this one:

    Z_MPOOL_LBIT_WORDS_UNCLAMPED(40, 3) = ((40 << 6) + 31) / 32 = 80

then:

    Z_MPOOL_LBIT_BYTES(128, 8, 3, 40) = (3 >= 3) ? 4 * 80 : 0 = 320

Further levels yield (3 >= 4), (3 >= 5), etc. so they're all false and produce 0.

So this means that we're statically allocating 428 bytes to the bitmap when clearly only the first 3 Z_MPOOL_LBIT_BYTES() results for the corresponding 3 levels that we have should be summed, i.e. only 108 bytes.

Here the code logic gets confused between level numbers and the number of levels, hence the extra allocation which happens to be exponential.

Signed-off-by: Nicolas Pitre <[email protected]>
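The arithmetic in this commit message can be reproduced with a small sketch. The function names below (`lbit_words()`, `num_levels()`, `bitmap_bytes()`) are hypothetical stand-ins for the macros, the level count is assumed to come from dividing the block size by 4 until it drops below minsz, and the clamping step is left out because none of the values in the example are below 2.

```c
#include <assert.h>
#include <stddef.h>

/* 32-bit words of bitmap for one level; each level has 4x the blocks. */
static size_t lbit_words(size_t n_max, unsigned int level)
{
	return ((n_max << (2 * level)) + 31) / 32;
}

/* Number of levels: 128 -> 32 -> 8 gives 3 for maxsz=128, minsz=8. */
static unsigned int num_levels(size_t maxsz, size_t minsz)
{
	unsigned int levels = 0;

	assert(minsz > 0);
	for (size_t sz = maxsz; sz >= minsz; sz /= 4) {
		levels++;
	}
	return levels;
}

/*
 * Sum the per-level bitmap bytes. With off_by_one set, level numbers
 * 0..levels are summed (the ">=" comparison described above); without
 * it, only levels 0..levels-1 are summed.
 */
static size_t bitmap_bytes(size_t maxsz, size_t minsz, size_t n_max,
			   int off_by_one)
{
	unsigned int levels = num_levels(maxsz, minsz);
	unsigned int last = off_by_one ? levels + 1 : levels;
	size_t total = 0;

	for (unsigned int l = 0; l < last; l++) {
		total += 4 * lbit_words(n_max, l);
	}
	return total;
}
```

For maxsz = 128, minsz = 8 and n_max = 40, the off-by-one sum is 8 + 20 + 80 + 320 = 428 bytes, while summing only the 3 real levels gives 108 bytes.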

Fix Kconfig conditional include of Minimum Channels Used and Channel Selection Algorithm #2. Signed-off-by: Vinayak Kariappa Chettimada <[email protected]>

This makes the gatt metrics also available for gatt write-without-rsp-cb so it now prints the rate of each write:

    uart:~$ gatt write-without-response-cb 1e ff 10 10
    Write #1: 16 bytes (0 bps)
    Write #2: 32 bytes (3445948416 bps)
    Write #3: 48 bytes (2596929536 bps)
    Write #4: 64 bytes (6400 bps)
    Write #5: 80 bytes (8533 bps)
    Write #6: 96 bytes (10666 bps)
    Write #7: 112 bytes (8533 bps)
    Write #8: 128 bytes (9955 bps)
    Write #9: 144 bytes (11377 bps)
    Write #10: 160 bytes (7680 bps)
    Write #11: 176 bytes (8533 bps)
    Write #12: 192 bytes (9386 bps)
    Write Complete (err 0)
    Write #13: 208 bytes (8533 bps)
    Write #14: 224 bytes (9244 bps)
    Write #15: 240 bytes (9955 bps)
    Write #16: 256 bytes (8000 bps)

Signed-off-by: Luiz Augusto von Dentz <[email protected]>

The _ldiv5() is an optimized divide-by-5 function that is smaller and faster than the generic libgcc implementation. Yet it can be made even smaller and faster with this replacement implementation based on a reciprocal multiplication plus some tricks.

For example, here's the assembly from the original code on ARM:

    _ldiv5:
        ldr     r3, [r0]
        movw    ip, #52429
        ldr     r1, [r0, #4]
        movt    ip, 52428
        adds    r3, r3, #2
        push    {r4, r5, r6, r7, lr}
        mov     lr, #0
        adc     r1, r1, lr
        adds    r2, lr, lr
        umull   r7, r6, ip, r1
        lsr     r6, r6, #2
        adc     r7, r6, r6
        adds    r2, r2, r2
        adc     r7, r7, r7
        adds    r2, r2, lr
        adc     r7, r7, r6
        subs    r3, r3, r2
        sbc     r7, r1, r7
        lsr     r2, r3, #3
        orr     r2, r2, r7, lsl #29
        umull   r2, r1, ip, r2
        lsr     r2, r1, #2
        lsr     r7, r1, #31
        lsl     r1, r2, #3
        adds    r4, lr, r1
        adc     r5, r6, r7
        adds    r2, r1, r1
        adds    r2, r2, r2
        adds    r2, r2, r1
        subs    r2, r3, r2
        umull   r3, r2, ip, r2
        lsr     r2, r2, #2
        adds    r4, r4, r2
        adc     r5, r5, #0
        strd    r4, [r0]
        pop     {r4, r5, r6, r7, pc}

And here's the resulting assembly with this commit applied:

    _ldiv5:
        push    {r4, r5, r6, r7}
        movw    r4, #13107
        ldr     r6, [r0]
        movt    r4, 13107
        ldr     r1, [r0, #4]
        mov     r3, #0
        umull   r6, r7, r6, r4
        add     r2, r4, r4, lsl #1
        umull   r4, r5, r1, r4
        adds    r1, r6, r2
        adc     r2, r7, r2
        adds    ip, r6, r4
        adc     r1, r7, r5
        adds    r2, ip, r2
        adc     r2, r1, r3
        adds    r2, r4, r2
        adc     r3, r5, r3
        strd    r2, [r0]
        pop     {r4, r5, r6, r7}
        bx      lr

So we're down to 20 instructions from 36 initially, with only 2 umull instructions instead of 3, and slightly smaller stack footprint.

Signed-off-by: Nicolas Pitre <[email protected]>
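As a simplified illustration of reciprocal multiplication, the sketch below divides a 32-bit value by 5 with one widening multiply and a shift. It is not the 64-bit code from this commit, and `div5_u32()` is a hypothetical name. The constant 0xCCCCCCCD equals (2^34 + 1) / 5 and matches the 52429/52428 halves loaded in the original assembly.

```c
#include <stdint.h>

/*
 * Divide an unsigned 32-bit value by 5 without a divide instruction.
 * Multiplying by 0xCCCCCCCD and shifting right by 34 yields
 * floor(x / 5) for every 32-bit x.
 */
static uint32_t div5_u32(uint32_t x)
{
	return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 34);
}
```

The multiply here is what compiles to a single umull on ARM. Extending the idea to a 64-bit dividend needs additional steps, which this sketch does not attempt.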

Added function to calculate channel identifier value required for Channel Selection Algorithm #2. Signed-off-by: Vinayak Kariappa Chettimada <[email protected]>
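For reference, the Bluetooth Core Specification defines the channel identifier for Channel Selection Algorithm #2 as the XOR of the upper and lower 16 bits of the Access Address. A minimal sketch under that definition follows. The name `chan_id()` and the choice to pass the Access Address as a single 32-bit integer are assumptions made for illustration and may not match the function added by this commit.

```c
#include <stdint.h>

/* channelIdentifier = Access Address bits 31..16 XOR bits 15..0. */
static uint16_t chan_id(uint32_t access_address)
{
	return (uint16_t)((access_address >> 16) ^ (access_address & 0xFFFFu));
}
```

For the advertising Access Address 0x8E89BED6, this gives 0x8E89 ^ 0xBED6 = 0x305F.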

The fatal log now contains
- Trap type in human readable representation
- Integer registers visible to the program when trap was taken
- Special register values such as PC and PSR
- Backtrace with PC and SP

If CONFIG_EXTRA_EXCEPTION_INFO is enabled, then all the above is logged. If not, only the special registers are logged.

The format is inspired by the GRMON debug monitor and TSIM simulator. A quick guide on how to use the values is in fatal.c.

It now looks like this:

    E: tt = 0x02, illegal_instruction
    E:
    E:       INS        LOCALS     OUTS       GLOBALS
    E:   0:  00000000   f3900fc    40007c50   00000000
    E:   1:  00000000   40004bf0   40008d30   40008c00
    E:   2:  00000000   40004bf4   40008000   00000003
    E:   3:  40009158   00000000   40009000   00000002
    E:   4:  40008fa8   40003c00   40008fa8   00000008
    E:   5:  40009000   f3400fc0   00000000   00000080
    E:   6:  4000a1f8   40000050   4000a190   00000000
    E:   7:  40002308   00000000   40001fb8   000000c1
    E:
    E: psr: f30000c7   wim: 00000008   tbr: 40000020   y: 00000000
    E:  pc: 4000a1f4   npc: 4000a1f8
    E:
    E:       pc         sp
    E:  #0   4000a1f4   4000a190
    E:  #1   40002308   4000a1f8
    E:  #2   40003b24   4000a258

Signed-off-by: Martin Åberg <[email protected]>

Reorder Channel Selection Algorithm #2 and Minimum Number of Used Channels Procedure bits. Signed-off-by: Vinayak Kariappa Chettimada <[email protected]>

Implement the functions required to calculate the SubEvent 1 and SubEvent n mapped channel indices. Signed-off-by: Vinayak Kariappa Chettimada <[email protected]>

Added Bluetooth Specification references to the implementation of Channel Selection algorithm #2 in the Controller. Signed-off-by: Vinayak Kariappa Chettimada <[email protected]>

Remove explicit disable of Channel Selection Algorithm #2 in the mesh tests that use Extended Advertising. Fixes zephyrproject-rtos#39188. Signed-off-by: Vinayak Kariappa Chettimada <[email protected]>

This patch reworks how fragments are handled in the net_buf infrastructure.

In particular, it removes the union around the node and frags members in the main net_buf structure. This is done so that both can be used at the same time, at a cost of 4 bytes per net_buf instance. This implies that the layout of net_buf instances changes whenever being inserted into a queue (fifo or lifo) or a linked list (slist).

Until now, this is what happened when enqueueing a net_buf with frags in a queue or linked list:

1.1 Before enqueueing:

    +--------+      +--------+      +--------+
    |#1  node|\     |#2  node|\     |#3  node|\
    |        | \    |        | \    |        | \
    | frags  |------| frags  |------| frags  |------NULL
    +--------+      +--------+      +--------+

net_buf #1 has 2 fragments, net_bufs #2 and #3. Both the node and frags pointers (they are the same, since they are unioned) point to the next fragment.

1.2 After enqueueing:

    +--------+      +--------+      +--------+      +--------+      +--------+
    |q/slist |------|#1  node|------|#2  node|------|#3  node|------|q/slist |
    |node    |      | *flag  |   /  | *flag  |   /  |        |   /  |node    |
    |        |      | frags  |/     | frags  |/     | frags  |/     |        |
    +--------+      +--------+      +--------+      +--------+      +--------+

When enqueueing a net_buf (in this case #1) that contains fragments, the current net_buf implementation actually enqueues all the fragments (in this case #2 and #3) as actual queue/slist items, since node and frags are one and the same in memory. This makes the enqueueing operation expensive and it makes it impossible to atomically dequeue. The `*flag` notation here means that the `flags` member has been set to `NET_BUF_FRAGS` in order to be able to reconstruct the frags pointers when dequeuing.

After this patch, the layout changes considerably:

2.1 Before enqueueing:

    +--------+       +--------+       +--------+
    |#1  node|--NULL |#2  node|--NULL |#3  node|--NULL
    |        |       |        |       |        |
    | frags  |-------| frags  |-------| frags  |------NULL
    +--------+       +--------+       +--------+

This is very similar to 1.1, except that now node and frags are different pointers, so node is just set to NULL.

2.2 After enqueueing:

    +--------+       +--------+       +--------+
    |q/slist |-------|#1  node|-------|q/slist |
    |node    |       |        |       |node    |
    |        |       | frags  |       |        |
    +--------+       +--------+       +--------+
                         |            +--------+       +--------+
                         |            |#2  node|--NULL |#3  node|--NULL
                         |            |        |       |        |
                         +------------| frags  |-------| frags  |------NULL
                                      +--------+       +--------+

When enqueueing net_buf #1, now we only enqueue that very item, instead of enqueueing the frags as well, since now node and frags are separate pointers. This simplifies the operation and makes it atomic.

Resolves zephyrproject-rtos#52718.

Signed-off-by: Carles Cufi <[email protected]>
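The layout change can be sketched with two simplified structures. These are hypothetical stand-ins (`old_buf`, `new_buf`, `buf_queue`, `enqueue()`), not the real net_buf definitions, and they carry only the fields discussed above. Separating node from frags costs one extra pointer per buffer, which is the 4 bytes mentioned above on a 32-bit target, and lets the queue link only the head buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Before: node and frags share storage. */
struct old_buf {
	union {
		void *node;
		struct old_buf *frags;
	} link;
	uint8_t flags;
};

/* After: node and frags are independent pointers. */
struct new_buf {
	struct new_buf *node;   /* next item in the queue/slist */
	struct new_buf *frags;  /* next fragment of this buffer */
	uint8_t flags;
};

/* Extra memory per buffer caused by removing the union. */
static size_t extra_bytes_per_buf(void)
{
	return sizeof(struct new_buf) - sizeof(struct old_buf);
}

/* Hypothetical singly linked queue of buffers. */
struct buf_queue {
	struct new_buf *head;
	struct new_buf *tail;
};

/* Append only the head buffer; its fragment chain is left untouched. */
static void enqueue(struct buf_queue *q, struct new_buf *buf)
{
	buf->node = NULL;
	if (q->tail != NULL) {
		q->tail->node = buf;
	} else {
		q->head = buf;
	}
	q->tail = buf;
}
```

After enqueueing buffer #1, the queue holds a single item whose frags pointer still leads to #2 and #3, matching diagram 2.2.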

Add additional custom LE Channel Selection #2 tests to cover event and subevent mapping. Signed-off-by: Vinayak Kariappa Chettimada <[email protected]>

Second commit for a PR test

c2bc37e

Signed-off-by: Carles Cufi <[email protected]>

carlescufi merged this pull request into bluetooth May 3, 2017

carlescufi deleted the test_pr_2 branch May 4, 2017 09:55

carlescufi pushed a commit that referenced this pull request May 14, 2021

Bluetooth: controller: Reorder feature bits

323b8ef

Reorder Channel Selection Algorithm #2 and Minimum Number of Used Channels Procedure bits. Signed-off-by: Vinayak Kariappa Chettimada <[email protected]>
