Test commit for PR2 #3

Closed
wants to merge 1 commit into from

Conversation

carlescufi
Owner

Signed-off-by: Carles Cufi <[email protected]>
@carlescufi carlescufi closed this May 4, 2017
carlescufi pushed a commit that referenced this pull request Mar 19, 2018
The scheduler exposed two APIs to do the same thing:
_add_thread_to_ready_q() was a low-level primitive that in most cases
was wrapped by _ready_thread(), which also (1) checks that the thread
_is_ready(), returning early if not, (2) flags the thread as "started"
to handle the case of a thread running for the first time out of a
waitq timeout, and (3) signals a logger event.

As it turns out, all existing usage was already checking case #1.
Case #2 can be better handled in the timeout resume path instead of on
every call.  And case #3 was probably wrong to have been skipping
anyway (there were paths that could make a thread runnable without
logging).

Now _add_thread_to_ready_q() is an internal scheduler API, as it
probably always should have been.

This also moves some asserts from the inline _ready_thread() wrapper
to the underlying non-inline function for code size reasons; otherwise
the extra use of the inline added by this patch blows past the code
size limits on Quark D2000.
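
For illustration, here is a minimal, self-contained sketch of the wrapper
relationship described above. The types and the exact checks are assumptions
based on this commit message, not the actual Zephyr scheduler code:

#include <stdbool.h>

/* Toy stand-ins for the kernel's thread object and ready queue. */
struct thread {
        bool ready;
        struct thread *next;
};

static struct thread *ready_q;

/* Low-level primitive: unconditionally link the thread into the ready
 * queue. After this change it is a scheduler-internal detail. */
static void add_thread_to_ready_q(struct thread *t)
{
        t->next = ready_q;
        ready_q = t;
}

/* Public wrapper: only enqueue threads that are actually ready, mirroring
 * the _is_ready() check that, per the message above, every existing caller
 * was already performing. */
void ready_thread(struct thread *t)
{
        if (t->ready) {
                add_thread_to_ready_q(t);
        }
}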

Signed-off-by: Andy Ross <[email protected]>
carlescufi pushed a commit that referenced this pull request Jun 24, 2020
This makes the gatt metrics also available for
gatt write-without-response-cb, so it now prints the rate of each write:

uart:~$ gatt write-without-response-cb 1e ff 10 10
Write #1: 16 bytes (0 bps)
Write #2: 32 bytes (3445948416 bps)
Write #3: 48 bytes (2596929536 bps)
Write #4: 64 bytes (6400 bps)
Write #5: 80 bytes (8533 bps)
Write #6: 96 bytes (10666 bps)
Write #7: 112 bytes (8533 bps)
Write #8: 128 bytes (9955 bps)
Write #9: 144 bytes (11377 bps)
Write #10: 160 bytes (7680 bps)
Write #11: 176 bytes (8533 bps)
Write #12: 192 bytes (9386 bps)
Write Complete (err 0)
Write #13: 208 bytes (8533 bps)
Write #14: 224 bytes (9244 bps)
Write #15: 240 bytes (9955 bps)
Write #16: 256 bytes (8000 bps)
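
For reference, a hedged sketch of the kind of per-write rate computation
behind the output above (bits written divided by the time since the previous
write); the function and variable names are made up for illustration and this
is not the actual shell implementation:

#include <stdint.h>
#include <stdio.h>

static uint32_t write_count;  /* number of writes so far */
static uint32_t total_len;    /* cumulative bytes written */
static int64_t last_ms;       /* timestamp of the previous write */

/* Hypothetical helper: print "Write #N: <total> bytes (<rate> bps)". */
static void print_write_rate(uint16_t len, int64_t now_ms)
{
        int64_t delta_ms = now_ms - last_ms;
        uint64_t bps = (delta_ms > 0) ?
                ((uint64_t)len * 8U * 1000U) / (uint64_t)delta_ms : 0U;

        write_count++;
        total_len += len;
        printf("Write #%u: %u bytes (%llu bps)\n",
               write_count, total_len, (unsigned long long)bps);
        last_ms = now_ms;
}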

Signed-off-by: Luiz Augusto von Dentz <[email protected]>
carlescufi pushed a commit that referenced this pull request Nov 16, 2020
_ldiv5() is an optimized divide-by-5 function that is smaller and
faster than the generic libgcc implementation.

Yet it can be made even smaller and faster with this replacement
implementation based on a reciprocal multiplication plus some tricks.

For example, here's the assembly from the original code on ARM:

_ldiv5:
        ldr     r3, [r0]
        movw    ip, #52429
        ldr     r1, [r0, #4]
        movt    ip, 52428
        adds    r3, r3, #2
        push    {r4, r5, r6, r7, lr}
        mov     lr, #0
        adc     r1, r1, lr
        adds    r2, lr, lr
        umull   r7, r6, ip, r1
        lsr     r6, r6, #2
        adc     r7, r6, r6
        adds    r2, r2, r2
        adc     r7, r7, r7
        adds    r2, r2, lr
        adc     r7, r7, r6
        subs    r3, r3, r2
        sbc     r7, r1, r7
        lsr     r2, r3, #3
        orr     r2, r2, r7, lsl #29
        umull   r2, r1, ip, r2
        lsr     r2, r1, #2
        lsr     r7, r1, #31
        lsl     r1, r2, #3
        adds    r4, lr, r1
        adc     r5, r6, r7
        adds    r2, r1, r1
        adds    r2, r2, r2
        adds    r2, r2, r1
        subs    r2, r3, r2
        umull   r3, r2, ip, r2
        lsr     r2, r2, #2
        adds    r4, r4, r2
        adc     r5, r5, #0
        strd    r4, [r0]
        pop     {r4, r5, r6, r7, pc}

And here's the resulting assembly with this commit applied:

_ldiv5:
        push    {r4, r5, r6, r7}
        movw    r4, #13107
        ldr     r6, [r0]
        movt    r4, 13107
        ldr     r1, [r0, #4]
        mov     r3, #0
        umull   r6, r7, r6, r4
        add     r2, r4, r4, lsl #1
        umull   r4, r5, r1, r4
        adds    r1, r6, r2
        adc     r2, r7, r2
        adds    ip, r6, r4
        adc     r1, r7, r5
        adds    r2, ip, r2
        adc     r2, r1, r3
        adds    r2, r4, r2
        adc     r3, r5, r3
        strd    r2, [r0]
        pop     {r4, r5, r6, r7}
        bx      lr

So we're down to 20 instructions from 36 initially, with only 2 umull
instructions instead of 3, and a slightly smaller stack footprint.
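
In C terms, the core reciprocal-multiplication trick looks roughly like the
sketch below. This is a simplified illustration of the idea, not the actual
replacement code, and it ignores any rounding adjustments; unsigned __int128
is a GCC/Clang extension:

#include <stdint.h>

/* Divide a 64-bit value by 5 in place via a multiply by the reciprocal:
 * 0xCCCCCCCCCCCCCCCD == ceil(2^66 / 5), so (x * m) >> 66 equals x / 5
 * exactly for every 64-bit x, with no division instruction or libgcc call. */
static void ldiv5_sketch(uint64_t *v)
{
        const uint64_t m = 0xCCCCCCCCCCCCCCCDULL;

        *v = (uint64_t)(((unsigned __int128)*v * m) >> 64) >> 2;
}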

Signed-off-by: Nicolas Pitre <[email protected]>
carlescufi added a commit that referenced this pull request Dec 2, 2022
This patch reworks how fragments are handled in the net_buf
infrastructure.

In particular, it removes the union around the node and frags members in
the main net_buf structure. This is done so that both can be used at the
same time, at a cost of 4 bytes per net_buf instance.
This implies that the layout of net_buf instances changes whenever they are
inserted into a queue (fifo or lifo) or a linked list (slist).

Until now, this is what happened when enqueueing a net_buf with frags in
a queue or linked list:

1.1 Before enqueueing:

 +--------+      +--------+      +--------+
 |#1  node|\     |#2  node|\     |#3  node|\
 |        | \    |        | \    |        | \
 | frags  |------| frags  |------| frags  |------NULL
 +--------+      +--------+      +--------+

net_buf #1 has 2 fragments, net_bufs #2 and #3. Both the node and frags
pointers (they are the same, since they are unioned) point to the next
fragment.

1.2 After enqueueing:

 +--------+      +--------+      +--------+      +--------+      +--------+
 |q/slist |------|#1  node|------|#2  node|------|#3  node|------|q/slist |
 |node    |      | *flag  | /    | *flag  | /    |        | /    |node    |
 |        |      | frags  |/     | frags  |/     | frags  |/     |        |
 +--------+      +--------+      +--------+      +--------+      +--------+

When enqueueing a net_buf (in this case #1) that contains fragments, the
current net_buf implementation actually enqueues all the fragments (in
this case #2 and #3) as actual queue/slist items, since node and frags
are one and the same in memory. This makes the enqueueing operation
expensive and makes it impossible to dequeue atomically. The `*flag`
notation here means that the `flags` member has been set to
`NET_BUF_FRAGS` in order to be able to reconstruct the frags pointers
when dequeuing.
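
To illustrate why this cannot be atomic, a rough sketch of the flag-driven
reconstruction on the dequeue side might look like the following; queue_get()
and every field except flags and frags are hypothetical stand-ins, not the
actual net_buf API:

#include <stddef.h>
#include <stdint.h>

#define NET_BUF_FRAGS 0x01  /* flag value assumed for illustration */

struct net_buf {
        uint8_t flags;
        struct net_buf *frags;
        /* ... data, length, ... */
};

/* Hypothetical queue accessor standing in for a fifo/slist get operation. */
extern struct net_buf *queue_get(void *queue);

/* Old scheme (sketch): fragments sit in the queue as individual items, so
 * the reader has to pull several items and re-link them via the flag; the
 * whole operation spans multiple dequeues and cannot be done atomically. */
struct net_buf *get_buf_with_frags(void *queue)
{
        struct net_buf *buf = queue_get(queue);
        struct net_buf *cur = buf;

        while (cur != NULL && (cur->flags & NET_BUF_FRAGS)) {
                cur->flags &= ~NET_BUF_FRAGS;
                cur->frags = queue_get(queue);  /* extra, separate dequeue */
                cur = cur->frags;
        }
        if (cur != NULL) {
                cur->frags = NULL;
        }

        return buf;
}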

After this patch, the layout changes considerably:

2.1 Before enqueueing:

 +--------+       +--------+       +--------+
 |#1  node|--NULL |#2  node|--NULL |#3  node|--NULL
 |        |       |        |       |        |
 | frags  |-------| frags  |-------| frags  |------NULL
 +--------+       +--------+       +--------+

This is very similar to 1.1, except that now node and frags are
different pointers, so node is just set to NULL.

2.2 After enqueueing:

 +--------+       +--------+       +--------+
 |q/slist |-------|#1  node|-------|q/slist |
 |node    |       |        |       |node    |
 |        |       | frags  |       |        |
 +--------+       +--------+       +--------+
                      |            +--------+       +--------+
                      |            |#2  node|--NULL |#3  node|--NULL
                      |            |        |       |        |
                      +------------| frags  |-------| frags  |------NULL
                                   +--------+       +--------+

When enqueueing net_buf #1, now we only enqueue that very item, instead
of enqueueing the frags as well, since node and frags are now separate
pointers. This simplifies the operation and makes it atomic.
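
In terms of the structure itself, the change amounts to roughly the following
simplified sketch, which shows only the two members in question; the real
struct net_buf has more fields and uses sys_snode_t rather than this stand-in:

/* Minimal stand-in for Zephyr's sys_snode_t singly-linked-list node. */
struct snode {
        struct snode *next;
};

/* Before: node and frags shared the same storage, so a buffer could not sit
 * on a queue/slist and keep its fragment chain intact at the same time. */
struct net_buf_old {
        union {
                struct snode node;
                struct net_buf_old *frags;
        };
        /* ... data, length, flags, ... */
};

/* After: separate members, costing one extra pointer (4 bytes on 32-bit
 * targets) per instance, but allowing both to be used simultaneously. */
struct net_buf_new {
        struct snode node;
        struct net_buf_new *frags;
        /* ... data, length, flags, ... */
};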

Resolves zephyrproject-rtos#52718.

Signed-off-by: Carles Cufi <[email protected]>
carlescufi added a commit that referenced this pull request Dec 14, 2022