Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/gcoap: use sock_async and events #13386

Merged
merged 5 commits into from
Feb 23, 2020
Merged

Conversation

kb2ma
Copy link
Member

@kb2ma kb2ma commented Feb 17, 2020

Contribution description

From its inception gcoap has implemented asynchronous message processing. However, this implementation has included a custom msg_t queue handler for timeouts, as well as stack specific handling to break out of a blocking call to sock_udp_recv(). The new sock_async and event queue modules implement generic message queue processing and timeout events that can supplant gcoap's custom solutions. This PR performs that replacement with no change in gcoap's feature set.

Abstraction has a cost, and we measured the increase in size of the executable and RAM use. For the gcoap example built for a samr21-xpro:

  • text +896 bytes
  • data +32 bytes
  • bss +8 bytes

We also saw RAM increase by 32 bytes via ps command.

Testing procedure

Run the gcoap example or your favorite app. There should not be any change in functionality for client request handling or server response handling.

Issues/PRs references

N/A

@kb2ma kb2ma added the Area: CoAP Area: Constrained Application Protocol implementations label Feb 17, 2020
@kb2ma kb2ma added this to the Release 2020.04 milestone Feb 17, 2020
@kb2ma kb2ma requested a review from miri64 February 17, 2020 03:34
@miri64
Copy link
Member

miri64 commented Feb 19, 2020

Abstraction has a cost, and we measured the increase in size of the executable and RAM use. For the gcoap example built for a samr21-xpro:

* text +896 bytes

* data +32 bytes

* bss +8 bytes

We also saw RAM increase by 32 bytes via ps command.

Any idea where this increase is coming from specifically? If it is coming from event it might be a benefit actually, when an application uses that on top of gcoap.

Copy link
Member

@miri64 miri64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codewise this looks good. Code is mostly just moved around and usages of msg replaced with event. Will do some testing after lunch break.

Comment on lines +220 to +246
static void _on_resp_timeout(void *arg) {
gcoap_request_memo_t *memo = (gcoap_request_memo_t *)arg;

/* no retries remaining */
if ((memo->send_limit == GCOAP_SEND_LIMIT_NON) || (memo->send_limit == 0)) {
_expire_request(memo);
}
/* reduce retries remaining, double timeout and resend */
else {
memo->send_limit--;
#ifdef CONFIG_GCOAP_NO_RETRANS_BACKOFF
unsigned i = 0;
#else
unsigned i = COAP_MAX_RETRANSMIT - memo->send_limit;
#endif
uint32_t timeout = ((uint32_t)COAP_ACK_TIMEOUT << i) * US_PER_SEC;
#if COAP_RANDOM_FACTOR_1000 > 1000
uint32_t end = ((uint32_t)TIMEOUT_RANGE_END << i) * US_PER_SEC;
timeout = random_uint32_range(timeout, end);
#endif
event_timeout_set(&memo->resp_evt_tmout, timeout);

ssize_t bytes = sock_udp_send(&_sock, memo->msg.data.pdu_buf,
memo->msg.data.pdu_len, &memo->remote_ep);
if (bytes <= 0) {
DEBUG("gcoap: sock resend failed: %d\n", (int)bytes);
_expire_request(memo);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: This was adapted from _event_loop's GCOAP_MSG_TYPE_TIMEOUT case.

@miri64 miri64 added Type: cleanup The issue proposes a clean-up / The PR cleans-up parts of the codebase / documentation Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation Reviewed: 1-fundamentals The fundamentals of the PR were reviewed according to the maintainer guidelines Reviewed: 2-code-design The code design of the PR was reviewed according to the maintainer guidelines Reviewed: 3-testing The PR was tested according to the maintainer guidelines labels Feb 19, 2020
@miri64
Copy link
Member

miri64 commented Feb 19, 2020

Tested as proposed in Testing Procedures:

  • flashed examples/gcoap to an iotlab-m3 and a samr21-xpro and send a request from one to the other
coap get -c fe80::589d:9386:2208:6579 5683 /.well-known/core
2020-02-19 13:34:12,795 #  coap get -c fe80::589d:9386:2208:6579 5683 /.well-known/core
2020-02-19 13:34:12,796 # gcoap_cli: sending msg ID 44298, 23 bytes
> 2020-02-19 13:34:12,809 #  gcoap: response Success, code 2.05, 46 bytes
2020-02-19 13:34:12,809 # </cli/stats>;ct=0;rt="count";obs,</riot/board>
coap get fe80::589d:9386:2208:6579 5683 /.well-known/core
2020-02-19 13:34:16,826 # coap get fe80::589d:9386:2208:6579 5683 /.well-known/core
2020-02-19 13:34:16,828 # gcoap_cli: sending msg ID 44299, 23 bytes
> 2020-02-19 13:34:16,842 #  gcoap: response Success, code 2.05, 46 bytes
2020-02-19 13:34:16,843 # </cli/stats>;ct=0;rt="count";obs,</riot/board>
  • to test retransmission timeout I sent a CON message to a different port than 5683:
coap get -c fe80::589d:9386:2208:6579 5684 /.well-known/core
2020-02-19 13:34:45,928 # coap get -c fe80::589d:9386:2208:6579 5684 /.well-known/core
2020-02-19 13:34:45,943 # gcoap_cli: sending msg ID 44300, 23 bytes
> 2020-02-19 13:35:57,491 #  gcoap: timeout for msg ID 44300

Timings between messages still look correct, however there are now 6 tries (as posed to just 5 in master).
image
On master

2020-02-19 13:42:49,128 # coap get -c fe80::589d:9386:2208:6578 5684 /.well-known/core
2020-02-19 13:42:49,129 # gcoap_cli: sending msg ID 44300, 23 bytes
> 2020-02-19 13:44:12,130 #  gcoap: timeout for msg ID 44300

image

(that the address changes is I think unrelated and due to #13362 being merged in the meantime ;-))

@kb2ma
Copy link
Member Author

kb2ma commented Feb 20, 2020

Any idea where this increase is coming from specifically?

The graphic below shows the differences in make info-objsize. The first section is text sizes. Before is the master branch base used for After gcoap/sock_async branch.

before-after

@kb2ma
Copy link
Member Author

kb2ma commented Feb 20, 2020

Timings between messages still look correct, however there are now 6 tries (as posed to just 5 in master).

Are you sure this difference happens consistently? I tested several times on native and samr21-xpro and could not reproduce. In my tests, gcoap would send messages via tap to libcoap running on my workstation. I used ethos over USB for the samr21-xpro test. I set up libcoap's coap-server example to drop responses like this:

$ coap-server -l 1-100

I have not tested between two samr21-xpro using the radio as it looks like you did. I wonder if you are seeing a spurious multipath reception on the sniffer rather than an extra send -- which is the basis for my question above.

@miri64
Copy link
Member

miri64 commented Feb 20, 2020

Any idea where this increase is coming from specifically?

The graphic below shows the differences in make info-objsize. The first section is text sizes. Before is the master branch base used for After gcoap/sock_async branch.

Thanks for analyzing this, so the major part seems to come from the sock_async implementation itself.

I have not tested between two samr21-xpro using the radio as it looks like you did. I wonder if you are seeing a spurious multipath reception on the sniffer rather than an extra send -- which is the basis for my question above.

I'm using link-local addresses on the nodes, so multipath side-effects would be odd. It isn't a link-layer retransmission either, as can be seen in my screenshot. And yes, the 6 sends were reproducible. I'll test with your proposed setup to reconfirm.

@miri64
Copy link
Member

miri64 commented Feb 20, 2020

A question to your ethos setup: Do you have an application for this? Just replacing gnrc_netdev_defaults in examples/gcoap with ethos and trying to send to the link-local address of the tap interface seem to not be working as address resolution does not work for some reason I didn't look into yet.

@miri64
Copy link
Member

miri64 commented Feb 20, 2020

Between two native instances I also only get 5 transmissions. Let me reconfirm my findings from yesterday.

@miri64
Copy link
Member

miri64 commented Feb 20, 2020

I retried it for several times now but wasn't able to reproduce yesterdays results.. so maybe it was a fluke after all.

Copy link
Member

@miri64 miri64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK from my side.

@miri64 miri64 added Reviewed: 4-code-style The adherence to coding conventions by the PR were reviewed according to the maintainer guidelines Reviewed: 5-documentation The documentation details of the PR were reviewed according to the maintainer guidelines labels Feb 20, 2020
@kb2ma kb2ma added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Feb 20, 2020
@kb2ma
Copy link
Member Author

kb2ma commented Feb 20, 2020

A question to your ethos setup: Do you have an application for this?

My answer is going to drift off topic, but I'm glad you asked. I have some extra build scripts and make files to help with alternative builds like this. Rather than post them for gcoap, I have added similar ones to the gcoap native LwM2M client I'm working on.

Most of the files are in the client directory for that project. Start with build.sh (which is not in that directory for security reasons). See the wiki page with the content. Notice the build variations for native, ULA/ethos, and lwIP v6 and v4. I have set up the ULA/ethos variant to call Makefile.ula to work like the gnrc_border_router example, so ethos is setup/removed with the terminal. The ULA variant also excludes the 802.15.4 radio to save memory/executable space.

To get to your question, I also have running.md documentation to help me remember how to do the setup. For ULA, see the section Single board client via USB. Notice the routing setup for both the board and the workstation. I don't know if the setup is logical, but it works. ;-)

I originally created the ULA version when testing Wakaama, so it would fit on a samr21-xpro. I created the lwIP versions for use in environments required by my employer.

The larger point i wanted to make is that I think we need two things to help with these variations:

  1. A document that describes the many options for network setup. running.md could be a prototype. And this doesn't even include use of CDC-ECM, SLIP, SLAAC and all the other IPv6 variations I'm only dimly aware of.

  2. Maybe some orthogonal way to have templates for Makefiles based on the network scenario, like my Makefile vs. Makefile.ula vs. lwIP specifics.

@kb2ma
Copy link
Member Author

kb2ma commented Feb 20, 2020

Murdock has a couple of issues I need to address.

@miri64
Copy link
Member

miri64 commented Feb 20, 2020

This might be of tangential interest to you: #13427

@@ -185,7 +185,7 @@ static void _on_sock_evt(sock_udp_t *sock, sock_async_flags_t type)
switch (coap_get_type(&pdu)) {
case COAP_TYPE_NON:
case COAP_TYPE_ACK:
if (&memo->resp_evt_tmout.queue) {
if (memo->resp_evt_tmout.queue) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

llvm found this even on native, and is absolutely right. The queue member in the timeout struct already is an address.

The conditional is false when CONFIG_GCOAP_NON_TIMEOUT to 0 and sending a NON message type. I did test this configuration earlier, but didn't verify that the conditional was false. :-/

@@ -139,7 +139,7 @@ static void _on_sock_evt(sock_udp_t *sock, sock_async_flags_t type)
ssize_t res = sock_udp_recv(sock, _listen_buf, sizeof(_listen_buf),
0, &remote);
if (res <= 0) {
DEBUG("gcoap: udp recv failure: %d\n", res);
DEBUG("gcoap: udp recv failure: %d\n", (int)res);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This issue is the usual problem with an ssize_t type on pic32. You can see a similar cast just below.

Previously, this DEBUG statement was within #if macro that was always false in the CI. In other words, it's a bug fix. :-)

@kb2ma
Copy link
Member Author

kb2ma commented Feb 21, 2020

Addressed Murdock issues. @miri64, please confirm you're OK with these changes.

@miri64
Copy link
Member

miri64 commented Feb 21, 2020

Yepp. Just make sure you keep them in a separate commit if unrelated to the main change when squashing (which: please do! ;-))

@miri64
Copy link
Member

miri64 commented Feb 21, 2020

There are also some vera++ warnings in the static tests you might want to address (you can ignore the line-length warnings for now, see #13400).

@kb2ma
Copy link
Member Author

kb2ma commented Feb 21, 2020

Just make sure you keep them in a separate commit if unrelated to the main change

The cast issue is a bug, so I'll make a new commit. The invalid address-of is just a faulty implementation in aba6873, which already is a bug fix commit in the PR, so I'll squash that in.

There are also some vera++ warnings

I saw those and was about to create a commit for them before I saw your message. But then I heard the little Martine voice in my head, and she said, "No, these changes are totally unrelated. Make another PR." I'll do that; should be quick and easy to review.

@miri64
Copy link
Member

miri64 commented Feb 21, 2020

I saw those and was about to create a commit for them before I saw your message. But then I heard the little Martine voice in my head, and she said, "No, these changes are totally unrelated. Make another PR." I'll do that; should be quick and easy to review.

Do I really come across as that strict? 🤔 😅

@kb2ma
Copy link
Member Author

kb2ma commented Feb 21, 2020

Do I really come across as that strict?

I'm sure she said it in a nice way. :-)

Fixes bug found during rework for sock_async implementation.
Fixes bug found during rework for sock_async implementation.
@kb2ma
Copy link
Member Author

kb2ma commented Feb 22, 2020

Squashed and added commit as discussed. I'll try this out with #13427 after the merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: CoAP Area: Constrained Application Protocol implementations CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Reviewed: 1-fundamentals The fundamentals of the PR were reviewed according to the maintainer guidelines Reviewed: 2-code-design The code design of the PR was reviewed according to the maintainer guidelines Reviewed: 3-testing The PR was tested according to the maintainer guidelines Reviewed: 4-code-style The adherence to coding conventions by the PR were reviewed according to the maintainer guidelines Reviewed: 5-documentation The documentation details of the PR were reviewed according to the maintainer guidelines Type: cleanup The issue proposes a clean-up / The PR cleans-up parts of the codebase / documentation Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants