-
Notifications
You must be signed in to change notification settings - Fork 6.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SNTP fails to close the used socket #34165
Comments
@pfalcon can you take a look at this as this seems to be related to the Posix API issues we have been seeing occacionally. |
That doesn't sounds too reassuring. To fix the issue, we need to see the issue. So, please try to provide steps to reproduce (ideally, reproduction code + build files).
Then the likely reason is that SNTP should depend on CONFIG_NET_SOCKETS_POSIX_NAMES=y (or POSIX_API=y). I can prepare such a patch.
The latest SDK release is 0.12.4. 0.11.3 is 4-5 releases old. Please upgrade and retry.
As best as I remember, SNTP is a completely "userspace" lib, and so intended to work with native POSIX API, not adhoc Zephyr wrappers (for example, there could be counter-proposal to remove zsock_* prefixes, though I don't think anybody came up with that yet). So, I personally can't recommend such a workaround. (But maybe somebody else will.)
Yes, POSIX API is experimental and needs more work. For that, we need tangible examples of problem cases, and seek to find solutions, not workarounds. |
An update:
I still don't understand why depending on POSIX api is desireable. It just makes tracking down things a lot harder. The API is not properly namespaced, which means it's prone to issues like these, and it pollutes the namespace everywhere It could be argued that if all other functions are properly namespaced, this wouldn't be an issue, but unfortunately there's a lot of weird code out there, especially in the embedded world. Is the |
No, it is a public API. The BSD socket API calls the |
@johann2 : Good timing. I also was looking into this stuff and posted #34244.
So, the question is why do you set that? Per the current state of affairs (not how it was 5 or so years ago), if you use networking in an application (== sockets), either CONFIG_POSIX_API or CONFIG_NET_SOCKETS_POSIX_NAMES should be set. Or, you're doing something very peculiar. What?
Thanks. It at least ensures that we're on the same line, and there's less search space to wander thru. (And some time ago there were changes in the toolchain regarding handling stuff like close()).
Well, I don't think that phrase conveys the full picture of it. BSD Socket API is a part of POSIX API, so that's how it gets into picture. But we exactly don't want to force use of the full POSIX subsys just to use sockets, that's why we have CONFIG_NET_SOCKETS_POSIX_NAMES. Finally, when Sockets API was initially developed ~5 years ago as an experimental project, there was also a requirement to namespace it, mostly from concerns of integrating with vendor network offloading engines, which themselves sometimes provide socket-like API. This last usecase is the least (if at all) used, and shouldn't be of concern for most users. #34244 tries to address that by making sure that either CONFIG_POSIX_API or CONFIG_NET_SOCKETS_POSIX_NAMES is enabled, and only users which have peculiar needs take explicit action (disabling it).
Which API exactly is not properly namespaced? BSD Socket API is the cross-OS de jure and de facto standard for networking connectivity. In that regard its more reasonable expect that SNTP (network protocol) depends on standard cross-platform networking API, than on adhoc Zephyr-specific API.
Yes, that's why using zsock_ prefixes was among the requirements for the initial implementation. Fast forward a few years, and it seems more like overly conservative, reinsuring constraint, which causes more confusion and maintenance work than real boon.
It's an implementation detail of Zephyr RTOS, otherwise being on par with the standard cross-platform API. That's why I'm trying to understand why people aren't using the standard API (and even propose to switch standard API to Zephyr-specific in existing code). |
Zephyr socket subsystem has come a long way since initial experimental alternative to internal Zephyr networking API. Its configuration also mirrors the usual conservative approach, where a user needs to explicitly enable options to get "more" features. And as an experimental API, socket subsystem was initially developed as namespaced API, where all functions/structures are prefixed with "zsock_", and to get standard names, CONFIG_NET_SOCKETS_POSIX_NAMES needs to be set (or alternatively, CONFIG_POSIX_API needs to be, which enabled full POSIX subsys overall). However, a few years later, sockets are the standard networking API, and in majority of cases its used under the standard POSIX names. Necessity to explicitly set an option to achieve this effects, and confusion which results from it - are just unneeded chores for users. So, switch CONFIG_NET_SOCKETS_POSIX_NAMES to be on by default (unless CONFIG_POSIX_API is already defined). It still can be explicitly disabled if needed (but usecases for that would be peculiar and rare). Addresses zephyrproject-rtos#34165 Signed-off-by: Paul Sokolovsky <[email protected]>
In our main codebase this option was implicitly set to n as I wasn't aware it's required for SNTP. I only set it to n to reproduce this using the
The In any case, I don't feel strongly enough about this question to push it further. There's a lot of ways to fix this bug and whatever you decide to do would work for me. |
Zephyr socket subsystem has come a long way since initial experimental alternative to internal Zephyr networking API. Its configuration also mirrors the usual conservative approach, where a user needs to explicitly enable options to get "more" features. And as an experimental API, socket subsystem was initially developed as namespaced API, where all functions/structures are prefixed with "zsock_", and to get standard names, CONFIG_NET_SOCKETS_POSIX_NAMES needs to be set (or alternatively, CONFIG_POSIX_API needs to be, which enabled full POSIX subsys overall). However, a few years later, sockets are the standard networking API, and in majority of cases its used under the standard POSIX names. Necessity to explicitly set an option to achieve this effects, and confusion which results from it - are just unneeded chores for users. So, switch CONFIG_NET_SOCKETS_POSIX_NAMES to be on by default (unless CONFIG_POSIX_API is already defined). It still can be explicitly disabled if needed (but usecases for that would be peculiar and rare). Addresses #34165 Signed-off-by: Paul Sokolovsky <[email protected]>
This library is coded with standard POSIX names for socket functions, so make that requirement explicit. Also, switch it from select'ing NET_SOCKETS, to depend'ing on it. This follows the general approach of avoiding unneeded select's in Zephyr, which lead to conflicting dependencies and make debugging dependencies complex overall. In this particular case, it's fair (for a user) to expect that "simple network time protocol" requires networking API, namely sockets, and have that explicitly on in their app configuration, giving better overview of their app config overall. Fixes: zephyrproject-rtos#34165 Signed-off-by: Paul Sokolovsky <[email protected]>
That was my hunch, thanks for confirming it. That's why I went with #34244 , because that's the actual expectation how it should be for last few years.
Yes, "zsock_" is "better" as a thing-in-its-own-small-box. That's why I went the "zsock_" prefix in the initial prototype many years ago. But the goal was different - to support the standard BSD Sockets API. Using "zsock_" was just an interim implementation strategy I chose. It grew a lot since then, and we actually got more options to support on the "real BSD Sockets API" side, like full POSIX subsystem vs just sockets, protected kernel with syscalls, etc. Overall usage of "real BSD Sockets API" vs "adhoc shim" is 90% vs 10% (more likely 99% vs 1%). So, "zsock_" overall has become more of a nuisance, and as I mentioned, I won't be surprised if someone proposed to remove it (that won't be me, given all the effort I put into maintaining it over the years).
It's not "low-level unix C programming", it's Portable Operating Systems Interface standard (abbreviated POSIX), the only open cross-industry OS API standard. The aim of Zephyr was to improve the situation with adhoc and proprietary APIs in the deeply embedded space (disclaimer: to the best of my knowledge, I don't set aims, and they change over time). Hence, standard API, known to majority of people. And for minority not familiar with it, it's again more beneficial to learn the standard API than yet another adhoc one.
That's too generic question to even bring here IMHO. There was other RTOSes started in about the same time as Zephyr, let's call them colloquially "Zephyr competitors". What do you think - did they mean their code to be ported to other OSes? Parts of their code is now in Zephyr (and in other projects). So, generally, any open source project's code is meant for reuse in other projects. And it's definitely not "the code is not meant to be ported to other OSes and to emphasize that better, let's use proprietary API there" - unless someone explicitly sets that as the goal ;-).
I tried to reply to your comments and explain the paradigmatic approach I follow. From my PoV, adding "zsock_" doesn't improve situation, but just preserves the (messy) situation and pushes it out of Zephyr tree into users' codebases (where they're expected to take existing socket code and it to just work). That's why I'm keen to keep specimen(s) of standard BSD Socket API usage in the tree, and look for ways to actually improve the situation around that usage. That's not easy, given the breadth and multitude of configurations Zephyr supports, but pretending that the problem doesn't exist (by retracting into the "zsock_" shell) is not an option for Zephyr developers. |
This library is coded with standard POSIX names for socket functions, so make that requirement explicit. Also, switch it from select'ing NET_SOCKETS, to depend'ing on it. This follows the general approach of avoiding unneeded select's in Zephyr, which lead to conflicting dependencies and make debugging dependencies complex overall. In this particular case, it's fair (for a user) to expect that "simple network time protocol" requires networking API, namely sockets, and have that explicitly on in their app configuration, giving better overview of their app config overall. Fixes: #34165 Signed-off-by: Paul Sokolovsky <[email protected]>
Describe the bug
Seems that the close function used in
sntp_close
will resolve to someclose
function not related to sockets instead of the one that callszsock_close
. As a result the socket remains open.To Reproduce
I haven't been able to put together a minimal example to reproduce this.
The issue seems to happen only when
CONFIG_NET_SOCKETS_POSIX_NAMES=n
Expected behaviour
sntp_close
should close the socketImpact
This is a blocking issue for us, because once enough sockets have been left open, no new sockets can be opened and the system loses connectivity.
Logs and console output
N/A
Environment (please complete the following information):
Additional context
It seems to me that using the
zsock_*
functions as proposed in #24129 would be the easiest fix and would bypass the problem completely. I'd be happy to bring this PR up to date if needed.However, this bug may be a symptom of something else wrong in the POSIX api. I hope someone who's more familiar with this part of the codebase could take a look. #31323 seems to be somehow related to this bug as well.
The text was updated successfully, but these errors were encountered: