add a basic illumos test job #6769

hawkw · 2024-08-12T16:28:57Z

This commit adds a simple test job for illumos using Buildomat. I'd like to make some more improvements and add docs before merging this, but I want to see if it even works first.

hawkw · 2024-08-12T16:30:05Z

@jclulow, whenever you have the time, I'd love to get your eyes on my buildomat script --- I'm sure it could be made nicer!

hawkw · 2024-08-12T18:00:29Z

Welp, it looks like some of the tests fail on illumos, which I actually think is good news --- that's why we're adding this, after all!

jclulow · 2024-08-12T18:04:20Z

I looked at the anon_pipe_spawn_echo test, FWIW, and I think it's just leaning on unspecified behaviour of echo(1). You could switch echo out for printf there, and I think it would still work on every other system.

(For reference: echo in xpg8 says that Implementations shall not support any options.)

hawkw · 2024-08-12T18:30:39Z

I looked at the anon_pipe_spawn_echo test, FWIW, and I think it's just leaning on unspecified behaviour of echo(1). You could switch echo out for printf there, and I think it would still work on every other system.

Yeah, I kinda figured it was something like that --- makes sense to me!

The illumos version of `echo(1)` doesn't support `-n`, so it interprets it literally as part of the string to echo, making the test fail. I've changed it to use `printf(1)` with no flags args to get the same behavior. I also changed the name of the test, since it doesn't actually spawn `echo(1)` anymore :)

hawkw · 2024-08-12T19:26:42Z

Some more test failures, including two for UDS:

It looks like the test that reading from a UDS socket doesn't return WouldBlock ( tokio::uds_datagram::try_send_recv_never_block)...returns WouldBlock on illumos: https://buildomat.eng.oxide.computer/wg/0/details/01J53XZE95B7J0BRPT2QN3FR1B/3o4DSQRTlGfizqZJ0Q8Qzv5Lp8Axg7hUe5MPyG5OUVR87hae/01J53Y0AQTN8H3RDD6CK3QJ6FG#S2093 That seems like it's probably a bug, but needs further investigation.
The tokio_uds::uds_stream::epollhup test in UDS fails because it's expecting io::Error::ConnectionRefused but seeing io::Error::ConnectionReset: https://buildomat.eng.oxide.computer/wg/0/details/01J53XZE95B7J0BRPT2QN3FR1B/3o4DSQRTlGfizqZJ0Q8Qzv5Lp8Axg7hUe5MPyG5OUVR87hae/01J53Y0AQTN8H3RDD6CK3QJ6FG#S2093 That's probably just an instance of the OS returning a slightly different errno and needs a cfg(target_os="illumos") to expect the correct one?
It looks like the poll_fns and reset_writable tests in tokio::io_async_fd have been running for Way Longer Than They Should Be, kinda indicating that they're hung: https://buildomat.eng.oxide.computer/wg/0/details/01J53XZE95B7J0BRPT2QN3FR1B/3o4DSQRTlGfizqZJ0Q8Qzv5Lp8Axg7hUe5MPyG5OUVR87hae/01J53Y0AQTN8H3RDD6CK3QJ6FG#S2277 That probably needs further investigation as well.

I'll add that we appear to currently be on mio 1.0.2 on CI: https://buildomat.eng.oxide.computer/wg/0/details/01J53XZE95B7J0BRPT2QN3FR1B/3o4DSQRTlGfizqZJ0Q8Qzv5Lp8Axg7hUe5MPyG5OUVR87hae/01J53Y0AQTN8H3RDD6CK3QJ6FG#S336 It would be interesting to see whether the test results are different on earlier mio revisions.

hawkw · 2024-08-12T20:09:12Z

...yup, they're way stuck: https://buildomat.eng.oxide.computer/wg/0/details/01J53XZE95B7J0BRPT2QN3FR1B/3o4DSQRTlGfizqZJ0Q8Qzv5Lp8Axg7hUe5MPyG5OUVR87hae/01J53Y0AQTN8H3RDD6CK3QJ6FG#S2348

hawkw · 2024-08-12T21:05:07Z

well, that's cool: if i add a single dbg! macro to the AsyncFd::read call in the drain function in io_async_fd.rs, like this:

diff --git a/tokio/tests/io_async_fd.rs b/tokio/tests/io_async_fd.rs
index 3ab1cebd..c5d5cb61 100644
--- a/tokio/tests/io_async_fd.rs
+++ b/tokio/tests/io_async_fd.rs
@@ -139,7 +139,7 @@ fn drain(mut fd: &FileDescriptor) {
     let mut buf = [0u8; 512];
     #[allow(clippy::unused_io_amount)]
     loop {
-        match fd.read(&mut buf[..]) {
+        match dbg!(fd.read(&mut buf[..])) {
             Err(e) if e.kind() == ErrorKind::WouldBlock => break,
             Ok(0) => panic!("unexpected EOF"),
             Err(e) => panic!("unexpected error: {:?}", e),

the test now passes for me on illumos. 🤡

hawkw · 2024-08-12T21:20:19Z

stack traces from the reset_writable test when it hangs:

eliza@atrium ~ $ pargs 9451
9451:   /home/eliza/tokio/target/debug/deps/io_async_fd-c80359565a3a95c8 reset_writable
argv[0]: /home/eliza/tokio/target/debug/deps/io_async_fd-c80359565a3a95c8
argv[1]: reset_writable
argv[2]: --nocapture
eliza@atrium ~ $ pstack 9451 | demangle
9451:   /home/eliza/tokio/target/debug/deps/io_async_fd-c80359565a3a95c8 reset
--------------------- thread# 1 / lwp# 1 ---------------------
 fffff9ffeef44697 lwp_park (0, 0, 0)
 fffff9ffeef3db05 cond_wait_queue (e8ef88, e8efa0, 0) + 55
 fffff9ffeef3e16a __cond_wait (e8ef88, e8efa0) + ba
 fffff9ffeef3e1ae cond_wait (e8ef88, e8efa0) + 2e
 fffff9ffeef3e1f5 pthread_cond_wait (e8ef88, e8efa0) + 15
 0000000000a0814b std::thread::park::hb49d084e872705a5 () + 6b
 000000000079b87e std::sync::mpmc::list::Channel<T>::recv::{{closure}}::hf62e1229f5251f1e () + 8e
 000000000079b382 std::sync::mpmc::list::Channel<T>::recv::h0b15a8b7d971ec93 () + 272
 00000000007b05c0 test::console::run_tests_console::hbc01c9336ae30ea6 () + 1c00
 00000000007cea23 test::test_main::h821d08de9f858998 () + 153
 00000000007cf817 test::test_main_static::hd54d88e3cf121884 () + 57
 0000000000797e55 io_async_fd::main::hda858a55aa0b0e04 () + 15
 00000000007652ae core::ops::function::FnOnce::call_once::h537173c55fa15c15 () + e
 000000000078ed51 std::sys_common::backtrace::__rust_begin_short_backtrace::h5368d7f525ae1a75 () + 11
 000000000073b5e4 std::rt::lang_start::{{closure}}::hed5ee56bf0bdaa4e () + 14
 0000000000a077c3 std::rt::lang_start_internal::hdc1ac086c909d074 () + 2d3
 000000000073b5b7 std::rt::lang_start::h20aead5d623943e6 () + 37
 0000000000797e81 main () + 21
 000000000072ec37 _start_crt () + 87
 000000000072eb98 _start () + 18
------------ thread# 2 / lwp# 2 [reset_writable] -------------
 fffff9ffeef4af4a ioctl    (3, d001, fffff9ffeb5fddc0)
 000000000092ab5f mio::sys::unix::selector::Selector::select::heba8144752651684 () + 8f
 000000000092e3a9 mio::poll::Poll::poll::ha7f583c7aba8c8df () + 49
 00000000007f99d3 tokio::runtime::io::driver::Driver::turn::hdbc877f2c5a33d2a () + 73
 00000000007f96d9 tokio::runtime::io::driver::Driver::park::h9aa1ba931c7ade7a () + 49
 000000000081e1e9 tokio::runtime::signal::Driver::park::hc0c910bc43cc300c () + 19
 00000000007f3d89 tokio::runtime::process::Driver::park::hc7bde4d7ee237dd9 () + 19
 000000000086a6b7 tokio::runtime::driver::IoStack::park::hc2b6b08bb1a7b51d () + 47
 00000000008b1658 tokio::runtime::time::Driver::park_internal::h012bb0183fb6e65b () + 2e8
 00000000008b12a3 tokio::runtime::time::Driver::park::h540665ec9ecaed80 () + 23
 000000000086ac33 tokio::runtime::driver::TimeDriver::park::h092df9172d1963aa () + 33
 0000000000869f15 tokio::runtime::driver::Driver::park::h82219332c43dd971 () + 15
 0000000000802e52 tokio::runtime::scheduler::current_thread::Context::park::{{closure}}::h591fef1df8327f80 () + 22
 0000000000803507 tokio::runtime::scheduler::current_thread::Context::enter::hd4aadae4f3c48ecb () + 147
 0000000000814590 tokio::runtime::scheduler::current_thread::Context::park::h5710d1b3bfecfdae () + 220
 0000000000755e9c tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::hcd0aab2bdfea8763 () + 61c
 0000000000754e0b tokio::runtime::scheduler::current_thread::CoreGuard::enter::{{closure}}::ha9e5ceabc5a33a08 () + 2b
 0000000000790256 tokio::runtime::context::scoped::Scoped<T>::set::hba54509bcb7d1159 () + 96
 0000000000792ad7 tokio::runtime::context::set_scheduler::{{closure}}::h0d748329321238e3 () + 37
 000000000073fb2e std::thread::local::LocalKey<T>::try_with::ha15aa11176beca59 () + fe
 000000000073ddf1 std::thread::local::LocalKey<T>::with::h3c6198f3a4fe4c05 () + 11
 00000000007929af tokio::runtime::context::set_scheduler::h4d7018abd6870714 () + 2f
 0000000000754529 tokio::runtime::scheduler::current_thread::CoreGuard::enter::h3d583cd250e1a64c () + 179
 0000000000754ebd tokio::runtime::scheduler::current_thread::CoreGuard::block_on::h4b3b82ad10cb0d96 () + 1d
 000000000075069f tokio::runtime::scheduler::current_thread::CurrentThread::block_on::{{closure}}::h785603f5fbb451e3 () + 1ff
 000000000078b9c2 tokio::runtime::context::runtime::enter_runtime::h7ace6ddcba5e5651 () + f2
 0000000000750067 tokio::runtime::scheduler::current_thread::CurrentThread::block_on::hbfbcbe776dcb53ff () + 97
 000000000073802f tokio::runtime::runtime::Runtime::block_on_inner::h1afb29bef3233be6 () + 8f
 000000000073849a tokio::runtime::runtime::Runtime::block_on::hac4960ca03c4ed0a () + 5a
 0000000000794dd3 io_async_fd::reset_writable::h070bf37d42223f46 () + 123
 00000000007730a9 io_async_fd::reset_writable::{{closure}}::hb0f0c038f01573a4 () + 19
 0000000000765a98 core::ops::function::FnOnce::call_once::hc1404f0ca2253420 () + 18
 00000000007d5462 test::__rust_begin_short_backtrace::h1d84a84fc5339953 () + 12
 00000000007d4330 test::run_test::{{closure}}::hc7f6d1d4a8ac2dde () + 5b0
 0000000000798a22 std::sys_common::backtrace::__rust_begin_short_backtrace::hbc4ec67e58aa30af () + c2
 000000000079dbc7 core::ops::function::FnOnce::call_once{{vtable.shim}}::hb8da26b9e73c729a () + 77
 0000000000a2cec9 std::sys::pal::unix::thread::Thread::new::thread_start::h9ad099a424212e51 () + 29
 fffff9ffeef44307 _thrp_setup (fffff9ffec6a0240) + 77
 fffff9ffeef44650 _lwp_start ()
eliza@atrium ~ $

unfortunately, these stacks are not very interesting: they're just telling us that we are waiting on a notification (in mio::sys::unix::selector::Selector::select that we never get. which, given the failure mode, i feel like i could've told you...

@jclulow

## Motivation The `io_async_fd.rs` tests contain a `drain()` function, which currently performs synchronous reads from a UDS socket until it returns `io::ErrorKind::WouldBlock` (i.e., errno `EWOULDBLOCK`/`EAGAIN`). The *intent* behind this function is to ensure that all data has been drained from the UDS socket's buffer...which is what it appears to do...on Linux. On other systems, it appears that an `EWOULDBLOCK` or `EAGAIN` may be returned before enough data has been read from the UDS socket to result in the other end being notified that the socket is now writable. In particular, this appears to be the case on illumos, where the tests using this function hang forever (see [this comment][1] on PR #6769). To my knowledge, this behavior is still POSIX-compliant --- the reader will still be notified that the socket is readable, and if it were actually doing non-blocking IO, it would continue reading upon receipt of that notification. So, relying on `EWOULDBLOCK` to indicate that the socket has been sufficiently drained appears to rely on Linux/FreeBSD behavior that isn't necessarily portable to other Unices. ## Solution This commit changes the `drain()` function to take an argument for the number of bytes *written* to the socket previously, and continue looping until it has read that many bytes, regardless of whether `EWOULDBLOCK` is returned. This should ensure that the socket is drained on all POSIX-compliant systems, and indeed, the `io_async_fd::reset_writable` and `io_async_fd::poll_fns` tests no longer hang forever on illumos. I think making this change is an appropriate solution to the test failure here, as the `drain()` function is part of the test, rather than the code in Tokio *being* tested, and (as I mentioned above) the use of blocking reads on a non-blocking socket without a mechanism to continue reading when the socket becomes readable again is not really something a real life program seems likely to do. Ensuring that all the written bytes have been read by passing in a byte count seems more faithful to what the test is actually *trying* to do here, anyway. Thanks to @jclulow for debugging what was going on here! This change was cherry-picked from commit f18d6ed from PR #6769, so that the fix can be merged separately. [1]: #6769 (comment)

hawkw · 2024-08-14T19:17:03Z

#6776 should fix the io_async_fd tests hanging.

hawkw · 2024-08-14T19:17:28Z

https://buildomat.eng.oxide.computer/wg/0/details/01J592RB0JR203RGGN0RYHJHMY/CHieo1Uee7qzRVyp811YBl0MvXGO3i0QA9ezPaFWj6hf6P3P/01J592RSWCNX1MCFYGW74AHVH6#S1953 seems not great --- it appears that on illumos, the thread local...isn't, somehow?

## Motivation In `tokio::task::local`, there's a `LocalState::assert_called_from_owner_thread` function, which checks that the caller's thread ID matches that of the thread on which the `LocalSet` was created. This assertion is disabled on FreeBSD and OpenBSD, with a comment indicating that this is because those systems have "some weirdness around thread-local destruction". If memory serves, I believe the "weirdness" was related to the order in which thread-locals are destroyed relative to each other, or something along those lines. Upon running the tests on illumos, there appears to be [a similar panic][1] in the `task_local_set_in_thread_local` due to this assertion assertion while dropping a `LocalSet` which lives in a thread-local. This leads me to believe that illumos, and presumably Solaris, also exhibit the same "weirdness". This wouldn't be too surprising, given the shared heritage of those systems. ## Solution This commit adds `target_os="illumos"` and `target_os="solaris"` to the `cfg` attribute that disables the assertion on systems determined to have weirdness. This fixes the test panicking on illumos. In the future, it might be worthwhile to look into changign the assertion to only be disabled on weirdness-having systems _while dropping the `LocalSet`_, rather than all the time. That way, we can still check other accesses. But, I wanted to make the minimum change necessary to fix it here before messing around with that. [1]: https://buildomat.eng.oxide.computer/wg/0/details/01J592RB0JR203RGGN0RYHJHMY/CHieo1Uee7qzRVyp811YBl0MvXGO3i0QA9ezPaFWj6hf6P3P/01J592RSWCNX1MCFYGW74AHVH6#S1954

@jclulow

* tests: handle spurious EWOULDBLOCK in io_async_fd ## Motivation The `io_async_fd.rs` tests contain a `drain()` function, which currently performs synchronous reads from a UDS socket until it returns `io::ErrorKind::WouldBlock` (i.e., errno `EWOULDBLOCK`/`EAGAIN`). The *intent* behind this function is to ensure that all data has been drained from the UDS socket's buffer...which is what it appears to do...on Linux. On other systems, it appears that an `EWOULDBLOCK` or `EAGAIN` may be returned before enough data has been read from the UDS socket to result in the other end being notified that the socket is now writable. In particular, this appears to be the case on illumos, where the tests using this function hang forever (see [this comment][1] on PR #6769). To my knowledge, this behavior is still POSIX-compliant --- the reader will still be notified that the socket is readable, and if it were actually doing non-blocking IO, it would continue reading upon receipt of that notification. So, relying on `EWOULDBLOCK` to indicate that the socket has been sufficiently drained appears to rely on Linux/FreeBSD behavior that isn't necessarily portable to other Unices. ## Solution This commit changes the `drain()` function to take an argument for the number of bytes *written* to the socket previously, and continue looping until it has read that many bytes, regardless of whether `EWOULDBLOCK` is returned. This should ensure that the socket is drained on all POSIX-compliant systems, and indeed, the `io_async_fd::reset_writable` and `io_async_fd::poll_fns` tests no longer hang forever on illumos. I think making this change is an appropriate solution to the test failure here, as the `drain()` function is part of the test, rather than the code in Tokio *being* tested, and (as I mentioned above) the use of blocking reads on a non-blocking socket without a mechanism to continue reading when the socket becomes readable again is not really something a real life program seems likely to do. Ensuring that all the written bytes have been read by passing in a byte count seems more faithful to what the test is actually *trying* to do here, anyway. Thanks to @jclulow for debugging what was going on here! This change was cherry-picked from commit f18d6ed from PR #6769, so that the fix can be merged separately. [1]: #6769 (comment) Signed-off-by: Eliza Weisman <[email protected]>

## Motivation Currently, the test `uds_stream::epollhup` expects that a `UdsStream::connect` future to a Unix socket which is closed by the accept side to always fail with `io::ErrorKind::ConnectionReset`. On illumos, and potentially other systems, it instead fails with `io::ErrorKind::ConnectionRefused`. This was discovered whilst adding an illumos CI job in PR #6769. See: #6769 (comment) ## Solution This commit changes the test to accept either `ConenctionReset` or `ConnectionRefused`. This way, we are more tolerant of different operating systems which may decide to return slightly different errnos here. Both ECONNREFUSED and ECONNRESET seem reasonable to expect in this situation, although arguably, ECONNREFUSED is actually more correct: the acceptor did not accept the connection at all, which seems like "refusing" it to me... This commit was cherry-picked from PR #6769.

## Motivation Currently, the test `uds_stream::epollhup` expects that a `UdsStream::connect` future to a Unix socket which is closed by the accept side to always fail with `io::ErrorKind::ConnectionReset`. On illumos, and potentially other systems, it instead fails with `io::ErrorKind::ConnectionRefused`. This was discovered whilst adding an illumos CI job in PR #6769. See: #6769 (comment) ## Solution This commit changes the test to accept either `ConenctionReset` or `ConnectionRefused`. This way, we are more tolerant of different operating systems which may decide to return slightly different errnos here. Both ECONNREFUSED and ECONNRESET seem reasonable to expect in this situation, although arguably, ECONNREFUSED is actually more correct: the acceptor did not accept the connection at all, which seems like "refusing" it to me...

ci: add a basic illumos test job

b8552c6

This commit adds a simple test job for illumos using Buildomat. I'd like to make some more improvements and add docs before merging this, but I want to see if it even works first.

Darksonn added the A-ci Area: The continuous integration setup label Aug 12, 2024

hawkw added 2 commits August 12, 2024 13:22

fix uds_stream::epollhup test

1064b29

ci: disable nextest failfast

ed03ae9

hawkw added 2 commits August 14, 2024 11:34

make errno assertion less flaky

35c1465

make io_async_fd tests always drain written amount

f18d6ed

hawkw mentioned this pull request Aug 14, 2024

tests: handle spurious EWOULDBLOCK in io_async_fd #6776

Merged

hawkw mentioned this pull request Aug 14, 2024

task: illumos/Solaris have thread-local weirdness #6777

Open

hawkw mentioned this pull request Aug 15, 2024

tests: handle ECONNREFUSED in uds_stream::epollhup #6778

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add a basic illumos test job #6769

add a basic illumos test job #6769

hawkw commented Aug 12, 2024

hawkw commented Aug 12, 2024

hawkw commented Aug 12, 2024

jclulow commented Aug 12, 2024 •

edited

Loading

hawkw commented Aug 12, 2024

hawkw commented Aug 12, 2024

hawkw commented Aug 12, 2024

hawkw commented Aug 12, 2024

hawkw commented Aug 12, 2024

hawkw commented Aug 14, 2024

hawkw commented Aug 14, 2024

add a basic illumos test job #6769

Are you sure you want to change the base?

add a basic illumos test job #6769

Conversation

hawkw commented Aug 12, 2024

hawkw commented Aug 12, 2024

hawkw commented Aug 12, 2024

jclulow commented Aug 12, 2024 • edited Loading

hawkw commented Aug 12, 2024

hawkw commented Aug 12, 2024

hawkw commented Aug 12, 2024

hawkw commented Aug 12, 2024

hawkw commented Aug 12, 2024

hawkw commented Aug 14, 2024

hawkw commented Aug 14, 2024

jclulow commented Aug 12, 2024 •

edited

Loading