Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add documentation for SyncIoBridge with examples and alternatives #6815

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

Nathy-bajo
Copy link

@Nathy-bajo Nathy-bajo commented Sep 3, 2024

Motivation

The SyncIoBridge documentation lacked clarity on its usage, potentially leading to inefficient use and suboptimal performance. Users needed better guidance on when to use it and how to handle common use cases effectively.

Solution

This update enhances the SyncIoBridge documentation by:

•	Explaining its purpose and correct usage.
•	Providing practical examples and scenarios for effective use.
•	Highlighting alternatives to avoid blocking the async runtime.
•	Adding cross-references to related documentation.

These improvements aim to help users use SyncIoBridge more effectively and avoid common pitfalls.

Rendered

@Darksonn Darksonn added T-docs Topic: documentation A-tokio-util Area: The tokio-util crate M-io Module: tokio/io labels Sep 3, 2024
/// let mut file = File::open("output.txt")?;
/// std::io::copy(&mut sync_reader, &mut file)?;
/// Ok::<_, std::io::Error>(())
/// }).await??;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To please rustfmt in the CI I think you need to move the .await??; into its own line.

Suggested change
/// }).await??;
/// })
/// .await??;

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have seen my mistake.
Thank you!

///
/// ```rust, no_run
/// let mut data = Vec::new();
/// reader.read_to_end(&mut data).await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For async/await to work in the docs you actually also need a runtime since the doc test will be run by a synchronous main function by default.

See this from tokio as an example:

    /// ```rust,no_run
    /// use std::error::Error;
    ///
    /// #[tokio::main]
    /// async fn main() -> Result<(), Box<dyn Error>> {
    ///     let tokio_listener = tokio::net::TcpListener::bind("127.0.0.1:0").await?;
    ///     let std_listener = tokio_listener.into_std()?;
    ///     std_listener.set_nonblocking(false)?;
    ///     Ok(())
    /// }
    /// ```

Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

I think that most of the examples added here don't actually compile. Before we can merge this, we need to make them compile. I left some suggestions on how to do that.

///
/// ## Example 1: Hashing Data
///
/// When hashing data, using `SyncIoBridge` can lead to suboptimal performance and might not fully leverage the async capabilities of the system.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if this explained why using SyncIoBridge "can lead to suboptimal performance and might not fully leverage the async capabilities of the system"; I think just saying "you might have bad performance" is not as helpful as explaining the specific details, so that the user can know what issues they are avoiding and why they matter?

Comment on lines 20 to 74
/// ```rust, no_run
/// let mut data = Vec::new();
/// reader.read_to_end(&mut data).await?;
/// let hash = blake3::hash(&data);
/// ```
///
/// Or, for more complex cases:
///
/// ```rust, no_run
/// let mut data = vec![0; 64 * 1024];
/// loop {
/// let len = reader.read(&mut data).await?;
/// if len == 0 { break; }
/// hasher.update(&data[..len]);
/// }
/// let hash = hasher.finalize();
/// ```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have a few comments in these examples explaining what they're doing...

/// ## Example 2: Compressing Data
///
/// When compressing data, avoid using `SyncIoBridge`` with non-async compression libraries, as it may lead to inefficient and blocking code.
/// Instead, use `async-compression`, which is designed to work with asynchronous data streams.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer not to recommend a specific library in the tokio documentation, because it might not work for all use-cases. I might say

Suggested change
/// Instead, use `async-compression`, which is designed to work with asynchronous data streams.
/// Instead, use an async compression library, such as the `async-compression` crate, which is designed to work with asynchronous data streams.

Also, if we are going to recommend this crate specifically, we should probably link to its documentation.

Comment on lines 50 to 133
/// ## Example 3: Parsing `JSON`
///
/// When parsing `JSON` data, avoid using `SyncIoBridge` with `serde_json::from_reader` as it may cause blocking operations.
/// Instead, read the data into a `Vec<u8>` and parse it using `serde_json::from_slice`.
///
/// ```rust, no_run
/// let mut data = Vec::new();
/// reader.read_to_end(&mut data).await?;
/// let value: MyStruct = serde_json::from_slice(&data)?;
/// ```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't really specific to JSON; this advice applies equally to any other serialization format. What about something like:

When parsing serialization formats such as JSON, avoid using SyncIoBridge with functions that deserialize data from a type implementing std::io::Read, such as serde_json::from_reader.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a number of the examples in this file won't actually compile. While it's fine to not run the examples using no_run, they should, at least, compile --- the test suite will fail if examples contain code that doesn't compile. In many cases, the examples reference names that aren't actually defined, such as &mut reader and &mut writer when there is no value named reader or writer in the example. We should add hidden definitions of these values to the examples to make sure they compile.

Comment on lines 21 to 22
/// let mut data = Vec::new();
/// reader.read_to_end(&mut data).await?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// let mut data = Vec::new();
/// reader.read_to_end(&mut data).await?;
/// use tokio::io::AsyncReadExt;
///
/// # let mut reader = tokio::io::empty();
/// let mut data = Vec::new();
/// reader.read_to_end(&mut data).await?;

/// ```rust, no_run
/// let mut data = Vec::new();
/// reader.read_to_end(&mut data).await?;
/// let hash = blake3::hash(&data);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this will compile, because there is no blake3 in scope. We can fix this by adding a hidden dummy version, like:

Suggested change
/// let hash = blake3::hash(&data);
/// let hash = blake3::hash(&data);
/// # mod blake3 { pub fn hash(_: &[u8]) {} }

Comment on lines 28 to 35
/// ```rust, no_run
/// let mut data = vec![0; 64 * 1024];
/// loop {
/// let len = reader.read(&mut data).await?;
/// if len == 0 { break; }
/// hasher.update(&data[..len]);
/// }
/// let hash = hasher.finalize();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this won't compile because hasher and reader are not defined in this scope.

Comment on lines 44 to 47
/// use async_compression::tokio::write::GzipEncoder;
///
/// let mut encoder = GzipEncoder::new(writer);
/// tokio::io::copy(&mut reader, &mut encoder).await?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this won't compile, because async_compression is not a dependency, and reader and writer are not defined.

Comment on lines 56 to 58
/// let mut data = Vec::new();
/// reader.read_to_end(&mut data).await?;
/// let value: MyStruct = serde_json::from_slice(&data)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this won't compile because reader is undefined, and serde_json is not a dependency of the example.

@Nathy-bajo
Copy link
Author

Nathy-bajo commented Sep 4, 2024

Hi @hawkw.
Thank you for the suggestions! I have updated the codes.

@Nathy-bajo
Copy link
Author

Is there anything else I need to add? @hawkw @Darksonn @tglane

Comment on lines 17 to 25
/// When hashing data, using `SyncIoBridge` can lead to suboptimal performance and might not fully leverage the async capabilities of the system.
///
/// ### Why It Matters:
/// `SyncIoBridge` allows you to use synchronous I/O operations in an asynchronous context by blocking the current thread. However, this can be inefficient because:
/// Blocking: The use of `SyncIoBridge` may block a valuable async runtime thread, which could otherwise be used to handle more tasks concurrently.
/// Thread Pool Saturation: If many threads are blocked using `SyncIoBridge`, it can exhaust the async runtime's thread pool, leading to increased latency and reduced throughput.
/// Lack of Parallelism: By blocking on synchronous operations, you may miss out on the benefits of running tasks concurrently, especially in I/O-bound operations where async tasks could be interleaved.
///
/// Instead, consider reading the data into memory and then hashing it, or processing the data in chunks.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are several opportunities to improve the wording here.

  1. The explanation for why you should try to avoid SyncIoBridge should go before ## Example 1 and after # Alternatives. It applies to all examples, not just the first one.
  2. Each example should start by mentioning that the use of SyncIoBridge is unnecessary.
  3. For each example, there should be a short explanation of what the example does. See below for explanations for the hashing example. Just saying "or for more complex cases" is very generic, and doesn't actually explain when a case is complex.
  4. Your text should be formatted according to markdown. You don't put newlines at the end of sentences like that. Instead, wrap the line after 80 characters.
/// ## Hashing data
///
/// It is common to use `SyncIoBridge` for hashing data, but this is unnecessary
/// in most cases.
///
/// There are two strategies for avoiding `SyncIoBridge` when hashing data. When
/// the data fits into memory, the easiest is to read the data into a `Vec<u8>`
/// and hash it:
///
/// [ first example ]
///
/// When the data doesn't fit into memory, the hashing library will usually
/// provide a hasher that you can repeatedly call `update` on to hash the data
/// one chunk at the time. For example:
///
/// [second example]

Comment on lines 53 to 55
/// #[tokio::main]
/// async fn main() -> Result<(), Box<dyn std::error::Error>> {
/// let mut reader = tokio::io::empty();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using tokio::io::empty(), I think it makes more sense to define a function that takes any AsyncRead.

async fn hash_contents(reader: impl AsyncRead) -> std::io::Result<()> {

Also, please avoid using Box<dyn Error> as that type is not Send. Instead, you can use std::io::Error for your examples.

@Nathy-bajo
Copy link
Author

Hi @Darksonn
I have updated the PR. Is there anything else needed?

@Darksonn Darksonn requested a review from hawkw September 16, 2024 20:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tokio-util Area: The tokio-util crate M-io Module: tokio/io T-docs Topic: documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants