feat(core): amortize many ready messages into fewer, larger buffers #1423
Introduce an EncodedBytes combinator which encodes multiple ready messages, up to a yield threshold, before yielding the next bytes buffer. If the message stream instead polls to pending, the combinator yields the bytes accumulated so far immediately.
These amortized buffers exhibit far better throughput when streaming a high rate of small messages, because hyper and h2 avoid copying the yielded buffer and dispatch each one as a separate, non-vectorized TCP send.
Motivation
We have observed that tonic exhibits poor CPU performance when sending a high data rate of small messages. Each message is dispatched as a single tcp_sendmsg, and each send incurs overhead for routing and other kernel activities in the network path. It is therefore desirable to increase the size of network sends without adding latency.
Solution
Introduce a combinator that repeatedly polls its delegate stream for ready messages, extending the current buffer up to a yield threshold. This approach avoids new allocations and adds just one "magic" constant: the yield threshold after which the combinator immediately yields the next chunk of bytes.
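As a rough illustration of the amortization logic (not the actual EncodedBytes implementation, which operates on an async Stream and a real encoder), here is a simplified, synchronous sketch using a plain iterator; the type name `AmortizedBuffers` and the threshold value are made up for this example:

```rust
// Assumed constant for illustration only; the PR's actual yield
// threshold may differ.
const YIELD_THRESHOLD: usize = 8 * 1024;

/// Simplified analogue of the combinator: appends ready, already-encoded
/// messages onto one growing buffer and only yields that buffer once the
/// yield threshold is crossed (or the source is exhausted).
struct AmortizedBuffers<I: Iterator<Item = Vec<u8>>> {
    messages: I,
}

impl<I: Iterator<Item = Vec<u8>>> Iterator for AmortizedBuffers<I> {
    type Item = Vec<u8>;

    fn next(&mut self) -> Option<Vec<u8>> {
        let mut buf = Vec::new();
        // Keep extending the current buffer with ready messages...
        while let Some(msg) = self.messages.next() {
            buf.extend_from_slice(&msg);
            // ...until the yield threshold is crossed, then yield the
            // accumulated bytes immediately.
            if buf.len() >= YIELD_THRESHOLD {
                break;
            }
        }
        if buf.is_empty() { None } else { Some(buf) }
    }
}
```

With this sketch, a stream of one thousand 100-byte messages collapses into a handful of ~8 KiB buffers, so the transport performs far fewer sends for the same number of bytes. In the real async version, a pending poll on the message stream plays the same role as source exhaustion here: the partial buffer is yielded right away rather than held back, which is what keeps the batching from adding latency.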