Make peer receive routine process messages asynchronously #128

Closed
4 tasks
egonspace opened this issue Oct 20, 2020 · 0 comments · Fixed by #135
Labels: C: enhancement (Classification: New feature or its request, or improvement in maintainability of code)

egonspace commented Oct 20, 2020

Summary

All messages from a peer are processed by a single receiver thread. These include consensus messages, mempool tx messages, evidence messages, and blockchain messages. Of these, it is especially important to receive proposal messages (a kind of consensus message) quickly, but if the messages queued ahead of a proposal take a long time to process, the proposal is received late. Messages that do not depend on each other should be processed asynchronously in separate threads.

Problem Definition

In performance tests with 100 validators, round failures (progressing to the next round after a consensus failure) were identified as the first performance bottleneck. One cause of a round failure is that some validator receives a proposal too late. A validator that enters the new round late may naturally receive the proposal late, but it was found that validators received the proposal late even when they had entered the new round early. In particular, it was observed that a validator did not receive the proposal immediately after a peer had sent it, but several seconds later.

I spent several days investigating this delay of several seconds between sending and receiving a proposal, and found that the cause lies in the way the message receive routine works.

The receive routine is defined in MConnection.recvRoutine(). All messages from a peer are processed in this infinite loop.
Each message carries a channel ID and is dispatched to the reactor that corresponds to that channel ID, and all of this happens in one thread.

For example, if there are hundreds of tx messages in the receive buffer followed by a proposal message, the tx messages ahead of it must be processed before the proposal can be read. The problem is that the mempool reactor must acquire a lock to handle tx messages and can wait hundreds of milliseconds for that lock, so processing the tx messages may take a long time.
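
The blocking described above can be illustrated with a small, self-contained sketch. This is not the actual MConnection.recvRoutine() code: the channel IDs, the `msg` and `mempool` types, and `serialRecv` are hypothetical names made up for illustration, and the lock wait is simulated with a sleep. It only shows how a single loop that handles every message inline makes a proposal wait for all the tx messages queued ahead of it.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Hypothetical channel IDs, chosen only for this sketch.
const (
	chMempool   byte = 1
	chConsensus byte = 2
)

// msg is a stand-in for a message read from a peer.
type msg struct {
	chID byte
	body string
}

// mempool is a stand-in for a reactor whose handler contends for a lock.
type mempool struct{ mtx sync.Mutex }

// receive simulates handling one tx message under the mempool lock.
func (m *mempool) receive(cost time.Duration) {
	m.mtx.Lock()
	defer m.mtx.Unlock()
	time.Sleep(cost) // simulated lock wait plus tx processing
}

// serialRecv handles every message inline, in arrival order, and returns how
// long the first consensus message waited before it was reached. It returns
// -1 if the inbox holds no consensus message.
func serialRecv(inbox []msg, mp *mempool, txCost time.Duration) time.Duration {
	start := time.Now()
	for _, m := range inbox {
		switch m.chID {
		case chMempool:
			mp.receive(txCost) // the loop cannot read further until this returns
		case chConsensus:
			return time.Since(start)
		}
	}
	return -1
}

func main() {
	// 100 tx messages queued ahead of a single proposal.
	inbox := make([]msg, 0, 101)
	for i := 0; i < 100; i++ {
		inbox = append(inbox, msg{chMempool, "tx"})
	}
	inbox = append(inbox, msg{chConsensus, "proposal"})

	delay := serialRecv(inbox, &mempool{}, 2*time.Millisecond)
	fmt.Println("proposal reached after", delay)
}
```

With 100 tx messages at a simulated 2 ms each, the proposal is reached only after at least 200 ms, even though handling the proposal itself costs nothing in this sketch.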

To address this problem, I propose that each reactor process its messages asynchronously in a separate thread.
To do this, the receiver thread only reads each message and puts it into a channel owned by the corresponding reactor; each reactor runs its own receive routine that reads from that channel and processes the messages.
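
A minimal sketch of this idea, assuming one buffered Go channel and one goroutine per reactor. The `envelope` and `reactor` types, `startReactors`, `dispatch`, and `proposalDelay` are hypothetical names for illustration only. They are not the existing reactor API and not necessarily what the linked pull request implements. Only two reactors are shown to keep it short.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Hypothetical channel IDs, chosen only for this sketch.
const (
	chMempool   byte = 1
	chConsensus byte = 2
)

// envelope is a stand-in for a message read from a peer.
type envelope struct {
	chID byte
	body string
}

// reactor is a hypothetical stand-in: a handler plus its own buffered inbox.
type reactor struct {
	inbox  chan envelope
	handle func(envelope)
}

// startReactors launches one receive goroutine per reactor. The returned
// WaitGroup completes once every inbox has been closed and drained.
func startReactors(rs map[byte]*reactor) *sync.WaitGroup {
	var wg sync.WaitGroup
	for _, r := range rs {
		wg.Add(1)
		go func(r *reactor) {
			defer wg.Done()
			for e := range r.inbox { // blocks (idle) while no messages arrive
				r.handle(e)
			}
		}(r)
	}
	return &wg
}

// dispatch plays the role of the peer's receive loop: it only routes each
// message to its reactor's inbox and never runs a handler itself.
func dispatch(msgs []envelope, rs map[byte]*reactor) {
	for _, m := range msgs {
		if r, ok := rs[m.chID]; ok {
			r.inbox <- m // blocks only if this reactor's inbox is full
		}
	}
	for _, r := range rs {
		close(r.inbox)
	}
}

// proposalDelay queues nTx slow tx messages ahead of one proposal and returns
// how long the proposal waited before its handler ran.
func proposalDelay(nTx int, txCost time.Duration) time.Duration {
	var delay time.Duration
	start := time.Now()
	rs := map[byte]*reactor{
		chMempool: &reactor{
			inbox:  make(chan envelope, nTx),
			handle: func(envelope) { time.Sleep(txCost) }, // simulated slow tx handling
		},
		chConsensus: &reactor{
			inbox:  make(chan envelope, 1),
			handle: func(envelope) { delay = time.Since(start) },
		},
	}
	wg := startReactors(rs)

	msgs := make([]envelope, 0, nTx+1)
	for i := 0; i < nTx; i++ {
		msgs = append(msgs, envelope{chMempool, "tx"})
	}
	msgs = append(msgs, envelope{chConsensus, "proposal"})

	dispatch(msgs, rs)
	wg.Wait() // delay is written before the consensus goroutine calls Done
	return delay
}

func main() {
	fmt.Println("proposal handled after", proposalDelay(100, 2*time.Millisecond))
}
```

In this sketch the proposal no longer waits for the tx handlers, because the routing loop only enqueues. One caveat of this design is that a full inbox still blocks the routing loop, so the buffer size (here simply sized to the number of tx messages) is a tuning choice rather than a given.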

This requires four more threads per peer, but I don't think that will be a big problem because those threads will be idle when messages are infrequent.

Proposal

Create a receive routine thread for each reactor so that messages are processed asynchronously.


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@egonspace egonspace self-assigned this Oct 29, 2020
@egonspace egonspace added the C: enhancement Classification: New feature or its request, or improvement in maintainability of code label Oct 29, 2020
@egonspace egonspace linked a pull request Nov 2, 2020 that will close this issue