Enhance Mempool performance #226

Merged: 103 commits into kaspanet:master on Oct 6, 2023
Conversation

@tiram88 (Collaborator) commented Jul 18, 2023

This PR addresses the issue of the mempool being a bottleneck in the first testnet-11 experiment @ 10 BPS.

The previous monolithic design locked both the Mempool and, to some extent, the virtual processor during every call to its public functions, through the Manager that owns it, leading to long delays. The new general design splits these functions into simpler atomic steps, each logically protected by a lock on the mempool only when the mempool is actually involved. This approach has the following consequences:

  • The Manager owning the Mempool instance is no longer a simple pass-through that locks and publicly exposes mempool functions; a significant part of the mempool logic moves into the Manager.
  • Some verification steps must sometimes be duplicated, notably double-spend checks, since the execution of a Manager function is no longer atomic as a whole.
  • Some processes are reorganized and some behaviors change, taking advantage of the potential the new design offers.

Further performance improvements are introduced:

  • Processing batches of transactions in topological order, forming sub-batches by level of chained dependency
  • Parallel validation of sub-batches of transactions by the virtual processor, processed in chunks capped at a maximal mass so that the virtual processor does not lock for too long (see the sketch after this list)
  • A more efficient algorithm for making room in the mempool when it is full and a transaction must be added. The behavior also changes: the transaction with the lowest fee rate is removed, but (this is new) only among transactions having no chained dependency and not being parents of the transaction getting added
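
As an illustration of the mass-bounded chunking, here is a minimal sketch; `mass` is a hypothetical per-transaction accessor and the cap value is illustrative (the actual bound is the maximum block mass):

```rust
/// Illustrative cap; the actual bound is the maximum block mass.
const MAX_CHUNK_MASS: u64 = 500_000;

/// Split a batch into chunks of bounded cumulative mass, so that each
/// submission to the virtual processor holds its lock only briefly.
fn chunk_by_mass<T>(txs: Vec<T>, mass: impl Fn(&T) -> u64) -> Vec<Vec<T>> {
    let mut chunks = Vec::new();
    let mut current: Vec<T> = Vec::new();
    let mut current_mass = 0u64;
    for tx in txs {
        let m = mass(&tx);
        // Close the current chunk when the cap would be exceeded, but always
        // keep at least one tx per chunk, even one heavier than the cap.
        if !current.is_empty() && current_mass + m > MAX_CHUNK_MASS {
            chunks.push(std::mem::take(&mut current));
            current_mass = 0;
        }
        current_mass += m;
        current.push(tx);
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```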

Validate and insert a single transaction

In order to achieve much finer lock granularity, the function validating and inserting a transaction into the mempool is split into 4 steps (a minimal sketch follows the list):

  1. pre-validation (read lock on Mempool)
  2. validation by the virtual processor (no lock on Mempool)
  3. post-validation and insertion (write lock on Mempool)
  4. validation and insertion of unorphaned transactions (see below)
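
A minimal sketch of this split, with stub types standing in for the real mempool, consensus and transaction types (all helper names here are illustrative, not the actual API):

```rust
use std::sync::{Arc, RwLock};

// Stub types standing in for the real ones.
struct Mempool;
struct Consensus;
struct Tx;

impl Mempool {
    fn pre_validate(&self, _tx: &Tx) -> Result<Tx, String> { Ok(Tx) }
    fn post_validate_and_insert(&mut self, tx: Tx) -> Result<Vec<Tx>, String> {
        // Double spends would be re-checked here: steps 1-3 are no longer
        // atomic as a whole, so another tx may have landed in between.
        let _ = tx;
        Ok(Vec::new()) // newly unorphaned txs
    }
}
impl Consensus {
    fn validate_mempool_transaction(&self, tx: Tx) -> Result<Tx, String> { Ok(tx) }
}

fn validate_and_insert(
    mempool: &Arc<RwLock<Mempool>>,
    consensus: &Consensus,
    tx: Tx,
) -> Result<Vec<Tx>, String> {
    // 1. pre-validation under a short-lived read lock
    let prepared = mempool.read().unwrap().pre_validate(&tx)?;
    // 2. validation by the virtual processor, with the mempool fully unlocked
    let validated = consensus.validate_mempool_transaction(prepared)?;
    // 3. post-validation and insertion under a write lock
    let unorphaned = mempool.write().unwrap().post_validate_and_insert(validated)?;
    // 4. the returned unorphaned txs are then validated and inserted in turn
    Ok(unorphaned)
}
```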

Validate and insert unorphaned transactions

Following any insertion of a transaction into the mempool, some orphan transactions may be unorphaned. Here again the new design is put to use: on insertion, room is made on the fly if necessary, and the process keeps looping as long as new transactions get unorphaned (both new behaviors). A sketch follows the list.

  1. parallel validation of batches of transactions (no lock)
  2. post-validation and insertion (write lock)
  3. in case of new unorphaned transactions, loop to step 1 (new behavior)
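
A standalone sketch of the loop, with closures standing in for the lock-scoped phases (names are illustrative):

```rust
/// Keep validating and inserting until a round unorphans nothing new.
/// `validate` runs in parallel with no mempool lock; `insert` holds a write
/// lock and returns the transactions that the insertions just unorphaned.
fn unorphan_loop<T>(
    mut batch: Vec<T>,
    validate: impl Fn(T) -> Option<T>,
    insert: impl Fn(Vec<T>) -> Vec<T>,
) {
    while !batch.is_empty() {
        // 1. parallel validation of the batch (no lock)
        let validated: Vec<T> = batch.into_iter().filter_map(&validate).collect();
        // 2. write-locked insertion, making room on the fly if needed
        // 3. loop to step 1 with any newly unorphaned transactions
        batch = insert(validated);
    }
}
```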

Validate and insert transactions in batch

This function is part of the transaction relay flow, where batches of transactions are broadcast to peers. The batch is processed in topological order by level of chained dependency (new behavior; a sketch of the leveling follows the list). For each level:

  1. pre-validation (read lock)
  2. parallel validation of the sub-batch of transactions by the virtual processor (no lock)
  3. post-validation and insertion (write lock)
  4. validation and insertion of unorphaned transactions
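
A sketch of the leveling on simplified types, where each transaction is reduced to an id plus the ids of its in-batch parents, assuming the batch is already topologically sorted (parents before children):

```rust
use std::collections::{HashMap, HashSet};

/// Group a batch into levels of chained dependency: level 0 spends no output
/// of another tx in the batch, level 1 spends level-0 outputs, and so on.
/// Each level can then be validated in parallel before the next is processed.
fn topological_levels(txs: &[(u64, Vec<u64>)]) -> Vec<Vec<u64>> {
    let ids: HashSet<u64> = txs.iter().map(|(id, _)| *id).collect();
    let mut level_of: HashMap<u64, usize> = HashMap::new();
    let mut levels: Vec<Vec<u64>> = Vec::new();
    for (id, parents) in txs {
        // A tx sits one level above its highest in-batch parent.
        let level = parents
            .iter()
            .filter(|p| ids.contains(*p))
            .filter_map(|p| level_of.get(p))
            .map(|l| l + 1)
            .max()
            .unwrap_or(0);
        level_of.insert(*id, level);
        if levels.len() <= level {
            levels.push(Vec::new());
        }
        levels[level].push(*id);
    }
    levels
}
```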

Handle the transactions of a block newly added to the DAG

Here again, a split into 3 steps adds lock granularity.

  1. handling of transactions (write lock)
  2. validation and insertion of unorphaned transactions
  3. expiring the low priority transactions (write lock)

Revalidate high priority transactions

This is probably the most significant bottleneck alleviation, together with tx relay broadcasting. The process runs only every 30 seconds, yet for a node receiving a very high rate of local transactions it previously implied very compute-intensive processing, all under a single atomic lock of the mempool.

It is redesigned as follows (a simplified sketch follows the list):

  1. getting all the high priority transactions (read lock)
  2. processing the batch in topological order by level of chained dependency, instead of a "standard" Kahn's in-degree algorithm (new behavior)
  3. populating all transactions with UTXO entries found in the mempool (read lock)
  4. parallel validation of the sub-batch of transactions by the virtual processor (no lock)
  5. updating the transactions in the mempool and removing the invalid ones, along with all their redeemers (write lock)
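
A simplified sketch of this flow, once more with closures standing in for the lock-scoped phases (all names illustrative):

```rust
/// Revalidate high priority transactions level by level, holding each lock
/// only for its own phase; consensus validation runs with no mempool lock.
fn revalidate_high_priority<T>(
    snapshot: impl Fn() -> Vec<T>,                 // 1. read lock: collect HP txs
    levelize: impl Fn(Vec<T>) -> Vec<Vec<T>>,      // 2. topological levels (no lock)
    populate: impl Fn(Vec<T>) -> Vec<T>,           // 3. read lock: fill mempool UTXO entries
    validate: impl Fn(Vec<T>) -> (Vec<T>, Vec<T>), // 4. no lock: split into (valid, invalid)
    apply: impl Fn(Vec<T>, Vec<T>),                // 5. write lock: update valid, evict invalid + redeemers
) {
    for level in levelize(snapshot()) {
        let (valid, invalid) = validate(populate(level));
        apply(valid, invalid);
    }
}
```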

Since the process now locks both the mempool and the consensus in a much more granular way, some high priority transactions present at the start may get removed (mined in a block, invalidated, etc.) during the relatively long run. Depending on the sequence of events, such a transaction id may or may not be returned as accepted, but no impact on the node is expected: it will simply try to rebroadcast a transaction id that is no longer present, and the peers requesting it will eventually get a TransactionNotFound, which can and does occur anyway for other reasons.

The behavior also changes compared to the golang version: on a transaction validation error, the error is simply logged and execution continues, whereas in golang execution halts and returns the error.

Performance gains

A note of caution: an upcoming PR will add benchmarking infrastructure making it possible to measure the performance gains, and hence to what extent this new design actually alleviates the bottleneck.

Depending on the measurements, further adjustments might be needed.

@michaelsutton merged commit a59214e into kaspanet:master on Oct 6, 2023 (6 checks passed)
smartgoo pushed a commit to smartgoo/rusty-kaspa that referenced this pull request Jun 18, 2024
* Split mempool atomic validate and insert transaction in 3 steps

* Process tx relay flow received txs in batch

* Use a single blocking task per MiningManagerProxy fn

* Split parallel txs validation in chunks of max block mass

* Abstract expire_low_priority_transactions into Pool trait

* Making room in the mempool for a new transaction won't remove chained txs nor parent txs of the new transaction

* Refine lock granularity on Mempool and Consensus while processing unorphaned transactions (wip)

* Fix failing test

* Enhance performance & refine lock granularity on Mempool and Consensus while revalidating high priority transactions

* Comments

* Fix upper bound of transactions chunk

* Ensure a chunk has at least 1 tx

* Prevent adding the same tx twice to the mempool

* Clear transaction entries before revalidation

* Add some logs and comments

* Add logs to debug transactions removals

* On accepted block do not remove orphan tx redeemers

* Add 2 TODOs

* Fix a bug of high priority transactions being unexpectedly orphaned or rejected

* Refactor transaction removal reason into an enum

* Add an accepted transaction ids cache to the mempool and use it to prevent reentrance in mempool, broadcasting to and asking from peers

* Improve the filtering of unknown transactions in tx relay

* Enhance tx removal logging

* Add mempool stats

* Process new and unorphaned blocks in topological order

* Run revalidation of HP txs in a dedicated task

* Some profiling and debug logs

* Run expiration of LP txs in a dedicated task

* remove some stopwatch calls which were timing locks

* crucial: fix exploding complexity of `handle_new_block_transactions`/`remove_transaction`

* fixes in `on_new_block`

* refactor block template cache into `Inner`

* make `block_template_cache` a non-blocking call (never blocks)

* Log build_block_template retries

* While revalidating HP txs, only recheck transaction entries

* Fix accepted count during revalidation

* mempool bmk: use client pools + various improvements

* Improve the topological sorting of transactions

* Return transaction descendants BFS ordered + some optimizations

* Group expiration and revalidation of mempool txs in one task

* Refine the schedule of the cleaning task

* ignore perf logs

* maintain mempool ready transactions in a dedicated set

* Bound the returned candidate transactions to a maximum

* Reduces the max execution time of build block template

* lint

* Add mempool lock granularity to get_all_transactions

* Restore block template cache lifetime & make it customizable in devnet-prealloc feature

* Restore block template cache lifetime & make it customizable in devnet-prealloc feature

* Relax a bit the BBT maximum attempts constraint

* Refactor multiple `contained_by_txs` fns into one generic

* Test selector transaction rejects & fix empty template returned by `select_transactions` upon selector reuse

* Log some mempool metrics

* Handle new block and then new block template

* turn tx selector into an ongoing process with persistent state (wip: some tests are broken; selector is not used correctly by builder)

* use tx selector for BBT (wip: virtual processor retry logic)

* virtual processor selector retry logic

* make BBT fallible by some selector criteria + comments and some docs

* add an infallible mode to virtual processor `build_block_template()`

* constants for tx selector successful decision

* Add e-tps to logged mempool metrics

* avoid realloc

* Address review comments

* Use number of ready txs in e-tps & enhance mempool lock

* Ignore failing send for clean tokio shutdown

* Log double spends

* Log tx script cache stats (wip)

* Ease atomic lock ordering & enhance counter updates

* Enhance tx throughput stats log line

* More robust management of cached data life cycle

* Log mempool sampled instead of exact lengths

* avoid passing consensus to orphan pool

* rename to `validate_transaction_unacceptance` and move to before the orphan case (accepted txs will usually be orphan)

* rename `cleaning` -> `mempool_scanning`

* keep intervals aligned using a round-up formula (rather than a loop)

* design fix: avoid exposing full collections as mut. This violates encapsulation logic since collections can be completely modified externally; while in tx pools it is important to make sure various internal collections are maintained consistently (for instance the `ready_transactions` field on `TransactionsPool` needs careful maintenance)

* minor: close all pool receivers on op error

* `remove_transaction`: no need to manually update parent-child relations in the case `remove_redeemers=false`. This is already done via `remove_transaction_from_sets` -> `transaction_pool.remove_transaction`. + a few minor changes

* encapsulate `remove_transaction_utxos` into `transaction_pool`

* no need to `remove_redeemers_of` for the initial removed tx since this happens as part of:
`remove_from_transaction_pool_and_update_orphans` -> `orphan_pool.update_orphans_after_transaction_removed` -> `orphan_pool.remove_redeemers_of`

* inline `remove_from_transaction_pool_and_update_orphans`

* remove redeemers of expired low-prio txs + register scan time and daa score after collection (bug fix)

* change mempool monitor logs to debug

* make tps logging more accurate

* import bmk improvements from mempool-perf-stats branch

* make `config.block_template_cache_lifetime` non-feature dependent

---------

Co-authored-by: Michael Sutton <[email protected]>