
[Execution state] complete ledger - Separation of node types (leaf, intermediate) #1745

Open
Tracked by #1744
ramtinms opened this issue Dec 9, 2021 · 3 comments

ramtinms (Member) commented Dec 9, 2021

The current implementation of the complete ledger reuses the same node structure type for both leaf and intermediate nodes. Separating the node types into two structures should reduce memory consumption because of how Go lays out struct fields: leaf nodes don't need the pointers to the left and right children, and intermediate nodes don't need the payload pointer or the path.
The current code already supports different construction methods for node creation, which should make it easier to separate the node types, though the checkpointer currently uses a default constructor and might need some workaround.
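For reference, the current shape is roughly a single struct along these lines (illustrative field names, not the exact flow-go definition), where each kind of node leaves some fields unused:

```go
package mtrie // hypothetical package name; sketch only

// Illustrative only; the actual flow-go node struct may carry additional
// bookkeeping fields.
type Payload struct{ Value []byte }

type node struct {
	lChild, rChild *node    // unused on leaf nodes:    2 x 8 B per leaf
	height         int
	path           [32]byte // unused on interim nodes: 32 B per interim node
	payload        *Payload // unused on interim nodes:  8 B per interim node
	hashValue      [32]byte
}
```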

Potential challenges

  • [backward compatibility] the updated code should be able to read the checkpoints that were generated by the current version of the code.
Ullaakut commented Dec 21, 2021

Hi @ramtinms!

Regarding this task

My proposal to implement this task is the following:
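(The detailed proposal isn't reproduced here. As a rough illustration, assuming the split puts leaf and branch nodes behind a shared interface, it might look something like this; names, fields and signatures are illustrative, not the actual proposal or the flow-go API:)

```go
package mtrie // hypothetical package name; sketch only

type Payload struct{ Value []byte }

// Node is the abstraction both concrete kinds would implement.
type Node interface {
	Hash() [32]byte
}

// InterimNode keeps only what branches need: children, height, hash.
type InterimNode struct {
	LChild, RChild Node
	Height         int
	HashValue      [32]byte
}

// LeafNode keeps only what leaves need: path, payload, height, hash.
type LeafNode struct {
	Height    int
	Path      [32]byte
	Payload   *Payload
	HashValue [32]byte
}

func (n *InterimNode) Hash() [32]byte { return n.HashValue }
func (n *LeafNode) Hash() [32]byte    { return n.HashValue }

// Code that needs kind-specific fields (traversal, checkpoint encoding, ...)
// switches on the concrete type; this dispatch is where the performance
// concern below comes in. Assumes non-nil children for brevity.
func lookup(n Node, path [32]byte, depth int) *Payload {
	for {
		switch v := n.(type) {
		case *LeafNode:
			if v.Path == path {
				return v.Payload
			}
			return nil
		case *InterimNode:
			if path[depth/8]&(1<<(7-depth%8)) != 0 {
				n = v.RChild
			} else {
				n = v.LChild
			}
			depth++
		default:
			return nil
		}
	}
}
```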

The potential drawback with this is that while it improves clarity and maintainability significantly, it comes at a potentially small but non-negligible cost in performance. I could run benchmarks to quantify the impact that this has if you think it is necessary. AFAIK, the overhead is very minimal, but I assume that every thousandth of a nanosecond counts for such a critical component as the ledger trie.

When it comes to memory savings, we can already compute an approximation.
Each branch node would save 32 bytes for the ledger.Path, which is currently not a pointer but a fixed-length 32-byte array that stays empty on branches, plus the 8 bytes of the payload pointer, so 40 bytes per branch.
Each leaf node would save 16 bytes for the two unnecessary child pointers.

Since the mainnet checkpoint currently contains ~150 million nodes, about 70 million of which are leaves and 80 million are branches (if I recall correctly), that would result in about 1.2GB of savings for the leaves and 3.2GB for the branches, for a total of 4.4GB of savings over ~30GB of raw data, so an expected improvement of around 15%.
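A quick back-of-the-envelope check of those figures (the counts are the approximate numbers quoted above, not measured values):

```go
package main

import "fmt"

func main() {
	const (
		leaves         = 70_000_000 // approximate leaf count from above
		branches       = 80_000_000 // approximate branch count from above
		leafSavingB    = 16         // two 8-byte child pointers
		interimSavingB = 40         // 32-byte path + 8-byte payload pointer
	)
	leafGB := float64(leaves*leafSavingB) / 1e9
	branchGB := float64(branches*interimSavingB) / 1e9
	fmt.Printf("leaves: ~%.1f GB, branches: ~%.1f GB, total: ~%.1f GB\n",
		leafGB, branchGB, leafGB+branchGB)
	// leaves: ~1.1 GB, branches: ~3.2 GB, total: ~4.3 GB
}
```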

Is that what you had in mind?

Extension Node proposal

Experimental results show that when using extension nodes with the mainnet-14 checkpoint, the size of the trie after rebuilding it from the checkpoint is reduced by around 30%, going from 14.5GB (without payloads) to 10.3GB. The process of trimming the trie, however, which essentially consists of going through all the leaves of the original trie and recreating a new trie from them, turns out to be quite slow. On the machine I use for testing, it takes about 30 minutes to read the checkpoint and rebuild the nodes, but then 5 to 6 hours to trim the resulting trie.


Once the new trie is ready, however, inserting and reading values should be slightly faster, since there are fewer nodes to traverse to reach the leaves. I think it's a tradeoff that might be worth discussing, WDYT?
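For context, an extension node in this sense typically has a shape like the following; this is a hypothetical layout, not the one used in the experiment above:

```go
package mtrie // hypothetical package name; sketch only

// An extension node stands in for a run of branch nodes that each have a
// single non-empty child: it records the skipped portion of the path and
// points directly at the node below that run.
type Node interface{ Hash() [32]byte }

type ExtensionNode struct {
	// Bits [SkipStart, SkipEnd) of the 256-bit path are covered by this node.
	SkipStart, SkipEnd int
	PathSegment        [32]byte // only the covered bits are meaningful
	Child              Node     // the subtrie below the skipped run
	HashValue          [32]byte
}

func (n *ExtensionNode) Hash() [32]byte { return n.HashValue }
```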

fxamacker (Member) commented:

The potential drawback with this is that while it improves clarity and maintainability significantly, it comes at a potentially small but non-negligible cost in performance. I could run benchmarks to quantify the impact that this has if you think it is necessary.

Hi Brendan, I agree there might be a potential penalty and we probably should run benchmarks.

At a glance, the performance of interface function calls vs non-interface function calls appears to be similar if the function isn't inlined. The performance hit is larger when the function can be inlined. Since Go 1.17 can inline some functions that couldn't be inlined with 1.16, maybe both versions should be used in benchmarks.
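For reference, the four call styles being compared look roughly like this; a minimal sketch, not the code from the linked playground, with illustrative types and names:

```go
package node // sketch; save as a _test.go file in the package under test

import "testing"

type summer interface{ sum() int64 }

type myInt int64

func (i myInt) sum() int64 { return int64(i) }

var sink int64 // prevents the compiler from eliding the calls

func BenchmarkIntmethod(b *testing.B) {
	i := myInt(1)
	for n := 0; n < b.N; n++ {
		sink = i.sum() // direct (possibly inlined) method call
	}
}

func BenchmarkInterface(b *testing.B) {
	var s summer = myInt(1)
	for n := 0; n < b.N; n++ {
		sink = s.sum() // dynamic dispatch through the interface
	}
}

func BenchmarkTypeSwitch(b *testing.B) {
	var s summer = myInt(1)
	for n := 0; n < b.N; n++ {
		switch v := s.(type) {
		case myInt:
			sink = v.sum()
		}
	}
}

func BenchmarkTypeAssertion(b *testing.B) {
	var s summer = myInt(1)
	for n := 0; n < b.N; n++ {
		sink = s.(myInt).sum()
	}
}
```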

Very preliminary benchmarks using the example code from the Stack Overflow link showed:

https://go.dev/play/p/E9H_4K2J9-

// when function can be inlined, performance hit on interface is larger
BenchmarkIntmethod-4       	428540371	         2.859 ns/op	       0 B/op	       0 allocs/op
BenchmarkInterface-4       	388212188	         3.089 ns/op	       0 B/op	       0 allocs/op
BenchmarkTypeSwitch-4      	512972532	         2.336 ns/op	       0 B/op	       0 allocs/op
BenchmarkTypeAssertion-4   	495534915	         2.416 ns/op	       0 B/op	       0 allocs/op
// when function cannot be inlined, performance hit on interface is smaller
BenchmarkIntmethod-4       	385235356	         3.088 ns/op	       0 B/op	       0 allocs/op
BenchmarkInterface-4       	388532904	         3.089 ns/op	       0 B/op	       0 allocs/op
BenchmarkTypeSwitch-4      	387217621	         3.088 ns/op	       0 B/op	       0 allocs/op
BenchmarkTypeAssertion-4   	388183740	         3.088 ns/op	       0 B/op	       0 allocs/op

When benchmarks were run with count=20, the not-inlined calls averaged 3.09 ns/op for all 4 benchmarks.

name             old time/op    new time/op    delta
Intmethod-4        2.84ns ± 1%    3.09ns ± 0%   +8.99%  (p=0.000 n=20+19)
Interface-4        3.09ns ± 0%    3.09ns ± 0%   +0.07%  (p=0.045 n=18+20)
TypeSwitch-4       2.33ns ± 0%    3.09ns ± 0%  +32.52%  (p=0.000 n=20+20)
TypeAssertion-4    2.42ns ± 0%    3.09ns ± 0%  +27.44%  (p=0.000 n=19+20)

Benchmarks were run on a Haswell CPU and compiled with Go 1.16.

github-actions bot commented:

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions bot added the Stale label on Oct 15, 2024