[Execution state] complete ledger - Separate storage of payloads #1746

Open
Tracked by #1744
ramtinms opened this issue Dec 9, 2021 · 2 comments
Labels
Stale Label used when marking an issue stale.

Comments

@ramtinms
Member

ramtinms commented Dec 9, 2021

Problem Definition

Currently, we keep payloads in memory as part of the leaf nodes. If we store them on a disk-based data store, we can separate the index (trie) from the actual data. That should reduce memory usage drastically and should also improve garbage collection.

This should be done in a way that doesn't impact the time spent on operations like read and update and be parallelizable as much as possible.
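A minimal sketch of the separation described above, not a proposed implementation. It assumes payloads are addressed by their SHA-256 hash and uses SQLite as a stand-in for the disk-based store. `PayloadStore` and `Index` are hypothetical names, and a plain dictionary stands in for the trie. Python is used only for brevity.

```python
import hashlib
import os
import sqlite3
import tempfile


class PayloadStore:
    """Hypothetical disk-backed payload store, keyed by payload hash."""

    def __init__(self, path):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS payloads (hash BLOB PRIMARY KEY, value BLOB)"
        )

    def put(self, value: bytes) -> bytes:
        # Assumption: payloads are content-addressed by their SHA-256 digest.
        digest = hashlib.sha256(value).digest()
        with self.db:  # commits the transaction on success
            self.db.execute(
                "INSERT OR REPLACE INTO payloads (hash, value) VALUES (?, ?)",
                (digest, value),
            )
        return digest

    def get(self, digest: bytes) -> bytes:
        row = self.db.execute(
            "SELECT value FROM payloads WHERE hash = ?", (digest,)
        ).fetchone()
        if row is None:
            raise KeyError(digest.hex())
        return row[0]


class Index:
    """Stand-in for the in-memory trie: each leaf keeps a hash, not the payload."""

    def __init__(self, store: PayloadStore):
        self.store = store
        self.leaves = {}  # path -> 32-byte payload hash

    def update(self, path: bytes, value: bytes) -> None:
        self.leaves[path] = self.store.put(value)

    def read(self, path: bytes) -> bytes:
        return self.store.get(self.leaves[path])


# Usage: the index holds 32 bytes per leaf regardless of payload size.
store = PayloadStore(os.path.join(tempfile.mkdtemp(), "payloads.db"))
index = Index(store)
payload = b"\x00" * 1024
index.update(b"account/1/balance", payload)
```

In this sketch every `read` goes to disk, so it shows the memory saving but not the latency requirement stated above.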

@Ullaakut

> [...] if we store them on a disk-based data store [...]
> This should be done in a way that doesn't impact the time spent on operations like read and update and be parallelizable as much as possible.

Do you have suggestions on how to achieve this? It seems to me that the only effective way to do this is to have both a persistent store on the filesystem and a cache in memory, but then read and update times would not be consistently unaffected. There would be no impact only in the ideal scenario where the cache (LRU would be best) contains every value we need to read, so that no value ever has to be retrieved from the filesystem. If we ever need to retrieve a value from the filesystem to satisfy a read call, how could it have no impact? We would have to block the read call until the value has been fetched from disk, which is probably a few orders of magnitude more costly than a read from memory.
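To put rough numbers on this, the expected read cost with a cache hit ratio $h$ can be written as below. The latencies used are illustrative assumptions, not measurements of this system.

$$
T_{\text{read}} = h \cdot t_{\text{mem}} + (1 - h) \cdot t_{\text{disk}}
$$

Assuming $t_{\text{mem}} = 100\,\text{ns}$, $t_{\text{disk}} = 100\,\mu\text{s}$ and $h = 0.99$:

$$
T_{\text{read}} = 0.99 \cdot 100\,\text{ns} + 0.01 \cdot 100{,}000\,\text{ns} = 99\,\text{ns} + 1{,}000\,\text{ns} \approx 1.1\,\mu\text{s}
$$

Under these assumed values, a 1% miss rate already makes the average read about 11 times slower than a pure memory read, and each individual miss still pays the full disk latency.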

The best approach I've found is to have the LRU cache write values to disk when it evicts them, and to regularly evict (and therefore persist) its oldest entries. The hope is that the cache never fills up (a full cache would mean blocking disk operations), so that disk writes can happen concurrently without interfering with new read/write operations. That solution is limited by how much memory is devoted to the cache. The bigger the cache and the more often its oldest entries are purged, the better the expected performance. Even so, it does not solve the case where a read call comes in for an old key that is now only on disk: that requires a disk read, which is inevitably slow.
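The eviction scheme above can be sketched roughly as follows. This is a simplified, single-threaded illustration in Python, not code from this repository. `WriteBackLRU`, `low_watermark` and `trim` are hypothetical names, and a plain dictionary stands in for the disk store. In a real system `trim` would run in a background task so that the disk writes happen concurrently.

```python
from collections import OrderedDict


class WriteBackLRU:
    """LRU cache that persists entries to a backing store when they are evicted."""

    def __init__(self, backing, capacity, low_watermark):
        self.backing = backing              # dict-like stand-in for the disk store
        self.capacity = capacity            # hard limit on cached entries
        self.low_watermark = low_watermark  # target size after proactive trimming
        self.entries = OrderedDict()        # least recently used entry first
        self.disk_reads = 0                 # counts blocking reads on cache misses

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        # Full cache: evict synchronously. This is the blocking case that
        # regular trimming is meant to avoid.
        while len(self.entries) > self.capacity:
            self._evict_oldest()

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)
            return self.entries[key]
        # Miss: the caller blocks on a read from the backing store.
        self.disk_reads += 1
        value = self.backing[key]
        self.put(key, value)
        return value

    def trim(self):
        """Proactively evict the oldest entries down to the low watermark."""
        while len(self.entries) > self.low_watermark:
            self._evict_oldest()

    def _evict_oldest(self):
        key, value = self.entries.popitem(last=False)
        # Persist on eviction. This sketch also rewrites clean entries;
        # tracking dirty keys would avoid those redundant writes.
        self.backing[key] = value


# Usage: fill the cache, then trim so the two oldest entries go to "disk".
disk = {}
cache = WriteBackLRU(disk, capacity=4, low_watermark=2)
for i in range(4):
    cache.put(f"k{i}", i)
cache.trim()
```

After `trim`, a read of `k0` has to go through the backing store and increments `disk_reads`, which is the slow path that this scheme cannot remove.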

Contributor

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the Stale Label used when marking an issue stale. label Oct 15, 2024