[Execution state] complete ledger - Separate storage of payloads #1746
Comments
Do you have suggestions on how to achieve this? It seems to me that the only effective way is to combine a persistent store on the filesystem with an in-memory cache, but that cannot consistently have no impact on those operations. It has no impact only in the ideal scenario where the cache (an LRU would be best) already holds every value we need to read, so that nothing ever has to be retrieved from the filesystem. If a read call ever has to go to the filesystem, how could it have no impact? We would have to block the read call until the value is fetched from disk, which is probably a few orders of magnitude slower than a read from memory.

The best approach I've found is to have the LRU cache write values to disk when it evicts them, and to regularly evict (and therefore persist) its oldest entries. The hope is that the cache never fills up (a full cache would mean blocking disk operations), so that disk writes can be done concurrently without interfering with new read/write operations. That solution is limited by how much memory is devoted to the cache: the bigger the cache and the more often its oldest entries are purged, the better the expected performance. Even so, it does not solve the case where a read call arrives for an old key that is now only on disk; that requires a disk read, which will inevitably be slow.
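To make the write-on-evict idea in the comment above concrete, here is a minimal Go sketch. Every name in it (`Store`, `MapStore`, `Cache`, `EvictOldest`, `ErrNotFound`) is hypothetical and not taken from the project. `MapStore` is an in-memory stand-in for a disk-based store, and disk writes are kept synchronous for brevity, whereas the comment envisions them running concurrently.

```go
package main

import (
	"container/list"
	"errors"
	"sync"
)

// ErrNotFound is returned when a key is in neither the cache nor the store.
var ErrNotFound = errors.New("payload not found")

// Store is a hypothetical interface for the persistent (disk-based) store.
type Store interface {
	Get(key string) ([]byte, error)
	Put(key string, value []byte) error
}

// MapStore is an in-memory stand-in for a disk store, used only so the
// sketch runs without touching the filesystem.
type MapStore struct {
	mu   sync.Mutex
	data map[string][]byte
}

func NewMapStore() *MapStore { return &MapStore{data: make(map[string][]byte)} }

func (s *MapStore) Get(key string) ([]byte, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.data[key]
	if !ok {
		return nil, ErrNotFound
	}
	return v, nil
}

func (s *MapStore) Put(key string, value []byte) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[key] = value
	return nil
}

type entry struct {
	key   string
	value []byte
}

// Cache is an LRU cache that persists entries to the store when evicting them.
type Cache struct {
	mu       sync.Mutex
	capacity int
	order    *list.List // front = most recently used
	items    map[string]*list.Element
	store    Store
}

func NewCache(capacity int, store Store) *Cache {
	return &Cache{capacity: capacity, order: list.New(),
		items: make(map[string]*list.Element), store: store}
}

// evictLocked persists and drops the oldest entries until at most target remain.
func (c *Cache) evictLocked(target int) error {
	if target < 0 {
		target = 0
	}
	for c.order.Len() > target {
		oldest := c.order.Back()
		e := oldest.Value.(*entry)
		if err := c.store.Put(e.key, e.value); err != nil {
			return err
		}
		c.order.Remove(oldest)
		delete(c.items, e.key)
	}
	return nil
}

// Put updates the cache only; the store is written when the entry is evicted.
func (c *Cache) Put(key string, value []byte) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return nil
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
	return c.evictLocked(c.capacity)
}

// Get serves hits from memory; a miss blocks on a read from the store.
func (c *Cache) Get(key string) ([]byte, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*entry).value, nil
	}
	value, err := c.store.Get(key) // the slow path discussed above
	if err != nil {
		return nil, err
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
	return value, c.evictLocked(c.capacity)
}

// EvictOldest proactively persists and drops up to n of the oldest entries.
func (c *Cache) EvictOldest(n int) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.evictLocked(c.order.Len() - n)
}
```

Calling `EvictOldest` periodically, for example from a background goroutine, is one way to keep headroom in the cache so that `Put` rarely has to evict inline. The miss path in `Get` still blocks on the store, which is exactly the cost the comment points out.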
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Problem Definition
Currently, we keep payloads in memory as part of the leaf nodes. If we store them in a disk-based data store instead, we can separate the index (the trie) from the actual data. That should reduce memory usage drastically and should also improve garbage collection.
This should be done in a way that doesn't impact the time spent on operations like read and update, and it should be parallelizable as much as possible.
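One possible shape for this separation, sketched in Go under stated assumptions: the leaf keeps only a fixed-size reference to its payload, and the payload itself lives in a separate store. The names `PayloadStore`, `MemStore`, `Leaf`, `NewLeaf`, and `ReadPayloads` are hypothetical. Using a SHA-256 hash of the payload as the reference is an assumption made for illustration, not the project's actual scheme, and `MemStore` stands in for a disk-based store.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"sync"
)

// PayloadStore is a hypothetical interface for the disk-based payload store.
type PayloadStore interface {
	Get(id [32]byte) ([]byte, error)
	Put(id [32]byte, payload []byte) error
}

// MemStore is an in-memory stand-in for a disk store; it is safe for
// concurrent use so that reads can run in parallel.
type MemStore struct {
	mu   sync.RWMutex
	data map[[32]byte][]byte
}

func NewMemStore() *MemStore { return &MemStore{data: make(map[[32]byte][]byte)} }

func (s *MemStore) Get(id [32]byte) ([]byte, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	p, ok := s.data[id]
	if !ok {
		return nil, errors.New("payload not found")
	}
	return p, nil
}

func (s *MemStore) Put(id [32]byte, payload []byte) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[id] = payload
	return nil
}

// Leaf is a trie leaf that holds a 32-byte payload reference instead of the
// payload itself, so the in-memory index stays small.
type Leaf struct {
	Path      string
	PayloadID [32]byte
}

// NewLeaf writes the payload to the store and returns a leaf referencing it.
func NewLeaf(path string, payload []byte, store PayloadStore) (*Leaf, error) {
	id := sha256.Sum256(payload)
	if err := store.Put(id, payload); err != nil {
		return nil, err
	}
	return &Leaf{Path: path, PayloadID: id}, nil
}

// ReadPayloads fetches the payloads of several leaves concurrently and
// returns them in the same order as the input leaves.
func ReadPayloads(leaves []*Leaf, store PayloadStore) ([][]byte, error) {
	out := make([][]byte, len(leaves))
	errs := make([]error, len(leaves))
	var wg sync.WaitGroup
	for i, leaf := range leaves {
		wg.Add(1)
		go func(i int, leaf *Leaf) {
			defer wg.Done()
			// Each goroutine writes only to its own index, so no lock is needed.
			out[i], errs[i] = store.Get(leaf.PayloadID)
		}(i, leaf)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return out, nil
}
```

With this layout the trie can be traversed entirely in memory, and payload fetches for a batch of reads can overlap instead of running one after another. Each individual fetch still pays the latency of the underlying store.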