Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving efficiency when storing composite values (specially nested ones) #1854

Closed
Tracked by #1744
ramtinms opened this issue Jul 28, 2022 · 8 comments
Closed
Tracked by #1744

Comments

@ramtinms
Copy link
Member

ramtinms commented Jul 28, 2022

Issue To Be Solved

Currently, we initiate a Atree dictionary for all composite values, this would result in the storage of composite types as independent registers. Initial numbers from the mainnet snapshot shows that from 139,031,997 composite types about 60% of them only include fields which only require a fixed size for storage and are not expandable. For example in this context, string and unit are expandable in size, but uint64 is not.

Here is an example from nbatopshot

pub resource Collection {
	pub var ownedNFTs: @{UInt64: NonFungibleToken.NFT}
}

pub resource NFT {
        pub let id: UInt64
        pub let data: MomentData
}

pub struct MomentData {
        pub let setID: UInt32
        pub let playID: UInt32
        pub let serialNumber: UInt32
    }

Right now for a single moment stored inside a collection, current model results into 4 Atree dictionaries (4 registers):

  • one for storing Collection (which includes only a reference to another atree dictionary for ownedNFTs)
  • one for ownedNFTs field
  • one for the NFT
  • one for the MomentData

Suggested Solution

Option one (an optimization):
Considering fields can not be extended currently for composite types, an optimization to reduce the number of registers and the depth of storage trie is to inline these composite types into the parent container. This can be done by changing the way we encode these composite types and what we return as part of Storable to the atree.

following the example, MomentData can be stored as part of the NFT and NFT itself can also be stored as an in-lined value inside the ownedNFTs Atree dictionary. besides the benefit of reducing the number of registers, it also would reduce the need for pointers to be stored and reduce the overhead of pointers.
making a distinction between the composite types that would only take a fixed small size of storage and store them inline.

Option two (more general solution):
If we always copy resource on changes, we could benefit from the same trick we did for strings and other mutable values, If the size is bigger than X store it separately, if the size is smaller than X store it in-line. This way we don't care about the fields and only if they are growing bigger than X. Though this is tricky to do, we need to make sure when a resource is loaded we never change things in place which would cause issue, in case we need to update the parent we need to remove the old...

@ramtinms
Copy link
Member Author

The reporter I used to generate the numbers is here

@ramtinms ramtinms changed the title Inline storage of composite values with immutable and constant size fields Resolving inefficiency in storing composite values Jul 28, 2022
@ramtinms ramtinms changed the title Resolving inefficiency in storing composite values Improving efficiency when storing composite values (specially nested ones) Jul 28, 2022
@bluesign
Copy link
Contributor

Considering fields can not be extended currently for composite types

I disagree with this consideration, this leads to bad outcomes. We should consider "how system should be" not "how system currently is". If we continue our design with assumptions of "not possibility of things currently", then those will create a feedback loop, and will never be possible.

@bluesign
Copy link
Contributor

Improving efficiency when storing composite values

Also I think here we need to make clear, what are we optimizing for. In this context I am guessing number of registers.

What about number of register changes? I remember hearing about atree design consideration was minimizing data sent on register changes.

@ramtinms
Copy link
Member Author

ramtinms commented Jul 29, 2022

@bluesign thanks for your fast replies on this, really appreciate it.

You're very right, we need to know what would be the path for composite type extension (if it's going to be a wrapper style it would stay still compatible). But that's what I was hoping the cadence team can have some input on.

What is suggested here as option 1 is targeting minimizing the number of registers and total bytes that are touched (read). For Atree design we also considered the same priority, total bytes read has higher priority to the delta sizes. But why is that: From the protocol perspective we only transfer register reads to the verification nodes and deltas are computed by verification nodes and since the parent is anyway read it won't have an impact on the chunk data pack size. and reduce in the number of registers would result in way more saving on the proof size that is transferred. But if there are concerns about the delta sizes from other perspectives let's look at them. We could also look at the access patterns, from my limited look at things, its seems these small constant size values are most of the time untouched and the majority of operations are actually on the higher level and ownership changes, which might not actually cause that much extra delta in this case.

Context: Right now the most pressing issue is the depth of the trie and number of registers used, which results in very large proof sizes and taking a good chunk of memory with intermediate nodes.

@bluesign
Copy link
Contributor

@ramtinms composite type extension is really important subject (at least for me), it is a bit bottleneck now, everyone is complaining, though it can be managed somehow even with these options.

Correct me if I am wrong, but Option 1 seems like pre atree system. Option 2 seems very nice.

Btw I believe your problem is at the proof not how the things are stored. I feel like I am too much a noob to comment here on this ( and make a fool of myself ) Also hard to explain without drawing somewhere. If you have some time next week, maybe we schedule a call ?

@ramtinms
Copy link
Member Author

ramtinms commented Aug 2, 2022

@bluesign lets wait for when @turbolent is back, so we could setup a call to talk about both angels.

@j1010001
Copy link
Member

Can be close once we complete #2809

@fxamacker
Copy link
Member

This is resolved by Atree PR 342 and integrated by Cadence PR #2882.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants