
Reforming dynamic lists in SSZ #1160

Closed · vbuterin opened this issue Jun 10, 2019 · 10 comments

Labels: milestone:June 30 freeze 🥶 (Phase 0 spec freeze for long-lived cross-client testnet) · scope:SSZ (Simple Serialize)

Comments

@vbuterin (Contributor) commented Jun 10, 2019

Background reading: #1115

From my experience implementing SSZ partials (https://github.com/ethereum/eth2.0-specs/blob/ssz-impl-rework/test_libs/pyspec/eth2spec/utils/ssz/ssz_partials.py), I've come to the conclusion that the fact that paths do not correspond to specific generalized indices (generalized indices depend on the depth of the tree, which for dynamic-sized lists is itself dynamic) leads to a large amount of complexity in the SSZ partials implementation. `append` and `pop`, and the complexity of rebalancing lists around powers of 2, are particularly problematic.

My proposed solution to this is as follows. From the perspective of SSZ hashing, we remove lists, so the only data types are (i) base types, (ii) fixed-size vectors, (iii) containers (we could also add unions by hashing a Union[A, B... Z] identically to Container(a=Vector[A, 1], b=Vector[B, 1] ... z=Vector[Z, 1])).

All existing lists in the spec (validator list in the state, transaction lists in blocks...) get converted to fixed-size lists of some size (for lists in blocks, we know the maximums already; for the validator set we can pick a very generous limit, eg. 2**40). Hence, from the perspective of hashing, all data types become constant-sized types, and so the generalized index that a given path (eg. state -> state.validator_registry[123514].pubkey) corresponds to is always the same value.
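
For illustration, here is a minimal sketch of that property (hypothetical helper names, not spec code), using the usual convention that the root of the tree has generalized index 1 and the children of node g are 2g and 2g+1:

```python
def leaf_gindex(depth: int, index: int) -> int:
    """Generalized index of leaf `index` in a subtree `depth` levels deep."""
    return (1 << depth) + index

def concat_gindices(outer: int, inner: int) -> int:
    """Follow generalized index `inner` inside the subtree rooted at `outer`."""
    depth = inner.bit_length() - 1
    return outer * (1 << depth) + (inner - (1 << depth))

# With the validator registry fixed at 2**40 entries, the position of
# state.validator_registry[123514] in the tree never changes, so the
# generalized index can be computed once (even at compile time):
gi = leaf_gindex(40, 123514)  # same value for every state
```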

However, to the SSZ vector data type we now add a flag, the "serialization type". To start off, the serialization types are FULL and DYNAMIC; a third, SPARSE, can be added later. A FULL vector is serialized like vectors are today. A DYNAMIC vector is serialized by serializing the items in the vector up to the highest nonzero item, the same way a list is serialized today; this makes DYNAMIC vectors essentially equivalent to current lists in terms of serialization, except that a maximum length must be specified. A SPARSE vector would be serialized by prepending the list of items with an (item, position) table of all the nonzero items. Current vectors would become vectors with the FULL serialization type; current lists would become vectors with the DYNAMIC (or possibly SPARSE) serialization type. If DYNAMIC is used, it may make sense to add a validation rule verifying that no non-empty item appears after any empty item in the list.
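
A rough sketch of the three serialization types, assuming fixed-size elements whose individual serializations are already in hand (the function names, and the uint32 position encoding for SPARSE, are my guesses rather than settled spec):

```python
def serialize_full(elements: list) -> bytes:
    # FULL: every slot is written out, exactly like today's vectors.
    return b"".join(elements)

def serialize_dynamic(elements: list, empty: bytes) -> bytes:
    # DYNAMIC: write elements only up to (and including) the highest nonzero
    # item, like today's lists; the maximum length lives in the type.
    last = max((i for i, e in enumerate(elements) if e != empty), default=-1)
    return b"".join(elements[: last + 1])

def serialize_sparse(elements: list, empty: bytes) -> bytes:
    # SPARSE: a table of the nonzero items with their positions
    # (position encoded here as uint32, purely for illustration).
    return b"".join(
        i.to_bytes(4, "little") + e
        for i, e in enumerate(elements) if e != empty
    )
```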

@dankrad (Contributor) commented Jun 10, 2019

I would like to make an alternative proposal: Replace list with a type List[A, max_length], which is still a dynamic list type, but with maximum length max_length.

  • Merkleization would be like a Vector, but with the length mixed in (in other words: it is the hash tree root of a node whose left child is the root of a static zero-padded vector and whose right child is the length as a uint64)
  • Serialization is just like a List's at the moment

The reason I would propose this is that I think it is preferable for the type to be aware of its current dynamic length. The code for this is much easier to write, reason about, and check statically and dynamically. It is also consistent with the other changes we are making to SSZ (like unions and bitlists), which go in the direction of externalising typing work into SSZ and making it work "naturally", as you would expect from a modern programming language.
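
For concreteness, a minimal sketch of that merkleization, assuming a power-of-2 max_length and SHA-256 as the hash (a real implementation would use precomputed zero-subtree hashes rather than materializing the padding):

```python
from hashlib import sha256

ZERO_CHUNK = b"\x00" * 32

def hash_(a: bytes, b: bytes) -> bytes:
    return sha256(a + b).digest()

def merkleize(chunks: list, limit: int) -> bytes:
    # Zero-pad up to the (power-of-2) limit, then hash pairwise up to the root.
    nodes = chunks + [ZERO_CHUNK] * (limit - len(chunks))
    while len(nodes) > 1:
        nodes = [hash_(nodes[i], nodes[i + 1]) for i in range(0, len(nodes), 2)]
    return nodes[0]

def hash_tree_root_list(chunks: list, length: int, max_length: int) -> bytes:
    # Zero-padded vector root on the left; the length, as a uint64 placed in
    # a little-endian 32-byte chunk, mixed in on the right.
    return hash_(merkleize(chunks, max_length), length.to_bytes(32, "little"))
```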

@vbuterin (Contributor, Author)

> Merkleization would be like a Vector, but with the length mixed in (in other words: it is the hash tree root of a node whose left child is the root of a static zero-padded vector and whose right child is the length as a uint64)

I can see how this can be valuable. But there is one question: do we want the length mixed in as an item at the top, or as an element of the list? Huffman theory would say that mixing the length in at the top is only optimal if half of all queries to the list are queries of its length, but that seems implausibly high. It would also say that reserving the first element of the array to represent the length is only optimal if a length query is only as common as a query to a randomly selected element, which seems implausibly low. Which of these two extremes is less bad? The former is better from the PoV of changing things as little as possible, the latter is better from the PoV of simplicity.
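
As a toy illustration of the Huffman-style heuristic behind these numbers (an access-optimal tree puts a leaf accessed with probability p at depth about -log2(p); the function name is illustrative):

```python
import math

def optimal_depth(p: float) -> float:
    # Huffman/entropy heuristic: a leaf accessed with probability p belongs
    # at depth roughly -log2(p) in an access-optimal tree.
    return -math.log2(p)

print(optimal_depth(0.5))      # 1.0  -> length mixed in at the very top
print(optimal_depth(2 ** -40)) # 40.0 -> length as one ordinary slot of a 2**40 list
```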

@dankrad (Contributor) commented Jun 10, 2019

> Huffman theory would say that mixing the length in at the top is only optimal if half of all queries to the list are queries of its length, but that seems implausibly high.

I'm not sure, but if we assume that Merkle proofs typically access one element, and every proof has to come with a proof of the length of the list, then actually half the accesses are to the length.
(We could skip this if we used zero-checking logic instead of checking the length, but that might need nasty special cases for lists where zero is a valid value; not what I would prefer from a simplicity point of view.)

@vbuterin (Contributor, Author)

> and every proof has to come with a proof of the length of the list

Ah, I suppose if we want our list get proofs to return what list[i] would actually return, which is an IndexError if the index is out of bounds, then you're right. OK, I can see the rationale for mixing in at the top, as in the status quo.

@protolambda (Collaborator) commented Jun 12, 2019

Let's see if we can describe the new extended SSZ better/more minimally (or at least my idea of it; this is all just an idea):


Any collection is defined as <header><body>

<body> is always defined as the concatenation of the serialized elements in the collection.

Bitfields require the <body> to be exactly bit-length bits, padded to the next multiple of 8, with the right-most byte left-padded to a full byte.
BigInts are the same as bitfields, but interpreted as a little-endian integer.

There are 4 different types, specialized in <header> and hash-tree-root definition:

  • Known, header: `` (zero bytes), root: merkle(pad_right(elements, M))
    Body size and elements location are known on compile time.
    • fixed-size element containers
    • fixed-size element full vectors
  • Dynamic, root: H(merkle(pad_right(elements, M)), len)
    • Implicit, header: `` (zero bytes)
      Body size known from context (decoders use read bound)
      • fixed-size element dynamic vectors (a "list")
      • Bitfield (new)
    • Explicit: <offset>*
      • variable-size element containers
      • variable-size element full vectors
      • dynamic vectors (a "list")
  • Sparse, root: merkle(pad_sparse(elements, indices, M)):
    • Indirectly, header: <<index><offset>>*
      • Sparse variable-size element vector (new)
    • Directly, header: <index>*
      • Sparse fixed-size element vector (new)
  • Selected: header: <index>, root: H(H(element), index):
    • Option/Union

* indicates zero or more elements, concatenated.

<index>, <offset> and <length> are always 4 byte little endian integers (i.e. uint32).

Offsets always count the bytes up to (but not including) the element.

M is the power of 2 given as the collection limit.

merkle chunkifies the input (partitions elements into 32-byte chunks) if it is of a basic type or a bitfield, then computes the binary Merkle root of the given chunks.

pad_right(elements, M) pads the elements with zeroes to the given power of 2. **
pad_sparse pads the gaps and ends to match the given power of 2. **

**: Optimized away within merkleization in real-world implementations.


  • note on Sparse-Indirectly: we could do <index>*<offset>* too, if we want to merkleize just part of the header (<index>*) efficiently for basic presence proofs (still not quite full inclusion proofs) constructed in limited contexts (contracts?).
  • note on <body>: encoders for bodies can be remixed with different header types to implement all the types easily:
    • read by offsets
    • read by stride (fixed increment per element)
    • read by element size (variable increment per element).
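
To make the Explicit <offset>* form concrete, a rough sketch under my reading above (uint32 offsets counting the bytes of <header><body> up to, but not including, each element; function names are illustrative):

```python
def serialize_explicit(parts: list) -> bytes:
    # <header> = one uint32 offset per element, <body> = concatenated elements.
    header_size = 4 * len(parts)
    offsets, body = [], bytearray()
    for part in parts:
        offsets.append((header_size + len(body)).to_bytes(4, "little"))
        body += part
    return b"".join(offsets) + bytes(body)

def deserialize_explicit(data: bytes) -> list:
    # "Read by offsets": the first offset tells us how big the header is,
    # and hence how many elements there are.
    if not data:
        return []
    count = int.from_bytes(data[:4], "little") // 4
    offs = [int.from_bytes(data[4 * i : 4 * i + 4], "little") for i in range(count)]
    return [data[offs[i] : offs[i + 1] if i + 1 < count else len(data)]
            for i in range(count)]
```

The other two body readers in the note (by stride, by element size) would reuse the same <body> encoding under a different, or empty, header.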

@dankrad (Contributor) commented Jun 12, 2019

I'm very confused by your post... at least it does not feel like a simplification to me ;)
Have we defined sparse objects yet?
Also, that's not how bitfields work (we pad them on the left)

> Dynamic, root: H(len, merkle(pad_right(elements, M)))

Mixing in the length is always on the right

@protolambda (Collaborator)

> Have we defined sparse objects yet?

No, but I posted this serialization earlier in the chat, as it is consistent with offsets, and simple. But improvements are very welcome. (also see note on ordering of indices)

> Also, that's not how bitfields work (we pad them on the left)

See edits; that is what I did previously. But that's not consistent with the current verify-bitfield behavior, which is like a little-endian int. Also, a big little-endian integer sounds more consistent to me. But I considered big-endian for formatting/reading reasons (and prefer it too, as there's no "gap" in the bits when you align a 9-bit integer in bytes).

> Mixing in the length is always on the right

Whoops, wrote it a bit quickly, but you get the idea.

@dankrad (Contributor) commented Jun 13, 2019

> But that's not consistent with the current verify-bitfield behavior, which is like a little-endian int.

Oh yes, I forgot it's little-endian. Then it should be on the right.

@JustinDrake added the milestone:June 30 freeze 🥶 label on Jun 15, 2019
@ethers (Member) commented Jun 20, 2019

Let me add some information here to help others who are just arriving.

What is an SSZ Partial?

An SSZ partial is an object that can stand in for an SSZ object in any function, but which only contains some of the elements in the SSZ object. It also contains Merkle proofs that prove that the values included in the SSZ partial actually are the values from the original SSZ object; this can be verified by computing the Merkle root of the SSZ partial and verifying that it matches the root of the original SSZ object.

Source: https://github.com/ethereum/research/tree/master/ssz_research/partials
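
A minimal sketch of the verification step described above (not the actual partials code), using the generalized-index convention that the root is 1 and the children of node g are 2g and 2g+1, with SHA-256 assumed as the hash:

```python
from hashlib import sha256

def hash_(a: bytes, b: bytes) -> bytes:
    return sha256(a + b).digest()

def verify_branch(leaf: bytes, branch: list, gindex: int, root: bytes) -> bool:
    # `branch` holds the sibling hashes from the leaf's level up to the root.
    node = leaf
    for sibling in branch:
        # The parity of the generalized index says whether the current node
        # is a right child (odd) or a left child (even) of its parent.
        node = hash_(sibling, node) if gindex % 2 else hash_(node, sibling)
        gindex //= 2
    return gindex == 1 and node == root
```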

@JustinDrake (Collaborator)

Addressed in #1180
