-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merkle/Causal structure #38
Comments
@gritzko , seeing into the picture, not understand, why reference of 4+b operation points to 1+a (not 3+b) ? |
@abalandin For |
@abalandin Ah, I see. One object's ops form a tree. That tree conveys the structure of the object and the causality of operations. |
May, or may not. Learning the type involves need of some lookup. Instead, we can build a system where object ids are namespaced by type ids. I. e. objects
may be considered as two distinct valid objects. Then de facto the full object id will be a pair (type, object-id). This is possible in all cases since when we work with an object, we always know its type.
This is false for name-based UUIDs in
No, if we account parts of an object as local to their object. So, if distinct objects have parts with the same local id, they are still located inside different objects. Hence, the global id of a specific part must simply include its object’s id. Incorporating type, as said above, the global is of an object part is a triple (type, object-id, location). On the other hand, if we consider location-ids global and try to calculate object-ids from them, we have to build enormous tables of all parent-child UUID relations. But what problems does it solve? What do you mean by vice versa in this case?
Why?
But you haven’t described the problem. |
Two issues here: (1) keys become twice bigger (2) that contradicts the idea of a globally-unique identifier (even inside a database, it is scoped by the type). Given that I want to add hashes, UUID(type)+UUID(object)+UUID(version)+hash becomes way too heavy. |
True. As is, |
True. My goal is to have a nice Merkle structure. |
In example, there are 2 different formats for lww ops. I see them as create-field and update-field. It is too complex. They may be converged into one
This comment is a bit aside of the issue topic, but it is written as a proof that the idea may work. |
About a year ago, I had to introduce the formal grammar cause the thing had to work inside a database. Now, I am introducing a Merkle tree, so it becomes even more strict. Like, bitwise strict. So, I am struggling with immutability right now. RON 2.0 is logically-immutable, but in fact ops change. Two key issues are:
I think I mostly solved (1), not sure about (2)... |
How to deal with headers: equate an object-create operation and a state frame header. We can't discard that operation anyway. So, object creation is
Then, any header-less chunk counts as a patch. Now, we only use immutable ops as-is, no "synthesized" headers. |
A patch:
This open-RON notation is mostly backwards-compatible (Except 2.0 uses I think, I also have a solution for To summarize that, ref UUIDs are more strictly defined now, so we can build on them more easily. So far, I like 2.1. |
The notation is backwards-compatible, the RDT encoding is not. |
RON ops certainly have some redundancy. The op specifier consists of four UUIDs: type, object, event and location ids, which may be derived from each other in many cases.
Indeed, having the object id, we may learn the type; having the location id we may learn the object (as location ids are globally unique).
This redundancy may lead to contradictions. Indeed, we may specify a wrong type for the object or a wrong object for the location (or vice versa).
While a detailed four-component specifier makes it easy to process an op, it also makes it difficult to handle a RON database as a Merkle structure.
My solution is to continue the Causal Tree line of thinking all the way forward. Namely, make object and type ids 100% derived. Use two versions of the protocol: Open RON (two-UUID specifier, event and ref) and Closed RON (four-UUID specifier for client's convenience). Having the full history of events, a peer/server may transitively close ops, deriving the closed form for the clients (who don't have the full history).
This way, the protocol is much less vulnerable to data inconsistencies. Also, the Merkle structure of the event graph becomes self- evident. Last but not least, a RON log gets a simpler key-value structure which is much easier to store and navigate.
Formally, the new rules are:
This way, ops form an orderly tree. The existing types and algorithms need to be corrected in two cases (at least):
Otherwise, the overall structure of the protocol stays the same; all the reducer/RDT logic stays the same as reducers did not read type/object ids anyway. The existing grammar stays valid.
In a follow up, I'll describe how the Merkle tree is defined and used in this context.
The text was updated successfully, but these errors were encountered: