Data model and canonicalization #8

clehner · 2021-04-09T20:23:52Z

The TypedData object for signing should be able to encode objects in the VC Data Model in a canonical way. I am seeing five possible ways to do it:

1. JSON-LD to RDF conversion, URDNA2015 canonicalization, and N-Quads serialization. This would adhere most closely to how verification method types like Ed25519Signature2018 and JsonWebSignature2020 work, only instead of hashing the document and proof options before signing, the document and proof options are converted into TypedData and then signed. This can use a simple TypedData type structure to encode RDF statements. This has been implemented as shown in "Verification method using EIP-712" (decentralized-identity/ethr-did-resolver#107, comment). This could be modified to avoid using TypedData arrays if needed, as discussed in (4) below. Applying JSON-LD to RDF conversion involves JSON-LD expansion, which means that the full IRI for each property in the document is included in the signing input. The resulting N-Quads in the signing window UI are more verbose than the input JSON-LD document, but could theoretically be copy-pasted by the user into other applications for further processing.
2. JSON Object TypedData. Either canonicalizing the document and proof in a single object as in JcsEd25519Signature2020, or separately canonicalizing them as specified in Linked Data Proofs. Since TypedData is more structured than JSON, even though EIP-712 represents it using JSON, a step is needed to convert the arbitrary JSON data into a TypedData structure. An object in TypedData must have a fixed set of fields, and there are no enums, only structs and arrays. I think a set of types could be defined to represent arbitrary JSON, using array properties to represent optional values. This approach would probably not look great in the signing UI. An encoding for numbers would need to be specified, since TypedData does not have a type for floating-point numbers; instead a struct should probably be used, canonically encoding the number either as a string or maybe with two uint256 values to represent a 64-bit float. Since values would need to be transformed from their original JSON into TypedData in a custom way, I think the only remaining canonicalization needed would be to sort the order of properties in objects.
3. JSON using static EIP-712 types. A common set of types can be defined for objects from the VC Data Model such as VerifiableCredential, Proof, CredentialSubject, etc. This is more restrictive than the general JSON approach (2) or JSON-LD approach (1), since the types must conform to the EIP-712 structured data type system; e.g. there can be no "string or object" or "object or array of objects" like we find in the VC Data Model. A single set of TypedData types for VCs would not be able to fully represent the VC Data Model, only a limited subset. This might look okay in the signing UI, if the user is familiar with the VC Data Model, as shown in the screenshot in "Verification method using EIP-712" (decentralized-identity/ethr-did-resolver#107, comment).
4. JSON with dynamic EIP-712 types. This is what I think this specification currently implements. To more fully support the VC Data Model, the types must be specified dynamically, to allow for types that are not in the core VC Data Model, and/or variations on the core types (i.e. enums, such as Issuer that may be either a string or an object with an id string property). There is also still a limitation that all elements of a TypedData array must be of the same type, while in arbitrary JSON arrays may contain values of different types. To represent arrays containing values of multiple types (i.e. polymorphism), some additional conversion will be needed. This could seem to lead back to (2) above in order to represent arbitrary JSON values as TypedData values, but I think we could skip that by using structs instead of arrays: each array of N values would be represented by a struct with N values (which may be of different types), where the property names are whole numbers. Avoiding TypedData arrays also has the benefit of not having to deal with the apparent inconsistency between eth-sig-util (MetaMask) and EIP-712 in the encoding of arrays ("signTypedData_v4 not according to specification", MetaMask/eth-sig-util#106), which I think this specification should otherwise address. A verifier needs to have the type information to verify the signature; this suggests putting it in the proof. This specification currently does this, using the messageSchema property of the eip712Domain property of the proof. With the EIP-712 type information provided, I don't think there is a need for canonicalization of the message being signed, since the EIP-712 type information defines the order of object properties. The exception is floating-point numbers, for which some representation must be specified as in (2). I don't think the use of JCS in the current specification has an effect on the signing input.
5. JSON with canonical EIP-712 types. Similar to (4), but instead of including the EIP-712 type information in the proof object, generate the types from the signing input in a canonical way. That is, traverse the input object and assign a struct type to each object based on the properties that the object contains. Canonicalization would be needed here, to visit the objects in a canonical order for assigning type names, such as by sorting property names in lexicographic order, and visiting objects depth-first. This removes the need for including schema information in the credential/proof as in (4), allowing for more compact display in the signing window, and not having to define a schema when adding or removing properties. The cost is more complexity in the implementation (but not as much as for JSON-LD).
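
To make the struct-instead-of-array idea from the "dynamic EIP-712 types" approach (4) concrete, here is a minimal sketch. The function name `arrays_to_structs` is hypothetical and not part of any specification or library; it only shows the data transformation, not the type generation or signing. Whether wallets accept whole-number member names in struct types is an assumption that has not been checked here.

```python
from typing import Any


def arrays_to_structs(value: Any) -> Any:
    """Recursively replace each JSON array with an object keyed by index.

    An array of N values becomes an object with the N property names
    "0", "1", ..., so each element can later get its own member type.
    Hypothetical helper for illustration only.
    """
    if isinstance(value, list):
        return {str(i): arrays_to_structs(item) for i, item in enumerate(value)}
    if isinstance(value, dict):
        return {key: arrays_to_structs(item) for key, item in value.items()}
    # Strings, numbers, booleans and null are left unchanged.
    return value


# A mixed-type array, which a TypedData array could not represent directly.
credential = {
    "type": ["VerifiableCredential", "ExampleCredential"],
    "evidence": [{"id": "urn:example:1"}, "plain string"],
}
converted = arrays_to_structs(credential)
```

One thing to watch if property names are later sorted as strings: "10" sorts before "2", so the ordering rule for these numeric names would need to be stated explicitly.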

Of the above five approaches, only 1, 2 and 5 can fully represent the VC Data Model. Approach 2 cannot avoid using TypedData arrays, which might be problematic. Approach 1 is the only one that is based on linked data and uses information from JSON-LD contexts in the signing input.
I think if linked data is important, we should change this specification to use 1; otherwise I would suggest moving towards 5. Does this make sense?
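
As a rough illustration of what approach 5 could involve, the sketch below derives EIP-712 struct types from a JSON object by sorting property names and visiting nested objects depth-first. Everything specific here is an assumption made for the example: the function name `generate_types`, the `Document`/`Document1` naming scheme, and the mapping of JSON integers to `int256`. Arrays, floats and null are deliberately left unhandled, since their representation is one of the open questions above.

```python
from typing import Any, Dict, List

Types = Dict[str, List[Dict[str, str]]]


def generate_types(doc: Dict[str, Any], primary_type: str = "Document") -> Types:
    """Derive struct types from a JSON object in a deterministic order.

    Hypothetical sketch: nested objects are named primary_type + counter,
    with the counter assigned in depth-first order over sorted properties.
    """
    types: Types = {}
    counter = 0

    def visit(obj: Dict[str, Any], name: str) -> None:
        nonlocal counter
        fields: List[Dict[str, str]] = []
        types[name] = fields  # register the type before visiting children
        for key in sorted(obj):  # code point order, independent of input order
            value = obj[key]
            if isinstance(value, dict):
                counter += 1
                field_type = f"{primary_type}{counter}"
                visit(value, field_type)
            elif isinstance(value, bool):  # bool must be tested before int
                field_type = "bool"
            elif isinstance(value, int):
                field_type = "int256"  # assumed mapping for JSON integers
            elif isinstance(value, str):
                field_type = "string"
            else:
                raise NotImplementedError(
                    f"no type mapping for {type(value).__name__} at {key!r}"
                )
            fields.append({"name": key, "type": field_type})

    visit(doc, primary_type)
    return types


doc_a = {"issuer": "did:example:123",
         "credentialSubject": {"id": "did:example:456", "age": 30}}
doc_b = {"credentialSubject": {"age": 30, "id": "did:example:456"},
         "issuer": "did:example:123"}
types_a = generate_types(doc_a)
types_b = generate_types(doc_b)
```

Because the traversal order depends only on the sorted property names, `doc_a` and `doc_b` yield the same types even though their properties appear in a different order, which is what lets a verifier regenerate the types without a schema in the proof.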


clehner · 2021-04-14T21:19:34Z

Pursuing option number 5 as opened in #9, while attempting to support linked data verification with #10

clehner mentioned this issue Apr 9, 2021

Verification method using EIP-712 decentralized-identity/ethr-did-resolver#107

Closed

clehner closed this as completed Apr 14, 2021
