Add DagPB spec #201

vmx · 2019-10-02T10:23:33Z

This PR combines #114 and #115 and addresses all code review comments from there.

I then changed the section about pathing to describe the current state in the Go and JS implementations. As this is a descriptive spec, I find this a sensible approach.

Of course it would be desirable to harmonize how pathing works between implementations, but that hasn't happened for a long time and I don't see it happening soon. Hence I think documenting the current state is better than continuing to discuss it without taking action.

Stebalien · 2019-10-02T15:50:57Z

block-layer/codecs/dag-pb.md

+
+## Pathing
+
+There is some overlap between the Go and JavaScript implementations of DagPB. Both support pathing with link names: `/<name1>/<name2>/…`.


Can we note (and agree) that this is a bug?

Probably worth linking to #55 as a "discussion of this issue and how it might be resolved".

@Stebalien We can totally agree on that. Let's discuss it at #55 (and I'll add a link to it in here).
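To make the shared `/<name1>/<name2>/…` form concrete, here is a minimal JavaScript sketch of pathing by link name. It assumes decoded nodes are kept in a hypothetical in-memory `Map` keyed by CID, with placeholder strings such as `'root'` standing in for real CIDs. It deliberately covers only name-based pathing and does not try to model how either implementation treats unnamed links.

```javascript
// Sketch of DagPB pathing by link name: /<name1>/<name2>/...
// Assumption: `blocks` is a hypothetical Map from CID (placeholder strings here)
// to a decoded node of the shape { Data, Links: [{ Hash, Name, Tsize }] }.
function resolveByName(blocks, rootCid, path) {
  let cid = rootCid;
  // "/docs/readme.md" -> ["docs", "readme.md"]; empty segments are ignored.
  const segments = path.split('/').filter((s) => s.length > 0);
  for (const name of segments) {
    const node = blocks.get(cid);
    if (node === undefined) {
      throw new Error(`block not found: ${cid}`);
    }
    // Link names are meant to be unique within a node, so take the single match.
    const link = node.Links.find((l) => l.Name === name);
    if (link === undefined) {
      throw new Error(`no link named "${name}" in ${cid}`);
    }
    cid = link.Hash;
  }
  return cid;
}

const blocks = new Map([
  ['root', { Data: null, Links: [{ Hash: 'dir', Name: 'docs', Tsize: 20 }] }],
  ['dir', { Data: null, Links: [{ Hash: 'file', Name: 'readme.md', Tsize: 10 }] }],
  ['file', { Data: 'hello', Links: [] }],
]);

console.log(resolveByName(blocks, 'root', '/docs/readme.md')); // file
```

Each path segment costs one block lookup, which is why a non-unique name would make the result ambiguous.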

block-layer/codecs/dag-pb.md

rvagg

👍 if you at least insert the spec status line at the top. I'll leave you to figure out how to handle the JS/Go discussion bit.

achingbrain · 2019-10-03T09:03:22Z

block-layer/codecs/dag-pb.md

+// An IPFS MerkleDAG Link
+message PBLink {
+
+ // binary CID (with no multibase prefix) of the target object


Is this correct? You can have v1 CIDs as DAGLinks which can contain a multibase prefix.

I haven't addressed this as I also don't know (@Stebalien probably does).

https://github.com/ipld/js-ipld-dag-pb/blob/c4b83cbb69821ca6da12190105b0e78d27688630/src/serialize.js#L23
https://github.com/ipfs/go-merkledag/blob/2c1a891f6f1b2bbb6442b9da62e5ece591fbdad9/pb/merkledag.pb.go#L425

These both look like raw CID byte arrays.

CIDs don't contain a multibase prefix.

I created a script to verify this (yes, I am using a sledgehammer to crack a nut, but it gave me the chance to finally use the new js-ipld-dag API as well as playing a bit with go-ipld). The output is:

    $ node --experimental-modules combined.js
    (node:985) ExperimentalWarning: The ESM module loader is experimental.
    actual hash: 89b1d9957f247ee0fbaa60726be4e138fe9983c5919dec1a1b8a43e7ed6509c8
    cidv0: 122089b1d9957f247ee0fbaa60726be4e138fe9983c5919dec1a1b8a43e7ed6509c8
    cidv1: 0170122089b1d9957f247ee0fbaa60726be4e138fe9983c5919dec1a1b8a43e7ed6509c8
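The output shows that the two stored forms differ only by a prefix. The binary CIDv0 is a bare sha2-256 multihash (`0x12`, `0x20`, then the 32-byte digest), and the binary CIDv1 is those same bytes preceded by the CID version `0x01` and the dag-pb multicodec `0x70`. A minimal sketch of that relationship, assuming a 34-byte sha2-256 CIDv0; the helper names are made up for illustration:

```javascript
// Sketch: a dag-pb CIDv1 in binary = varint version (0x01) + varint codec (0x70)
// + the CIDv0 multihash bytes. Both prefix values fit in one varint byte.
const CID_VERSION_1 = 0x01;
const DAG_PB_CODEC = 0x70;

function hexToBytes(hex) {
  const out = new Uint8Array(hex.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(hex.slice(2 * i, 2 * i + 2), 16);
  }
  return out;
}

function bytesToHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
}

// Hypothetical helper: only accepts a sha2-256 CIDv0 (0x12 0x20 + 32-byte digest).
function cidV0ToV1(v0) {
  if (v0.length !== 34 || v0[0] !== 0x12 || v0[1] !== 0x20) {
    throw new Error('not a binary sha2-256 CIDv0');
  }
  const v1 = new Uint8Array(v0.length + 2);
  v1[0] = CID_VERSION_1;
  v1[1] = DAG_PB_CODEC;
  v1.set(v0, 2);
  return v1;
}

const v0 = hexToBytes('122089b1d9957f247ee0fbaa60726be4e138fe9983c5919dec1a1b8a43e7ed6509c8');
console.log(bytesToHex(cidV0ToV1(v0)));
// 0170122089b1d9957f247ee0fbaa60726be4e138fe9983c5919dec1a1b8a43e7ed6509c8
```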

And here's the script in case someone wants to run it locally:

    import dagPb from 'ipld-dag-pb'
    import Pbf from 'pbf'

    const { DAGLink, DAGNode } = dagPb

    // Creating the data
    const node = new DAGNode('some data')
    // base58btc - cidv0 - dag-pb - sha2-256-256-89b1d9957f247ee0fbaa60726be4e138fe9983c5919dec1a1b8a43e7ed6509c8
    const cidv0 = 'QmXc9raDM1M5G5fpBnVyQ71vR4gbnskwnB9iMEzBuLgvoZ'
    const linkv0 = new DAGLink('cidv0link', 10, cidv0)
    node.addLink(linkv0)
    // base32 - cidv1 - dag-pb - sha2-256-256-89b1d9957f247ee0fbaa60726be4e138fe9983c5919dec1a1b8a43e7ed6509c8
    const cidv1 = 'bafybeiejwhmzk7zep3qpxktaojv6jyjy72myhrmrtxwbug4kipt62zijza'
    const linkv1 = new DAGLink('cidv1link', 11, cidv1)
    node.addLink(linkv1)
    //console.log(node)
    const serialized = node.serialize()
    //console.log(serialized)

    // Reading the data (generated by pbf)

    // PBLink ========================================
    const PBLink = {};
    PBLink.read = function (pbf, end) {
      return pbf.readFields(PBLink._readField, {Hash: null, Name: "", Tsize: 0}, end);
    };
    PBLink._readField = function (tag, obj, pbf) {
      if (tag === 1) obj.Hash = pbf.readBytes();
      else if (tag === 2) obj.Name = pbf.readString();
      else if (tag === 3) obj.Tsize = pbf.readVarint();
    };

    // PBNode ========================================
    const PBNode = {};
    PBNode.read = function (pbf, end) {
      return pbf.readFields(PBNode._readField, {Links: [], Data: null}, end);
    };
    PBNode._readField = function (tag, obj, pbf) {
      if (tag === 2) obj.Links.push(PBLink.read(pbf, pbf.readVarint() + pbf.pos));
      else if (tag === 1) obj.Data = pbf.readBytes();
    };

    const pbf = new Pbf(serialized)
    const data = PBNode.read(pbf)
    console.log('actual hash: 89b1d9957f247ee0fbaa60726be4e138fe9983c5919dec1a1b8a43e7ed6509c8')
    const cidv0Read = data.Links[0].Hash;
    console.log('cidv0: ', cidv0Read.toString('hex'))
    const cidv1Read = data.Links[1].Hash;
    console.log('cidv1:', cidv1Read.toString('hex'))

https://github.com/multiformats/cid#how-does-it-work

`<multibase-prefix><cid-version><multicodec-content-type><multihash-content-address>`

CIDv0 does not contain a multibase prefix, so it's assumed to be base58btc; CIDv1 does.

@achingbrain yes, and it also mentions there:

NOTE: Binary (not text-based) protocols and formats may omit the multibase prefix when the encoding is unambiguous.

And this is what dag-pb is doing. The CID (no matter which one) is stored in binary without the multibase prefix (as the code above shows).
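A dependency-free sketch of the PBLink wire layout may make this easier to see. It uses only the field numbers from the message definition (Hash = 1, Name = 2, Tsize = 3) and standard protobuf encoding; it is an illustration, not a complete or canonical DagPB encoder, and the function names are hypothetical. The Hash field is written as key `0x0a`, a length, and then the binary CID bytes unchanged.

```javascript
// Sketch only: encodes a single PBLink with protobuf field numbers
// Hash = 1, Name = 2, Tsize = 3. Not a full or canonical DagPB encoder.
function hexToBytes(hex) {
  const out = new Uint8Array(hex.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(hex.slice(2 * i, 2 * i + 2), 16);
  }
  return out;
}

function bytesToHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
}

// Protobuf base-128 varint: 7 bits per byte, high bit set on all but the last byte.
function encodeVarint(n) {
  const out = [];
  while (n >= 0x80) {
    out.push((n % 0x80) | 0x80);
    n = Math.floor(n / 0x80);
  }
  out.push(n);
  return out;
}

// Length-delimited field (wire type 2): key, length, then the raw bytes.
function encodeBytesField(fieldNumber, bytes) {
  return [...encodeVarint((fieldNumber << 3) | 2), ...encodeVarint(bytes.length), ...bytes];
}

function encodePBLink({ Hash, Name, Tsize }) {
  // Field 1: the binary CID is copied verbatim; no multibase prefix is added.
  const out = encodeBytesField(1, Hash);
  if (Name !== undefined) {
    out.push(...encodeBytesField(2, new TextEncoder().encode(Name)));
  }
  if (Tsize !== undefined) {
    // Field 3 uses wire type 0 (varint), so the key is 3 << 3 = 0x18.
    out.push(...encodeVarint(3 << 3), ...encodeVarint(Tsize));
  }
  return Uint8Array.from(out);
}

const cidv1 = hexToBytes('0170122089b1d9957f247ee0fbaa60726be4e138fe9983c5919dec1a1b8a43e7ed6509c8');
const encoded = encodePBLink({ Hash: cidv1, Name: 'cidv1link', Tsize: 11 });
// Key 0x0a, length 0x24 (36), then the CID bytes starting with 0x01 0x70 0x12 0x20.
console.log(bytesToHex(encoded.slice(0, 6))); // 0a2401701220
```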

Good point, well presented.

You only need to include the multibase if the encoding is ambiguous - e.g. a CID encoded as a string. A buffer or similar octet stream is not ambiguous so it can be omitted.
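The other side of this is the string form, which is where the multibase prefix shows up. Below is a minimal sketch, assuming the usual conventions: a CIDv0 is rendered as bare base58btc with no prefix, and a CIDv1 is rendered here as RFC 4648 base32 (lowercase, unpadded) behind the `b` multibase prefix. The encoders are hand-written for illustration rather than taken from a multiformats library. Applied to the CIDv1 bytes read back in the script above, the sketch reproduces the `bafy…` string that the script started from.

```javascript
// Sketch: a multibase prefix only appears when a binary CID becomes a string.
// Assumed conventions: CIDv0 -> bare base58btc (no prefix);
// CIDv1 -> base32 lowercase, unpadded, behind the multibase prefix "b".
const BASE58_ALPHABET = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz';
const BASE32_ALPHABET = 'abcdefghijklmnopqrstuvwxyz234567';

function hexToBytes(hex) {
  const out = new Uint8Array(hex.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(hex.slice(2 * i, 2 * i + 2), 16);
  }
  return out;
}

function base58btc(bytes) {
  // Treat the bytes as one big-endian integer and repeatedly divide by 58.
  let n = 0n;
  for (const b of bytes) n = n * 256n + BigInt(b);
  let out = '';
  while (n > 0n) {
    out = BASE58_ALPHABET[Number(n % 58n)] + out;
    n /= 58n;
  }
  // Each leading zero byte is written as '1'.
  for (const b of bytes) {
    if (b !== 0) break;
    out = '1' + out;
  }
  return out;
}

function base32Lower(bytes) {
  let out = '';
  let buffer = 0;
  let bits = 0;
  for (const b of bytes) {
    // Append the next byte; the mask only stops the buffer from growing,
    // since at most 12 unread bits are ever pending here.
    buffer = ((buffer << 8) | b) & 0xffff;
    bits += 8;
    while (bits >= 5) {
      out += BASE32_ALPHABET[(buffer >>> (bits - 5)) & 31];
      bits -= 5;
    }
  }
  // Pad the final partial group with zero bits (no '=' padding).
  if (bits > 0) out += BASE32_ALPHABET[(buffer << (5 - bits)) & 31];
  return out;
}

function cidToString(cid) {
  const isV0 = cid.length === 34 && cid[0] === 0x12 && cid[1] === 0x20;
  return isV0 ? base58btc(cid) : 'b' + base32Lower(cid);
}

const cidv1 = hexToBytes('0170122089b1d9957f247ee0fbaa60726be4e138fe9983c5919dec1a1b8a43e7ed6509c8');
console.log(cidToString(cidv1));
// bafybeiejwhmzk7zep3qpxktaojv6jyjy72myhrmrtxwbug4kipt62zijza
```

The binary bytes are identical in both cases; only the text rendering decides whether a prefix is needed.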

Since that's the case, you could probably remove the (with no multibase prefix) altogether.

> Since that's the case, you could probably remove the (with no multibase prefix) altogether.

It's not strictly needed, but I think it adds clarity.

achingbrain · 2019-10-03T09:04:15Z

block-layer/codecs/dag-pb.md

+```
+
+The object's link names are specified in the 'Name' field of the PBLink object.
+All link names in an object must either be blank or unique within the object.


Since the Name field is optional, maybe say it must either be omitted or unique within the object?
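A small sketch of the naming rule under discussion. It treats both an omitted `Name` and an empty string as exempt from the uniqueness check; that is an assumption, since this thread is still deciding between "blank" and "omitted". The function name is hypothetical.

```javascript
// Sketch of the rule: every link name is either omitted/blank or unique
// within the object. Treating '' like an omitted name is an assumption.
function checkLinkNames(links) {
  const seen = new Set();
  for (const { Name } of links) {
    if (Name === undefined || Name === '') continue;
    if (seen.has(Name)) return false; // duplicate name: not a valid DagPB object
    seen.add(Name);
  }
  return true;
}

console.log(checkLinkNames([{ Name: 'a' }, { Name: 'b' }, {}])); // true
console.log(checkLinkNames([{ Name: 'a' }, { Name: 'a' }])); // false
```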

achingbrain · 2019-10-03T09:09:54Z

block-layer/codecs/dag-pb.md

+The object's link names are specified in the 'Name' field of the PBLink object.
+All link names in an object must either be blank or unique within the object.
+
+## Pathing


I don't think discussing differences between implementations makes for a good spec. If you were implementing this, what would you implement here?

Maybe just standardise on the JS version, since it lets you navigate non-named links and the Go version doesn't?

As per https://github.com/ipld/specs/pull/201/files/d7f528343fa4b051a7ab1042013a4830a8f090a5#r330630459, I'll add a link to the bug. But I think it's better to know how things are currently implemented than not to have this documented at all.

achingbrain · 2019-10-03T09:11:35Z

block-layer/codecs/dag-pb.md

+
+## Format
+
+The DagPB IPLD format is a legacy format implemented with a single protobuf.


I don't think it's fair to call this legacy (yet) as it's what pretty much all IPFS data is encoded with and until UnixFSv2 arrives there's no way to use anything else.

vmx · 2019-10-04T11:04:39Z

A new version is up. I made the spec "Descriptive: Draft" as we still have issue #55 open (in a final spec I would expect that nothing will change). For those who have already approved the spec, a thumbs up on this comment would be nice. If you can't be bothered, I'll merge it two working days after @achingbrain approves it.

rvagg

Still LGTM, but that "with no multibase prefix" thing seems wrong; it would be good to get an authoritative answer on that. The code looks to me like there's no trimming of the raw CID byte array.

vmx · 2019-10-09T08:41:12Z

There were only minor changes since some of the approvals, so if you feel like they weren't good, please open an issue/PR. I'll merge this now :)

whyrusleeping and others added 6 commits October 2, 2019 12:11

add a spec for dag-pb

b824496

dag-pb-spec: steb CR

5830eaa

1. Fix link/hash encoding (not CBOR).
2. Describe a bunch of options for pathing.
3. Expand on canonical encoding.

codecs: DagPB: it's about link names, not links

e96c792

codecs: rename DAG-PB to DagPB

4ba1cad

Rename DAG-PB to DagPB so that it is consistent with DagCBOR.

codecs: DagPB: Describe pathing of current implementations

1e252c0

codecs: DagPB: Move spec to correct location

d7f5283

vmx requested review from mikeal, Stebalien, achingbrain and whyrusleeping October 2, 2019 10:23

This was referenced Oct 2, 2019:

add a spec for dag-pb #114 (closed)

dag-pb-spec: steb CR #115 (closed)

Stebalien approved these changes Oct 2, 2019

mikeal approved these changes Oct 2, 2019

rvagg reviewed Oct 3, 2019

rvagg approved these changes Oct 3, 2019

achingbrain reviewed Oct 3, 2019

codecs: DagPB: Address code review comments

172d786

rvagg approved these changes Oct 4, 2019

achingbrain approved these changes Oct 9, 2019

vmx merged commit 0cc83a7 into master Oct 9, 2019

vmx deleted the dag-pb-spec-vmx branch October 9, 2019 08:41

rvagg mentioned this pull request Jan 13, 2021

dag-pb: update, cleanup, make consistent, mark "Final" #347 (merged)
