-
Notifications
You must be signed in to change notification settings - Fork 108
Conversation
1. Fix link/hash encoding (not CBOR). 2. Describe a bunch of options for pathing. 3. Expand on canonical encoding.
Rename DAG-PB to DagPB so that it is consistent with DagCBOR.
block-layer/codecs/dag-pb.md
Outdated
|
||
## Pathing | ||
|
||
There is some overlap between the Go and JavaScript implementation of DagPB. Both support pathing with link names: `/<name1>/<name2>/…`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we note (agree) that this is a bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably worth linking to #55 as a "discussion of this issue and how it might be resolved".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Stebalien We can totally agree on that. Let's discuss it at #55 (and I'll add a link to it in here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 if you at least insert the spec status line at the top, will leave you to figure out how to handle the JS/Go discussion bit.
// An IPFS MerkleDAG Link | ||
message PBLink { | ||
|
||
// binary CID (with no multibase prefix) of the target object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct? You can have v1 CIDs as DAGLinks which can contain a multibase prefix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't addressed this as I also don't know (@Stebalien probably does).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CIDs don't contain a multibase prefix.
I created a script to verify this (yes, I am using a sledgehammer to crack a nut, but it gave me the chance to finally use the new js-ipld-dag API as well as playing a bit with go-ipld). The output is:
$ node --experimental-modules combined.js
(node:985) ExperimentalWarning: The ESM module loader is experimental.
actual hash: 89b1d9957f247ee0fbaa60726be4e138fe9983c5919dec1a1b8a43e7ed6509c8
cidv0: 122089b1d9957f247ee0fbaa60726be4e138fe9983c5919dec1a1b8a43e7ed6509c8
cidv1: 0170122089b1d9957f247ee0fbaa60726be4e138fe9983c5919dec1a1b8a43e7ed6509c8
And here's the script in case someone wants to run it locally:
import dagPb from 'ipld-dag-pb'
const { DAGLink, DAGNode} = dagPb
import Pbf from 'pbf'
// Creating the data
const node = new DAGNode('some data')
// base58btc - cidv0 - dag-pb - sha2-256-256-89b1d9957f247ee0fbaa60726be4e138fe9983c5919dec1a1b8a43e7ed6509c8
const cidv0 = 'QmXc9raDM1M5G5fpBnVyQ71vR4gbnskwnB9iMEzBuLgvoZ'
const linkv0 = new DAGLink('cidv0link', 10, cidv0)
node.addLink(linkv0)
// base32 - cidv1 - dag-pb - sha2-256-256-89b1d9957f247ee0fbaa60726be4e138fe9983c5919dec1a1b8a43e7ed6509c8
const cidv1 = 'bafybeiejwhmzk7zep3qpxktaojv6jyjy72myhrmrtxwbug4kipt62zijza'
const linkv1 = new DAGLink('cidv1link', 11, cidv1)
node.addLink(linkv1)
//console.log(node)
const serialized = node.serialize()
//console.log(serialized)
// Reading the data (generated by pbf)
// PBLink ========================================
const PBLink = {};
PBLink.read = function (pbf, end) {
return pbf.readFields(PBLink._readField, {Hash: null, Name: "", Tsize: 0}, end);
};
PBLink._readField = function (tag, obj, pbf) {
if (tag === 1) obj.Hash = pbf.readBytes();
else if (tag === 2) obj.Name = pbf.readString();
else if (tag === 3) obj.Tsize = pbf.readVarint();
};
// PBNode ========================================
const PBNode = {};
PBNode.read = function (pbf, end) {
return pbf.readFields(PBNode._readField, {Links: [], Data: null}, end);
};
PBNode._readField = function (tag, obj, pbf) {
if (tag === 2) obj.Links.push(PBLink.read(pbf, pbf.readVarint() + pbf.pos));
else if (tag === 1) obj.Data = pbf.readBytes();
};
const pbf = new Pbf(serialized)
const data = PBNode.read(pbf)
console.log('actual hash: 89b1d9957f247ee0fbaa60726be4e138fe9983c5919dec1a1b8a43e7ed6509c8')
const cidv0Read = data.Links[0].Hash;
console.log('cidv0: ', cidv0Read.toString('hex'))
const cidv1Read = data.Links[1].Hash;
console.log('cidv1:', cidv1Read.toString('hex'))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https:/multiformats/cid#how-does-it-work
<multibase-prefix><cid-version><multicodec-content-type><multihash-content-address>
CIDv0 does not contain a multibase prefix so it's assumed to be base58btc
, CIDv1 does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@achingbrain yes, and it also mentions there:
NOTE: Binary (not text-based) protocols and formats may omit the multibase prefix when the encoding is unambiguous.
And this is what dag-pb is doing. The CID (no matter which one) is stored in binary without the multibase prefix (as the code above shows).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, well presented.
You only need to include the multibase if the encoding is ambiguous - e.g. a CID encoded as a string. A buffer or similar octet stream is not ambiguous so it can be omitted.
Since that's the case, you could probably remove the (with no multibase prefix)
altogether.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since that's the case, you could probably remove the
(with no multibase prefix)
altogether.
It's not strictly needed, but I think it adds clarity.
block-layer/codecs/dag-pb.md
Outdated
``` | ||
|
||
The objects link names are specified in the 'Name' field of the PBLink object. | ||
All link names in an object must either be blank or unique within the object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the Name
field is optional, maybe say it must either be omitted or unique within the object?
The objects link names are specified in the 'Name' field of the PBLink object. | ||
All link names in an object must either be blank or unique within the object. | ||
|
||
## Pathing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think discussing differences between implementations makes for a good spec. If you were implementing this, what would you implement here?
Maybe just standardise on the JS version since it lets you navigate non-named links and the go version doesn't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per https:/ipld/specs/pull/201/files/d7f528343fa4b051a7ab1042013a4830a8f090a5#r330630459, I'll add a link to the bug. But I think it's better to know how things are currently implemented, rather than not having this documented at all.
block-layer/codecs/dag-pb.md
Outdated
|
||
## Format | ||
|
||
The DagPB IPLD format is a legacy format implemented with a single protobuf. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's fair to call this legacy (yet) as it's what pretty much all IPFS data is encoded with and until UnixFSv2
arrives there's no way to use anything else.
A new version is up. I made the spec "Descriptive: Draft" as we still have issue #55 open (in a final spec I would expect that nothing will change). For those who have already approved that spec, as thumbs up on this comment would be nice. If you can't be bothered, I'll merge it after 2 working days after @achingbrain approved it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still lgtm, but that "with no multibase prefix" thing seems wrong, would be good to get an authoritative answer on that, the code looks to me like there's no trimming of the raw CID byte array
There were only minor changes since some of the approvals, so if you feel like they weren't good, please open an issue/PR. I'll merge this now :) |
This PR combines #114 and #115 and addresses all code review comments from there.
I then changed the section about pathing to describe the current state in the Go and JS implementations. As this is a descriptive spec, I find this a sensible approach.
Of course it would be desirable to harmonize how pathing works between implementations, but that hasn't happened for a long time and I don't see this happening soon. Hence I find having the current state document is better than keeping discussing it without taking action.