Skip to content
This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

Add DagPB spec #201

Merged
merged 7 commits into from
Oct 9, 2019
Merged

Add DagPB spec #201

merged 7 commits into from
Oct 9, 2019

Conversation

vmx
Copy link
Member

@vmx vmx commented Oct 2, 2019

This PR combines #114 and #115 and addresses all code review comments from there.

I then changed the section about pathing to describe the current state in the Go and JS implementations. As this is a descriptive spec, I find this a sensible approach.

Of course it would be desirable to harmonize how pathing works between implementations, but that hasn't happened for a long time and I don't see this happening soon. Hence I find having the current state document is better than keeping discussing it without taking action.

whyrusleeping and others added 6 commits October 2, 2019 12:11
1. Fix link/hash encoding (not CBOR).
2. Describe a bunch of options for pathing.
3. Expand on canonical encoding.
Rename DAG-PB to DagPB so that it is consistent with DagCBOR.

## Pathing

There is some overlap between the Go and JavaScript implementation of DagPB. Both support pathing with link names: `/<name1>/<name2>/…`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we note (agree) that this is a bug?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably worth linking to #55 as a "discussion of this issue and how it might be resolved".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Stebalien We can totally agree on that. Let's discuss it at #55 (and I'll add a link to it in here).

Copy link
Member

@rvagg rvagg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 if you at least insert the spec status line at the top, will leave you to figure out how to handle the JS/Go discussion bit.

// An IPFS MerkleDAG Link
message PBLink {

// binary CID (with no multibase prefix) of the target object
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct? You can have v1 CIDs as DAGLinks which can contain a multibase prefix.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't addressed this as I also don't know (@Stebalien probably does).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CIDs don't contain a multibase prefix.

I created a script to verify this (yes, I am using a sledgehammer to crack a nut, but it gave me the chance to finally use the new js-ipld-dag API as well as playing a bit with go-ipld). The output is:

$ node --experimental-modules combined.js
(node:985) ExperimentalWarning: The ESM module loader is experimental.
actual hash:   89b1d9957f247ee0fbaa60726be4e138fe9983c5919dec1a1b8a43e7ed6509c8
cidv0:     122089b1d9957f247ee0fbaa60726be4e138fe9983c5919dec1a1b8a43e7ed6509c8
cidv1: 0170122089b1d9957f247ee0fbaa60726be4e138fe9983c5919dec1a1b8a43e7ed6509c8

And here's the script in case someone wants to run it locally:

import dagPb from 'ipld-dag-pb'
const { DAGLink, DAGNode} = dagPb
import Pbf from 'pbf'

// Creating the data

const node = new DAGNode('some data')

// base58btc - cidv0 - dag-pb - sha2-256-256-89b1d9957f247ee0fbaa60726be4e138fe9983c5919dec1a1b8a43e7ed6509c8
const cidv0 = 'QmXc9raDM1M5G5fpBnVyQ71vR4gbnskwnB9iMEzBuLgvoZ'
const linkv0 = new DAGLink('cidv0link', 10, cidv0)
node.addLink(linkv0)

// base32 - cidv1 - dag-pb - sha2-256-256-89b1d9957f247ee0fbaa60726be4e138fe9983c5919dec1a1b8a43e7ed6509c8
const cidv1 = 'bafybeiejwhmzk7zep3qpxktaojv6jyjy72myhrmrtxwbug4kipt62zijza'
const linkv1 = new DAGLink('cidv1link', 11, cidv1)
node.addLink(linkv1)

//console.log(node)
const serialized = node.serialize()
//console.log(serialized)


// Reading the data (generated by pbf)

// PBLink ========================================

const PBLink = {};

PBLink.read = function (pbf, end) {
    return pbf.readFields(PBLink._readField, {Hash: null, Name: "", Tsize: 0}, end);
};
PBLink._readField = function (tag, obj, pbf) {
    if (tag === 1) obj.Hash = pbf.readBytes();
    else if (tag === 2) obj.Name = pbf.readString();
    else if (tag === 3) obj.Tsize = pbf.readVarint();
};

// PBNode ========================================

const PBNode = {};

PBNode.read = function (pbf, end) {
    return pbf.readFields(PBNode._readField, {Links: [], Data: null}, end);
};
PBNode._readField = function (tag, obj, pbf) {
    if (tag === 2) obj.Links.push(PBLink.read(pbf, pbf.readVarint() + pbf.pos));
    else if (tag === 1) obj.Data = pbf.readBytes();
};


const pbf = new Pbf(serialized)
const data = PBNode.read(pbf)
console.log('actual hash:   89b1d9957f247ee0fbaa60726be4e138fe9983c5919dec1a1b8a43e7ed6509c8')
const cidv0Read = data.Links[0].Hash;
console.log('cidv0:    ', cidv0Read.toString('hex'))
const cidv1Read = data.Links[1].Hash;
console.log('cidv1:', cidv1Read.toString('hex'))

Copy link
Member

@achingbrain achingbrain Oct 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https:/multiformats/cid#how-does-it-work

<multibase-prefix><cid-version><multicodec-content-type><multihash-content-address>

CIDv0 does not contain a multibase prefix so it's assumed to be base58btc, CIDv1 does.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@achingbrain yes, and it also mentions there:

NOTE: Binary (not text-based) protocols and formats may omit the multibase prefix when the encoding is unambiguous.

And this is what dag-pb is doing. The CID (no matter which one) is stored in binary without the multibase prefix (as the code above shows).

Copy link
Member

@achingbrain achingbrain Oct 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, well presented.

You only need to include the multibase if the encoding is ambiguous - e.g. a CID encoded as a string. A buffer or similar octet stream is not ambiguous so it can be omitted.

Since that's the case, you could probably remove the (with no multibase prefix) altogether.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since that's the case, you could probably remove the (with no multibase prefix) altogether.

It's not strictly needed, but I think it adds clarity.

```

The objects link names are specified in the 'Name' field of the PBLink object.
All link names in an object must either be blank or unique within the object.
Copy link
Member

@achingbrain achingbrain Oct 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the Name field is optional, maybe say it must either be omitted or unique within the object?

The objects link names are specified in the 'Name' field of the PBLink object.
All link names in an object must either be blank or unique within the object.

## Pathing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think discussing differences between implementations makes for a good spec. If you were implementing this, what would you implement here?

Maybe just standardise on the JS version since it lets you navigate non-named links and the go version doesn't?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per https:/ipld/specs/pull/201/files/d7f528343fa4b051a7ab1042013a4830a8f090a5#r330630459, I'll add a link to the bug. But I think it's better to know how things are currently implemented, rather than not having this documented at all.


## Format

The DagPB IPLD format is a legacy format implemented with a single protobuf.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's fair to call this legacy (yet) as it's what pretty much all IPFS data is encoded with and until UnixFSv2 arrives there's no way to use anything else.

@vmx
Copy link
Member Author

vmx commented Oct 4, 2019

A new version is up. I made the spec "Descriptive: Draft" as we still have issue #55 open (in a final spec I would expect that nothing will change). For those who have already approved that spec, as thumbs up on this comment would be nice. If you can't be bothered, I'll merge it after 2 working days after @achingbrain approved it.

Copy link
Member

@rvagg rvagg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still lgtm, but that "with no multibase prefix" thing seems wrong, would be good to get an authoritative answer on that, the code looks to me like there's no trimming of the raw CID byte array

@vmx
Copy link
Member Author

vmx commented Oct 9, 2019

There were only minor changes since some of the approvals, so if you feel like they weren't good, please open an issue/PR. I'll merge this now :)

@vmx vmx merged commit 0cc83a7 into master Oct 9, 2019
@vmx vmx deleted the dag-pb-spec-vmx branch October 9, 2019 08:41
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants