
add non-dag json codec #152

Merged: 3 commits into master on Mar 23, 2021
Conversation

willscott (Member)

Would this be acceptable as a first pass at the 'non-dag' JSON?

willscott requested a review from warpfork on March 22, 2021.
willscott (Member, Author)

And a follow-up question: do you have any concerns with doing the same thing (a parallel variant that skips the link logic) for dagcbor, @warpfork?

@@ -17,13 +17,23 @@ var (

func init() {
	multicodec.RegisterEncoder(0x0129, Encode)
	multicodec.RegisterEncoder(0x0200, Encode)
warpfork (Collaborator)
I'm inclined to think that for the json codec, if we encounter a Link in the middle of a structure, we should have the codec error explicitly. It's not going to roundtrip via that same codec, and explicit errors are better than unstated incorrect behavior.

willscott (Member, Author)

It produces a reasonable serialization of a DAG into JSON. Making the caller traverse the DAG to discover there are no links in it, rather than just flattening them, isn't clearly a win to me.

warpfork (Collaborator)

I don't think callers generally have to do anything. They would just... get an error if they try to encode data that's unencodable with this format.

In general, things that are unencodable should result in errors. That's a rule we'd follow anywhere[‡] that isn't JSON, and I think we should follow it for JSON too.

Failure to have bijections for data that we emit tends to cause unhappiness and confusion. The problems are magnified by the fact that they often get swept under the rug for a while and then reappear at points in time distant from when the data was first written.

I'd say the burden of proof should be overwhelmingly weighted such that doing something loosey-goosey and non-bijective is the case that needs support. Maintaining bijectiveness, even if it comes at the cost of more errors, should be the default design bias.

[‡] - except where floating point numbers are involved. But that's the exception that really proves the reason for the rule, isn't it -- float consistency is a horrific minefield that we would deeply prefer didn't exist, and there are horrific amounts of time wasted by this seemingly small problem.
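For concreteness, the behaviour being argued for here could look roughly like the pre-flight walk sketched below: traverse the node and refuse to encode if any link is present. This is an illustrative sketch only, not code from this PR; the names follow the datamodel package of later go-ipld-prime releases and may not match the API at the time of this review.

```go
package jsonsketch

import (
	"fmt"

	"github.com/ipld/go-ipld-prime/datamodel"
)

// errNoLinksInJSON is the explicit error warpfork describes: plain JSON has no
// representation for links, so encountering one should be a failure.
var errNoLinksInJSON = fmt.Errorf("plain JSON cannot encode links")

// rejectLinks walks a node tree and errors if any link is found.
// A plain-JSON Encode could run a check like this (or fold it into its own
// walk) instead of silently flattening links into {"/": "..."} maps.
func rejectLinks(n datamodel.Node) error {
	switch n.Kind() {
	case datamodel.Kind_Link:
		return errNoLinksInJSON
	case datamodel.Kind_Map:
		for it := n.MapIterator(); !it.Done(); {
			_, v, err := it.Next()
			if err != nil {
				return err
			}
			if err := rejectLinks(v); err != nil {
				return err
			}
		}
	case datamodel.Kind_List:
		for it := n.ListIterator(); !it.Done(); {
			_, v, err := it.Next()
			if err != nil {
				return err
			}
			if err := rejectLinks(v); err != nil {
				return err
			}
		}
	}
	return nil
}
```

In practice the check would live inside the encoder's own traversal rather than as a separate pass, but it shows the error-on-unencodable behaviour being advocated.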

aschmahmann

@warpfork my understanding was that people should be able to take DAG-JSON data and by changing the codec to JSON they'd just have an object where there are no links. This is similar to how we can take an existing CID and slap a Raw codec on it and get the data in that block back as raw bytes.

The decode case mostly covers this; it seems a bit weird that I wouldn't be able to round-trip data, since Encode(Decode(dagJSONDataAsJSON)) would error. Am I missing something?

willscott (Member, Author)

When you decode as JSON you don't get a link. When you re-encode the map with a string in it, you get the same data.

warpfork (Collaborator), Mar 23, 2021

{"/":"bafyquux"} is a map with one key in JSON. When you deserialize that block with a JSON codec, you get a map with one key; when you serialize it with a JSON codec, you get the same block.

(@aschmahmann, the reason I thumbs-down'd your comment is that there's no error in this case above. One just emits this as a map. That's normal.)

{"/":"bafyquux"} is a link in DAG-JSON. When you deserialize that block with a DAG-JSON codec, you get a link; when you serialize it with a DAG-JSON codec, you get the same block.

Now, if I have a link and I attempt to serialize it with a JSON codec, I want that to yield an error, because that's not really encodable in JSON. As a user, if I attempt to do this, I have made a serious mistake, and I want to be informed immediately.

If we didn't do this, and instead smashed the link into a map with one entry during the encode process, we'd have changed the data. When we deserialize it later with the same codec we serialized it with, we would get different data. This would not be good. And as a user, one might not notice what had happened until much later, making it very much not gooder.

I don't really care what happens when we serialize something with a JSON codec and attempt to later deserialize it with a DAG-JSON codec. If that gives different data... well, yes. A different codec was used. That's gonna happen.
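The plain-JSON half of this (the round trip never involves a link) can be illustrated with nothing more than the standard library. This is not the go-ipld-prime codec, just a demonstration of the JSON-level behaviour being described:

```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// To plain JSON, {"/":"bafyquux"} is just a one-entry map.
	in := []byte(`{"/":"bafyquux"}`)

	var m map[string]string
	if err := json.Unmarshal(in, &m); err != nil {
		panic(err)
	}

	out, err := json.Marshal(m)
	if err != nil {
		panic(err)
	}

	// The block round-trips unchanged; no link ever enters the picture.
	fmt.Println(string(out)) // {"/":"bafyquux"}
}
```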

warpfork (Collaborator)

It may help to be reminded that neither DAG-JSON nor JSON are total supersets of each other. This sucks, and we often wish it wasn't so, but wishes... well. Wishes. They have limited power.

DAG-JSON can't encode maps with single entries that happen to have the key "/". JSON can. This makes JSON larger.

In the other direction, DAG-JSON can encode links. JSON can't. This makes DAG-JSON larger.

This is generally a headache. But I'm fairly certain we'd make the headache bigger rather than smaller if we were to add more silent coercions to the picture.

aschmahmann

np, just wanted to clarify what you were actually intending/trying to do. I have no problem with a codec's Encode function erroring when trying to convert an IPLD data model object to bytes when that codec doesn't support part of the data model used in that object (e.g. JSON doesn't support links).

aschmahmann

DAG-JSON can't encode maps with single entries that happen to have the key "/".

@warpfork maybe I'm missing it, but it doesn't look like DAG-JSON's Encode function errors with these map entries. Should that be fixed in another PR?

warpfork (Collaborator), Mar 25, 2021

... 😩 yes, you're quite correct

Added an issue for this: #155

warpfork (Collaborator)

Yeah, this seems mostly fine. In a perfect world there's a lot of DRY work I want to do between this and the cbor codecs, but it would be silly to hold this up on that.

warpfork (Collaborator)

Thoughts about adding a (very small) go-ipld-prime/codec/json package, which does the multicodec registrations, and has its own exported Encode and Decode functions... which are just thin wrappers to the workhorse functions here?

The code is largely shared, but I think having a separate package per codec feels clean and consistent and generally reasonable.

It would also make it possible to implement the codec/json and codec/cbor packages without a transitive dependency on linking/cid in the future, which could be nice (even though it's not presently enough of a priority to be worth the larger effort required).
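A sketch of what such a thin wrapper package might look like is below. It owns the plain-JSON multicodec registration (0x0200) and delegates the actual work to the dagjson codec with link handling switched off. The EncodeOptions/DecodeOptions structs used here come from later go-ipld-prime releases, not from the code under review in this PR, so the details differ from what was on the table at the time.

```go
// Package json: a sketch of the thin wrapper package being suggested.
// Names from later go-ipld-prime releases; not the code in this PR.
package json

import (
	"io"

	"github.com/ipld/go-ipld-prime/codec/dagjson"
	"github.com/ipld/go-ipld-prime/datamodel"
	"github.com/ipld/go-ipld-prime/multicodec"
)

func init() {
	// 0x0200 is the multicodec table entry for plain JSON
	// (0x0129 is dag-json, registered by the dagjson package itself).
	multicodec.RegisterEncoder(0x0200, Encode)
	multicodec.RegisterDecoder(0x0200, Decode)
}

// Encode serializes n as plain JSON: links and bytes get no special treatment.
func Encode(n datamodel.Node, w io.Writer) error {
	return dagjson.EncodeOptions{
		EncodeLinks: false,
		EncodeBytes: false,
	}.Encode(n, w)
}

// Decode parses plain JSON: {"/": ...} maps stay maps and are not parsed as links.
func Decode(na datamodel.NodeAssembler, r io.Reader) error {
	return dagjson.DecodeOptions{
		ParseLinks: false,
		ParseBytes: false,
	}.Decode(na, r)
}
```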

willscott (Member, Author)

@warpfork is it okay to change the public Unmarshal method in dagjson? (see the newer commit and offer comments)

warpfork (Collaborator)

I almost objected to that, but caught myself in time (there's a convention that Encode and Decode methods should take no additional parameters, i.e. be the same as what you'd get via the multicodec table; there's not a convention for Marshal and Unmarshal, and in fact I think those functions even disappear in dagjson2).

warpfork (Collaborator) left a comment

I'm good with this.

There are some specks here and there I'd clean up in a bigger refactor (e.g. separating decoder state from config; both are now present in one struct), but there are plenty of other things under that same heading, so I'm fine leaving these on the wishlist until they can all be handled.

warpfork (Collaborator) left a comment

Wait, hang on, where'd the conditional branch for encode end up?

willscott (Member, Author)

(I didn't do that yet 😁)

willscott requested a review from warpfork on March 23, 2021.
warpfork (Collaborator) left a comment

:shipit:

warpfork merged commit f02df08 into master on Mar 23, 2021.
aschmahmann mentioned this pull request on May 14, 2021.
mvdan deleted the codec/json branch on August 13, 2021.