This repository has been archived by the owner on Aug 11, 2021. It is now read-only.

A MerkleDAG Linked Web or IPLD proposal/idea #2

Closed
daviddias opened this issue Aug 28, 2015 · 2 comments

Comments

@daviddias
Member

In order to create a Web of information that survives over time, passing through generations of language idioms and primitives, we need a way to communicate effectively where information lives and how to interpret it.

This idea is not new, and there have been several attempts to solve this problem, but due to the added complexity, the race to adoption typically fails, as it requires full buy-in from the developer in order to leverage the advantages of the Semantic Web/Linked Data.

One recent attempt is JSON-LD, which takes JSON, a successful and widely adopted data format on the Web today, and adds a Linked Data '@context', so that Linked Data processors can infer the types of information and links present. One identified shortcoming of JSON-LD is its inability to coexist with plain JSON data: JSON-LD doesn't support any data that is not referenced in a given '@context', and discards that data if it passes through a JSON-LD processor.

Another issue is the current use of URLs to store the schemas that describe the data. URLs are not eternal: they might disappear, or the schemas might change location. And since URLs rely on DNS, they require constant Internet access in order to understand the data that is given to us.

💡

What if we treated Linked Data the same way we handle files? That is, we specify how the file/link is encoded, so that the decoder knows how to interpret the data it is given.

foo: {
  '@multicodec': '/person/',
  name: 'Tim',
  age: 9000
}

This way, a processor of this object knows it has to use a 'person' decoder in order to make sense of the data.
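A minimal sketch of such a processor in Python (the 'person' codec, the registry, and the decoder function are hypothetical, just to illustrate the dispatch):

```python
# Hypothetical sketch: a processor that picks a decoder based on the
# '@multicodec' key of an object. The '/person/' codec is made up
# for illustration.

def decode_person(obj):
    # Interpret the remaining keys as a person record.
    return {"kind": "person", "name": obj["name"], "age": obj["age"]}

# 1:1 registry from multicodec path to decoder function.
DECODERS = {"/person/": decode_person}

def process(obj):
    codec = obj["@multicodec"]
    decoder = DECODERS[codec]  # fails loudly if the codec is unknown
    return decoder(obj)

foo = {"@multicodec": "/person/", "name": "Tim", "age": 9000}
person = process(foo)
```

The point is only that the object itself names its decoder; the registry could equally live on a content-addressed store.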

Now let's look at a link:

foo: {
  '@multicodec': '/www/link/person/',
  '@value': 'http://someurl.com/person-1'
}

This tells us that our value is a link to an object of type person on the World Wide Web.

Now let's see what happens if, instead of the WWW and HTTP, we used a content-addressed filesystem:

foo: {
  '@multicodec': '/ipfs/link/person/',
  '@value': '/ipfs/QmbuH1ZExsQvzVEFFw9S2CivasHrQ9KmCy6zbxSymq8X5r/person-1'
}

Now our Linked Data processor knows that in order to fetch that object, it has to use IPFS, the content-addressed filesystem.
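The routing decision above can be sketched like this (the fetchers are stubs and the codec prefixes are the ones from the examples, not a real API):

```python
# Hypothetical sketch: routing link resolution by the '@multicodec'
# prefix. '/www/link/...' values would be fetched over HTTP,
# '/ipfs/link/...' values through an IPFS node; both fetchers are
# stubbed so the example is self-contained.

def fetch_http(url):
    return f"<fetched {url} over HTTP>"

def fetch_ipfs(path):
    return f"<fetched {path} from IPFS>"

def resolve(link):
    codec = link["@multicodec"]
    if codec.startswith("/www/link/"):
        return fetch_http(link["@value"])
    if codec.startswith("/ipfs/link/"):
        return fetch_ipfs(link["@value"])
    raise ValueError(f"unknown link codec: {codec}")

foo = {
    "@multicodec": "/ipfs/link/person/",
    "@value": "/ipfs/QmbuH1ZExsQvzVEFFw9S2CivasHrQ9KmCy6zbxSymq8X5r/person-1",
}
resolved = resolve(foo)
```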

This gives us the opportunity to link data that is not even on the Internet yet, like books, articles, and papers, which, once uploaded or manually located, can become part of our data structures. For example:

foo: {
  '@multicodec': '/book/',
  '@value': "/Alice's Adventures in Wonderland/chapter/Advice from a Caterpillar"
}

What about the decoders? One of the benefits of Linked Data is that once we find the schema, we can make sense of the data, because the schema tells us how to parse it. With multicodec references, we can host the schemas on a content-addressed filesystem, not beholden to a single point of failure, which can store the schema and even the code necessary to decode the information. The way to find a decoder can be a simple 1:1 mapping between the multicodec and its hash, /person/ -> hash(/person/), so that decoders are always findable. It is like a package manager, but for data encoders/decoders.
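The 1:1 mapping can be sketched as hashing the codec name into a content address (SHA-256 here stands in for whatever multihash the real system would use):

```python
# Hypothetical sketch of the /person/ -> hash(/person/) mapping: hashing
# the codec name yields a stable, location-free key under which the
# schema (and even the decoder code) could be stored in a
# content-addressed filesystem.
import hashlib

def codec_address(name):
    return hashlib.sha256(name.encode("utf-8")).hexdigest()

addr = codec_address("/person/")
# The address is deterministic, so any peer can compute it and fetch
# the schema/decoder, like a package manager for data encoders/decoders.
same = codec_address("/person/") == addr
```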

One more thing: data structures change over time, and what we consider a train or a ball today might not be the same 10 years from now, so it is important to have versioning to let data structures evolve, e.g. /ball/1.0.0.

This way, we ensure that:

  • we don't need to change current data structures; instead we just 'sprinkle' a little multicodec on top of the links or new data structures we want to be self-describable
  • we don't lose the decoders/encoders
  • links can be locally resolvable, without the requirement of centralized services such as DNS
  • we can reference data that is not even on the Internet


@mildred

mildred commented Aug 28, 2015

If you want to be completely agnostic of the JSON document, it's better not to alter it at all. JSON-LD provides this by linking to the context file using out-of-band transmission: http://www.w3.org/TR/json-ld/#interpreting-json-as-json-ld

The problem with adding a key to the JSON document is that the document could already have a '@multicodec' key that means something else, or that by adding the '@multicodec' key, you change its semantics.

For example, if we represent a directory using a JSON document, we would have one key per file name. If we add the '@multicodec' key, we would effectively be adding a file named '@multicodec' to the directory.

Or perhaps some JSON-LD parser out there depends on the fact that keys starting with '@' are reserved for JSON-LD semantics. We would break that parser.

I really like the idea of self-describing files, but I would prefer it if the multicodec were transmitted outside of the JSON document. This could be done via HTTP headers, or by embedding the JSON document after a multicodec header.
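The second option can be sketched as a trivial framing: a multicodec header line followed by the untouched JSON body (the '/json/person/' codec path is made up for illustration, and the newline delimiter is an assumption, not a spec):

```python
# Hypothetical sketch: instead of a key inside the JSON, prepend a
# multicodec header and leave the document itself untouched.
import json

def frame(codec, obj):
    # newline-terminated codec path, then the unaltered JSON body
    return codec.encode() + b"\n" + json.dumps(obj).encode()

def unframe(data):
    header, _, body = data.partition(b"\n")
    return header.decode(), json.loads(body)

framed = frame("/json/person/", {"name": "Tim", "age": 9000})
codec, obj = unframe(framed)
```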

@daviddias
Member Author

The problem with adding a key to the JSON document is that the document could already have a '@multicodec' key that means something else, or that by adding the '@multicodec' key, you change its semantics.

You are right; I wish we could find a key that is not used by anything else. By analogy with '@context', '@type', '@id', etc. in JSON-LD, which might already be used by some JSON blobs, it is a matter of serving 99.999% of the scenarios and creating a 'good enough' solution.

For example, if we represent a directory using a JSON document, we would have one key per file name. If we add the '@multicodec' key, we would effectively be adding a file named '@multicodec' to the directory.

Not necessarily: we can have a multicodec for the Unix file format, but we can also have a multicodec that specifies that a given type of JSON blob is a directory of files, and with that only have a multicodec on the top-level JSON object. The level of granularity at which data structures are defined and encoded is up to the user. It is like storing things on a hard drive: in the beginning it is just a very long byte array, but once we have a pattern, we don't have to specify which format each byte belongs to.

Or perhaps some JSON-LD parser out there depends on the fact that keys starting with '@' are reserved for JSON-LD semantics. We would break that parser.

Good point. I wrote it with '@' to leverage the bias we now have from JSON-LD, where '@' marks a key that defines the type of data, but we could use '#' or any other character for this purpose.

I really like the idea of self-describing files, but I would prefer it if the multicodec were transmitted outside of the JSON document. This could be done via HTTP headers, or by embedding the JSON document after a multicodec header.

@jbenet also mentioned that idea, and I'm also in favor of making fully self-described data a first-class citizen of a self-describing information system. The reason I'm also in favor of having the option of a key-value pair that describes a JSON blob or a given remote link is human readability and the ability to extend an already-existing JSON API with an encoder/decoder for it. For example:

Imagine I have my.api.com/humans, which until today returned a list of humans; this API endpoint was designed without any notion of Linked Data. Now I want to use that data in my app, and since I know the format stays the same, I build a /humans/ codec that knows how to interpret that data. I can then have a link in my app that points to that URL with '@multicodec': '/humans/'.
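That retrofit can be sketched like this (the endpoint, its response shape, and the '/humans/' codec are all hypothetical; the HTTP call is stubbed so the example is self-contained):

```python
# Hypothetical sketch: retrofitting a legacy JSON API with a codec,
# without changing the API itself.

def fetch(url):
    # Stand-in for an HTTP GET against the legacy endpoint.
    return [{"name": "Ada"}, {"name": "Tim"}]

def decode_humans(raw):
    # The '/humans/' codec encodes our knowledge of the endpoint's
    # fixed response format.
    return [h["name"] for h in raw]

DECODERS = {"/humans/": decode_humans}

# The link in the app: a plain URL plus a multicodec naming its decoder.
LINK = {"@multicodec": "/humans/", "@value": "http://my.api.com/humans"}

def follow(link):
    raw = fetch(link["@value"])
    return DECODERS[link["@multicodec"]](raw)

names = follow(LINK)
```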
