IPLD merkle-path improvements #62

Merged
merged 9 commits into from
Feb 12, 2016

Conversation

mildred
Contributor

@mildred mildred commented Jan 8, 2016

Improves PR #37 and replaces PR #60. The idea is that there is only one kind of merkle-path, which is not to be confused with unixfs paths. These paths are powerful enough to access multiple properties in IPLD objects and to resolve merkle links.

@jbenet jbenet added the backlog label Jan 8, 2016
@mildred mildred mentioned this pull request Jan 8, 2016
5 tasks

The link will:

- look up the first object `QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k` that we call `root`
Member

let's maybe call this one object0 instead of root?

@jbenet
Member

jbenet commented Jan 9, 2016

@mildred thanks, this is a strong solution possibility.

(re the unixfs paths being different, /ipfs/ links on the gateway (which are sort of unixfs paths), for example, output the concatenated data of a file, instead of the raw objects, and have directory listings, and we cannot access children links of files there)

@mildred
Contributor Author

mildred commented Jan 9, 2016

> the unixfs paths being different

That's a part I missed and it created much confusion, sorry for that.

A _merkle-path_ is a unix-style path (e.g. `/a/b/c/d`) which initially dereferences through a _merkle-link_ and then follows _named merkle-links_ in the intermediate objects. Following a name means looking into the object, finding the _name_ and resolving the associated _merkle-link_.
A merkle-path is a unix-style path (e.g. `/a/b/c/d`) which initially dereferences through a _merkle-link_ and allows access of elements of the referenced node and other nodes transitively.
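
To make the definition concrete, here is a minimal Python sketch of that resolution. It is an illustration under assumptions, not the spec's algorithm: the in-memory `STORE`, the shortened hashes, and the `{"link": ...}` link encoding (borrowed from the JSON examples in this thread) are all hypothetical, and links are dereferenced transparently, which is only one of the options under discussion.

```python
# Hypothetical in-memory object store: hash -> IPLD-like object.
STORE = {
    "QmRoot": {"a": {"link": "QmA"}},
    "QmA": {"b": {"c": {"d": "leaf"}}},
}


def is_link(value):
    """Assumed link encoding: a dict whose only key is "link"."""
    return isinstance(value, dict) and set(value) == {"link"}


def resolve(path, store=STORE):
    """Resolve a merkle-path such as /QmRoot/a/b/c/d against the store."""
    segments = [s for s in path.split("/") if s]
    # The first segment is the initial merkle-link (a hash).
    node = store[segments[0]]
    for name in segments[1:]:
        node = node[name]
        # Transparently follow a link to the object it references.
        if is_link(node):
            node = store[node["link"]]
    return node
```

With this store, `/QmRoot/a/b/c/d` crosses one link (from `QmRoot` to `QmA`) and then walks plain properties inside `QmA`, which is the "other nodes transitively" part of the definition.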

Merkle paths aren't suited to be used in filesystem representations (fuse mounts, HTTP or FTP protocols) as they describe the underlying IPLD data structure. Their use in filesystems is however well suited for debug purposes (like `/proc` on unix).
Member

when we say this, do we mean:

  • these paths are not good to represent files because they allow access to the raw structures underneath
  • or, these paths are not good to be accessed via the web or the filesystems, because there are UI problems.

i do want to be able to use these paths via HTTP to inspect the underlying data structures. but it's ok if i cannot get a proper file or directory representation out of it.

the latter part of this statement makes me somewhat more at peace with any of:

  • disallowing access of link properties via the paths
  • disallowing transparent dereferencing without the use of $obj/link/
  • separating "the link" and "the link properties" (same as the previous one in a way)

i think what we really need here is more concrete datastructure examples, and see how the pathing would want us to get to the stuff. maybe this will show us which solution is the best and guide our thinking.

perhaps we could write:

  • an fs example (directories, sharded files, file attributes in dir entries)
  • a version control example (commit histories)
  • a social network example (users, messages, relationships) (maybe something like a simple version of foaf)
  • a crypto network example (keys, derived keys, signatures, certificates, attestations) (i'll write this one at least)
  • a blockchain example (block histories, transactions, etc)

(any other killer use case for ipld that we could model here?)

Contributor Author

What I meant is that these paths are not good for representing files because they allow access to the raw structures underneath. That's why I used the analogy of the `/proc` filesystem, which is not good for storing user files but is still valuable as a filesystem. I might not have been clear enough; it's late at night in France and I need sleep :-)

There might be UI problems as the . separator is not understood as such by browsers for example. I don't think this will be a real problem though.

@jbenet
Member

jbenet commented Jan 9, 2016

> That's a part I missed and it created much confusion, sorry for that.

not at all, they look the same right now, which is definitely confusing. we could:

  • leave /ipfs/... for unixfs
  • use /ipld/... for raw ipld data.

not sure. in the end i still have trouble with the unixfs--ipld dichotomy. im not sure how to get unixfs to play well with other things (like commits). it is likely that:

  • we will have a 1 + N datastructure and 1 + N path world (1 for ipld and N for every datastructure)
  • or a 1 + N datastructure but 1 + 1 path world (1 for ipld and 1 for the derived datastructs together).

the latter is hard but if we can get it it would be much easier to reason about. will be very difficult to think of different datastruct-specific paths :/

@jbenet
Member

jbenet commented Jan 9, 2016

i do indeed recognize that there is value to try to get everything to be one path system, for example, give up on link properties and make this:

{
  "cat.jpg": {"link": "Qmcatjpg..."},
  "foo": {"link": "Qmfoo..."},
  "@attrs": {
    "cat.jpg": {
      "mode": 777,
      "owner": "jbenet"
    }
  }
}

://////

@mildred
Contributor Author

mildred commented Jan 9, 2016

Your example works well because it represents a file structure, which maps well onto unix-like paths. But what if we try to represent anything else? What if keys contain binary data? You need an escape mechanism.

Or perhaps I'm not thinking straight as I am tired. In any case, the path definition does not impact (or should not impact) the IPLD format itself. So perhaps that's an issue that can be resolved later?

@mildred
Contributor Author

mildred commented Jan 9, 2016

I rephrased the paragraph you commented here to remove ambiguities. You talked about examples, but they are already there at the end of ipld.md.

If you want to unify all paths under the same prefix, how would that work in practice? For example, how will /ipns/ be managed (or am I lagging behind with old concepts)?

@mildred
Contributor Author

mildred commented Jan 9, 2016

{
  "cat.jpg": {"link": "Qmcatjpg..."},
  "foo": {"link": "Qmfoo..."},
  "@attrs": {
    "cat.jpg": {
      "mode": 777,
      "owner": "jbenet"
    }
  }
}

here, you won't be able to access both @attrs and cat.jpg using the same path mechanism, unless you start telling people they can't name a file @attrs.

I don't think you can force people to use the same path for everything. It's not practical. Some applications might need some things not provided by default.
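
A small Python sketch of the collision described above, assuming the directory layout from the JSON example (file names and the reserved `@attrs` key share one namespace). The `make_directory` helper and its argument shapes are hypothetical and exist only for illustration.

```python
RESERVED = "@attrs"  # reserved key, taken from the JSON example above


def make_directory(files, attrs):
    """Build a directory object in the layout of the example.

    files: file name -> hash of the file object (hypothetical shape)
    attrs: file name -> attribute dict, e.g. {"mode": 777}
    """
    # A file literally named "@attrs" cannot be stored, because its entry
    # would occupy the same key as the attribute table.
    if RESERVED in files:
        raise ValueError(f"file name {RESERVED!r} collides with the reserved key")
    node = {name: {"link": h} for name, h in files.items()}
    node[RESERVED] = attrs
    return node
```

The only ways out in this layout are to forbid the name, as the sketch does, or to add an escaping rule for file names.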

@jbenet
Member

jbenet commented Jan 22, 2016

@mildred I've made https:/ipfs/ipld-examples to try and resolve this question. So far i've only made two examples: unixfs and post. take a look, i changed things up somewhat after giving it more thought. I'm still not decided, but i'm seeing how horrible some of those options are.

could you please double check my work? im not sure i got everything right-- I may have messed up some of them, given that there are things you mentioned were problems but i didnt run head into them. (escaping . when using it as a delimiter, for example. may be that this could be a non issue, but anyway, its separate).

I think so far, my favorites are (8), (4), and (5). -- the others feel odd or are very, very confusing.

Also, see the unixfs pathing-- i made it a separate thing there too, and i am more convinced this is the right thing to do.

@drvirgilio

I think . should not be used to do traversal. I think \. should. This way there is no need to escape for key names such as notes.txt which I think would occur more often.
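
One possible reading of this suggestion, sketched in Python next to the alternative it argues against. Both function names are hypothetical, and neither handles a literal backslash inside a key, so treat this as a comparison of the two escaping trade-offs rather than a proposed parser.

```python
import re


def split_dot_delimited(path):
    # Delimiter is a bare ".", so a literal dot inside a key has to be
    # escaped with a backslash by whoever writes the path.
    parts = re.split(r"(?<!\\)\.", path)
    return [p.replace("\\.", ".") for p in parts]


def split_backslash_dot_delimited(path):
    # Delimiter is the two-character sequence backslash + ".", so plain
    # dots in keys such as notes.txt need no escaping at all.
    return path.split("\\.")
```

Under the first scheme the key `notes.txt` inside `dir` is written `dir.notes\.txt`; under the second it is written `dir\.notes.txt`. Both calls return `["dir", "notes.txt"]`.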

@mildred
Contributor Author

mildred commented Feb 11, 2016

I updated this branch to ipld-spec and added my understanding of the path mechanism (8)

A _merkle-path_ is a unix-style path (e.g. `/a/b/c/d`) which initially dereferences through a _merkle-link_ and then follows _named merkle-links_ in the intermediate objects. Following a name means looking into the object, finding the _name_ and resolving the associated _merkle-link_.
A merkle-path is a unix-style path (e.g. `/a/b/c/d`) which initially dereferences through a _merkle-link_ and allows access of elements of the referenced node and other nodes transitively.

_Merkle-paths_ aren't suited for using them in a general purpose filesystem because it introduces many restrictions on file names. However, it can be used to work on special purpose filesystems. It can be compared to the `/proc` filesystem on unix computers or HTTP Web APIs where the allowed paths is restricted.
Member

I think we should ditch this paragraph -- it's not totally accurate, FSes can be implemented, just the hoops get ugly (dirA/@link/dirB/@link/fileC/data), and we want something cleaner. I think the unixfs spec on top of IPLD can discuss the choices there.
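
For illustration, a short Python sketch of those hoops: a filesystem-style path is expanded into the explicit merkle-path form quoted above. The `@link` and `data` names come from that path; the rule that every name is followed by `@link` and the final file by `data` is an assumption, and `unixfs_to_merkle_path` is a hypothetical helper, not part of any spec.

```python
def unixfs_to_merkle_path(unixfs_path):
    """Expand dirA/dirB/fileC into dirA/@link/dirB/@link/fileC/data."""
    names = [n for n in unixfs_path.split("/") if n]
    if not names:
        raise ValueError("empty path")
    # Each directory hop goes through an explicit @link, and the file
    # content is assumed to live under a "data" property.
    return "/@link/".join(names) + "/data"
```

A unixfs layer on top of IPLD would hide this expansion from users, which is the cleaner interface asked for here.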

Contributor Author

Ok, as long as I can store a file named @link on my unixfs filesystem (which at some point in the future would become my root filesystem, I'd imagine).


_Merkle-paths_ aren't suited for using them in a general purpose filesystem because it introduces many restrictions on file names. However, it can be used to work on special purpose filesystems. It can be compared to the `/proc` filesystem on unix computers or HTTP Web APIs where the allowed paths is restricted.

General purpose filesystems are encouraged to design an object model on top of IPLD that would be specialized for file manipulation and have specific path algorithms to query this model.
Member

i think we can keep this, and it expresses enough.

Contributor Author

Right

@jbenet
Member

jbenet commented Feb 12, 2016

@mildred things look good. I'm good to leave the cases there for now, and decide on the lenient vs strict down the road. we can implement strict for now and see how it goes?

@jbenet
Member

jbenet commented Feb 12, 2016

I'm good to merge as is, and continue from there.

@jbenet
Member

jbenet commented Feb 12, 2016

@mildred lmk if you are ready too, and i'll merge. else what else to do?

@mildred
Contributor Author

mildred commented Feb 12, 2016

I'm ok with this as well.

jbenet added a commit that referenced this pull request Feb 12, 2016
@jbenet jbenet merged commit b1d4bd7 into ipfs:ipld-spec Feb 12, 2016
@jbenet jbenet removed the backlog label Feb 12, 2016
@mildred mildred mentioned this pull request Feb 12, 2016
@daviddias daviddias added the IPLD label Mar 14, 2016
4 participants