Get some info about UnixFS objects on public IPFS HTTP API #8528

Open
3 tasks done
d70-t opened this issue Oct 27, 2021 · 2 comments
Labels
kind/enhancement A net-new feature or improvement to an existing feature P2 Medium: Good to have, but can wait until someone steps up topic/gateway Topic gateway

Comments

@d70-t

d70-t commented Oct 27, 2021

Checklist

  • My issue is specific & actionable.
  • I am not suggesting a protocol enhancement.
  • I have searched on the issue tracker for my issue.

Description

I am implementing a backend to access IPFS via the Python library fsspec, called ipfsspec. To do so (and to save me from implementing the IPFS protocol in Python), the plan is to access UnixFS files and directories on IPFS via an HTTP gateway. An fsspec backend needs to implement a function info(path) which must return

  • if the thing behind a path is a directory or a file
  • if it is a file, the size of the file

To me, this seems to be a reasonable requirement for other generic filesystem abstractions as well, thus I assume that this feature request could be of broader interest.

While the /v0/files/stat endpoint provides this kind of information, it is often not reachable on public gateways.

Another option to obtain this information is to perform a HEAD request to http://gateway/ipfs/CID. For a file, this provides the size in the content-length header, and it (seemingly) lets me discriminate between file and directory using the etag header. This method works on some public gateways, but it scares me as well, as relying on observable but undocumented API behaviour doesn't seem right.
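For illustration, the HEAD approach described above could be sketched roughly as follows. This is only a minimal sketch using the Python standard library; the helper names head_headers and size_from_headers are hypothetical and not part of ipfsspec or any gateway API, and the gateway URL is whatever gateway the caller chooses.

```python
import urllib.request


def head_headers(gateway: str, cid: str, timeout: float = 10.0) -> dict:
    """Send HEAD to <gateway>/ipfs/<cid> and return the headers with lower-cased names.

    Hypothetical helper: it assumes the gateway answers HEAD on /ipfs/<cid>,
    which the issue reports to work on some public gateways only.
    """
    url = f"{gateway.rstrip('/')}/ipfs/{cid}"
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return {name.lower(): value for name, value in response.headers.items()}


def size_from_headers(headers: dict):
    """Return the size taken from content-length, or None if missing or malformed."""
    value = headers.get("content-length")
    if value is None:
        return None
    try:
        size = int(value)
    except ValueError:
        return None
    return size if size >= 0 else None
```

Keeping the header parsing separate from the network call makes it possible to test the parsing without contacting a gateway. Whether content-length is meaningful for a directory response is not guaranteed by anything in this thread, so the size should only be trusted once the object is known to be a file.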

I see three possible ways to obtain the desired functionality:

  • It is already implemented and I didn't find it?
  • Move / replicate the files/stat API to the public gateway port (could this be GET as well?)
  • Implement and document HTTP headers which include this information and are to be returned when /ipfs/CID is requested

Tagging @whyrusleeping as I've already been talking to him about this on Slack.

@d70-t d70-t added the kind/enhancement A net-new feature or improvement to an existing feature label Oct 27, 2021
@lidel
Member

lidel commented Oct 29, 2021

I understand you want to build something future-proof, and robust.

The long-term direction is that we will be removing /api/v0 (a subset of go-ipfs' RPC over HTTP, never designed to be exposed on the web) from public gateways and enhancing content paths at /ipfs/{cid} with the necessary APIs.

Detecting a directory today (go-ipfs 0.10)

If you want to implement something against how go-ipfs gateways work today, your best option to detect a directory is sending an HTTP HEAD request. IF content-type is text/html AND Etag starts with DirIndex- then it is a directory listing. While it feels awkward, it is a robust and future-proof check: directory listings will always be returned as HTML by default, and the response requires this custom Etag for cache control, to avoid potentially mutable HTML being cached forever like we do with immutable files under /ipfs/.
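The check described above could look roughly like this in Python. This is a sketch, not gateway or fsspec code: the function names are hypothetical, the input is assumed to be a plain dict of response headers from a HEAD request, and stripping the surrounding quotes and an optional weak-validator prefix (W/) from the Etag is an assumption about how the header is serialized. The returned dict follows the fsspec convention of "name", "size" and "type" keys.

```python
def looks_like_directory(headers: dict) -> bool:
    """Apply the rule: content-type is text/html AND Etag starts with DirIndex-."""
    normalized = {name.lower(): value for name, value in headers.items()}
    # Drop parameters such as "; charset=utf-8" before comparing the media type.
    media_type = normalized.get("content-type", "").split(";", 1)[0].strip().lower()
    etag = normalized.get("etag", "").strip()
    # Assumption: the value may carry a weak prefix and is wrapped in double quotes.
    if etag.startswith("W/"):
        etag = etag[2:]
    etag = etag.strip('"')
    return media_type == "text/html" and etag.startswith("DirIndex-")


def info_from_headers(path: str, headers: dict) -> dict:
    """Build an fsspec-style info dict from HEAD response headers."""
    if looks_like_directory(headers):
        return {"name": path, "size": 0, "type": "directory"}
    normalized = {name.lower(): value for name, value in headers.items()}
    raw_size = normalized.get("content-length")
    # Leave size as None when the gateway did not report a usable content-length.
    size = int(raw_size) if raw_size is not None and raw_size.isdigit() else None
    return {"name": path, "size": size, "type": "file"}
```

Reporting a size of 0 for directories is a choice made for this sketch; nothing in the thread specifies what a directory's size should be.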

Future

In the future, in addition to the Etag way, we most likely will have /ipfs/{cid}?format=dag-json which will return the dag-pb root block serialized into a deterministic JSON format that could be cached forever, and/or /ipfs/{cid}?format=unixfs-stats parameter which will have Type (dir/file).

We are already tracking ?format= in #8234, but let's keep this one open to ensure it includes the ability to get unixfs directories in a more efficient manner.

Feature scope

The MVP is to make it possible to send a request to /ipfs/{cid}[?format] where the CID is dag-pb (unixfs) and get:

  • deterministic dir listing as JSON that can be cached forever (cache-control: public, max-age=29030400, immutable)
  • type (file/directory)
  • size (data, data+envelopes)
  • links (dir, big file)

@lidel lidel added the topic/gateway Topic gateway label Oct 29, 2021
@guseggert guseggert added the P2 Medium: Good to have, but can wait until someone steps up label Aug 5, 2022
@lidel
Member

lidel commented Aug 5, 2022

Related proposal: add Ipfs-DagSize and Ipfs-DataSize to gateway responses.
If someone needs this, please raise support in the linked issue, or propose an IPIP against the ipfs/specs repo.
