IPIP-322: Content Routing Hints #322

Closed
96 changes: 96 additions & 0 deletions CONTENT_ROUTING_HINTS.md
@@ -0,0 +1,96 @@
# Introduction

Content routing maps a CID to one or more *providers*, which specify locations where the CID can be fetched.

In order to fetch content-addressed data, there must be *some* location addressing involved. With IPFS, the implicit default starting point is a set of bootstrap nodes. (And perhaps some LAN nodes discovered by mDNS, which has a starting point of the local subnet.)

So far, Kubo has planned to keep this “location addressing” implicit by adding new content routers to the default Kubo config (e.g. Filecoin indexers). But this only solves the problem for the specific records provided by that indexer, and that set of implicit content routers has to be supported by the various implementations to maintain the facade of “pure content addressing”. There are also trust issues with automatically sending user data to indexers that users have not explicitly trusted.

Instead of gateways and IPFS nodes implicitly sending all requests to a set of content routers that changes over time, with the community needing to reach consensus on which default routers to use, this proposal specifies that the default implicit content router is *only* the IPFS public DHT and LAN DHT, and that users must opt in to any additional content routers when making API requests.

# Specification

The default implicit content router for IPFS nodes is the IPFS public DHT and LAN DHT. Users must opt in to any additional content routers when making API requests.
Contributor

This section seems independent of the rest of the specification around specifying routing hints. I suspect it's also the most controversial given prior resistance to even defining a single or collection of routing systems as the "default/standard IPFS content routing systems".

This happens to be how the most popular and oldest IPFS implementations (e.g. kubo) have been operating over the last several years, but it could reasonably change over time. Overall this seems to be related to an independent discussion on what should be "required" for an IPFS implementation and/or if/how we should label a collection of protocols and properties that some IPFS implementations have that will make systems easier to reason about (e.g. Bitswap 1.2.0, IPFS Public DHT, libp2p with some set of transports and upgraders, etc.).

I'd try to separate this long-requested and likely quite useful issue from the more nebulous "what is IPFS" kinds of conversations. That discussion does seem like an important, separate one to have, with its outcome documented so it can be referenced and/or modified in the future.


Users may opt in to additional content routers using “content routing hints”, which give *suggestions* to the IPFS node about where provider records for the given CID may be found. This can include, but is not limited to, Reframe URLs, pubsub topics, multiaddrs, etc. Because these are hints, the IPFS node is free to decide the order and strategy for using them. If an IPFS node implements support for a hint type that is specified below, it must follow the specification for that hint type.
Contributor

Allowing for content routing hints seems fine, but IMO doing this comes with some items that may need addressing:

  • Systems that try and resolve IPFS content will likely need to be able to override user inputs to add/subtract content routers.
    • For example, I could see IPFS companion wanting to have a configuration for auto-appending content routing systems, so that a favorite system is always checked, or for limiting them to control which types of routing systems are permissible
  • Gateways might need to be able to return errors or status indicating what types of content routing systems can be used. e.g. if the gateway has blocked use of multiaddr hints but the user passed them and got some timeout error it should probably know that the multiaddr hint wasn't allowed to be used.
  • More documentation and modification to tools like https://check.ipfs.network/ to help people understand the implications of using new routing systems and how different systems interoperate.


When a node receives a request with content routing hints, it should search for provider records in the IPFS public DHT and at locations specified in the hints.
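
The lookup behavior described above can be sketched roughly as follows. This is a minimal illustration, not an implementation from Kubo or any other IPFS node: the `Router` callable shape and the `find_providers` name are assumptions made for this example, and the concurrent, order-preserving merge is only one possible strategy, since the node is free to choose its own order and strategy for using hints.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

# Hypothetical shape: a router maps a CID string to provider addresses.
Router = Callable[[str], Iterable[str]]


def find_providers(
    cid: str, default_routers: List[Router], hint_routers: List[Router]
) -> List[str]:
    """Query default routers (e.g. the DHT) and hinted routers, merging unique results."""
    routers = list(default_routers) + list(hint_routers)
    if not routers:
        return []

    def safe_query(router: Router) -> List[str]:
        try:
            return list(router(cid))
        except Exception:
            # Hints are only suggestions: a failing router must not fail the lookup.
            return []

    providers: List[str] = []
    seen = set()
    with ThreadPoolExecutor(max_workers=len(routers)) as pool:
        # pool.map yields results in the order the routers were listed.
        for result in pool.map(safe_query, routers):
            for provider in result:
                if provider not in seen:
                    seen.add(provider)
                    providers.append(provider)
    return providers
```

Listing the default routers first means that hinted routers can only add providers in this sketch; they never displace results from the default routers.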

## Hint Types

Implementations are free to support hint types that make sense for their use cases.

### URI

- **Reframe**
  - HTTPS URL that ends with `/reframe` MUST be interpreted as a Reframe hint, for example:
Contributor Author

How do we specify different codecs like dag-cbor? Do we need to add this to the Reframe spec for HTTP URLs? And what about non-HTTP transports? Would that need to be something we add to multiaddr?

Member

> How do we specify different codecs like dag-cbor?

That is up to the HTTP client (which controls the Accept header sent with the request).

I think the details of /reframe over HTTP are specified in reframe/REFRAME_HTTP_TRANSPORT.md.
It already requires the endpoint to be named /reframe, but we could add a paragraph to that spec which states this more explicitly.

> what about non-HTTP transports? Would that need to be something we add to multiaddr?

We probably want to register /reframe protocol at https:/multiformats/multiaddr/blob/master/protocols.csv, allowing for multiaddrs like:

/dns4/example.net/tcp/443/tls/http/reframe
/dns4/example.net/tcp/443/wss/reframe

(and looping in libp2p team for sanity check, if this is correct way of representing this)

    - [`https://cid.contact/reframe`](https://cid.contact/reframe)
    - [`https://routing.delegate.ipfs.io/reframe`](https://routing.delegate.ipfs.io/reframe)
- **Magnet links (TBD, for consideration)**
  - “De facto standard” outside IPFS: [https://en.wikipedia.org/wiki/Magnet_URI_scheme](https://en.wikipedia.org/wiki/Magnet_URI_scheme)
- **HTTP mirror AKA Web seed (TBD, for consideration)**
  - We could speed up data transfer of leaf nodes by making HTTP range-requests to the provided HTTP URL.
  - Bit out there, but could provide additional flexibility, especially when a URL of a public gateway is used.
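
As a rough sketch of the Reframe rule in this section, a node could sort incoming hints like this. The function name `classify_hint` and the returned labels are illustrative assumptions, not part of the proposal. The rule that anything starting with `/` is a multiaddr is also a simplification made for this example.

```python
from urllib.parse import urlsplit


def classify_hint(hint: str) -> str:
    """Return a coarse hint type: 'reframe', 'multiaddr', or 'unknown'."""
    hint = hint.strip()
    # Assumption for this sketch: multiaddrs are recognized by their leading slash.
    if hint.startswith("/"):
        return "multiaddr"
    # Per the text above: an HTTPS URL that ends with /reframe is a Reframe hint.
    if urlsplit(hint).scheme == "https" and hint.endswith("/reframe"):
        return "reframe"
    # Unsupported or unrecognized hints can simply be ignored by the node.
    return "unknown"
```

For instance, `classify_hint("https://cid.contact/reframe")` yields `"reframe"`, while a plain `http://` URL falls through to `"unknown"` because the rule names HTTPS specifically.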

### Multiaddr (instant win)

Multiaddrs alone provide a very flexible solution for routing hints. They enable control over the number of additional lookups that a client needs to make to reach the data:

- `/ip4/A.B.C.D/tcp/NNN/p2p/{peerID}`
  - Removes the need for any lookups; the client can try to connect directly and start data transfer
- `/p2p/{PeerID}`
  - Saves 1 DHT lookup: we already know the potential provider’s PeerID and only need to find their addresses via `findpeer` (or similar)
  - On gateways this is highly cacheable
- `/dnsaddr/{domain}`
  - Requires resolving [DNSAddr TXT records](https:/multiformats/multiaddr/blob/master/protocols/DNSADDR.md) on DNS, but allows big content storage services to scale / load-balance with ease and to leverage DNS-based delegation to the closest nodes that have the data
    - Could include fully resolved addresses, PeerIDs, or other DNSAddrs (with some sane recursion limit, which could be the same as for resolving /ipns/ paths: 32)
  - Allows us to collapse a lot of complexity into a single DNS-based hint
  - 💡IDEA: we could implicitly check for DNSAddr on domains that have DNSLink
    - Opening `https://dweb.link/ipns/en.wikipedia-on-ipfs.org` could make the gateway implicitly check for DNSAddr for the domain at `_dnsaddr.en.wikipedia-on-ipfs.org`, which could have TXT records pointing at storage providers that have the website data (TXT record `dnsaddr=/dnsaddr/storage-provider1.com`)
  - 💡IDEA: Since we have a valid DNS name, we could also check if `{domain}` exposes a Reframe endpoint at `/reframe`
    - This would create a pretty elegant convention where the URL hint is short (`/dnsaddr/service.com`) and at the same time allows multiple types of routing hints to be passed this way.
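
The recursive `/dnsaddr/` expansion described above, with its recursion limit, might look roughly like this. It is a simplified sketch under stated assumptions: `lookup_txt` is a hypothetical injected function that returns the TXT record strings for a DNS name (no real DNS library is used), and the peer ID suffix filtering that real DNSAddr resolution can apply is left out.

```python
from typing import Callable, List

MAX_DEPTH = 32  # recursion limit floated in the text (same as /ipns/ path resolution)


def resolve_dnsaddr(
    addr: str, lookup_txt: Callable[[str], List[str]], depth: int = 0
) -> List[str]:
    """Expand a /dnsaddr/{domain} hint into multiaddrs that need no further DNS lookups."""
    if not addr.startswith("/dnsaddr/"):
        return [addr]  # already resolved (e.g. /ip4/... or /p2p/...)
    if depth >= MAX_DEPTH:
        return []  # give up on overly deep or cyclic delegation
    domain = addr[len("/dnsaddr/"):].split("/", 1)[0]
    resolved: List[str] = []
    for record in lookup_txt("_dnsaddr." + domain):
        # Only TXT records of the form dnsaddr=<multiaddr> are relevant.
        if record.startswith("dnsaddr="):
            target = record[len("dnsaddr="):]
            resolved.extend(resolve_dnsaddr(target, lookup_txt, depth + 1))
    return resolved
```

Injecting the lookup function keeps the sketch testable without network access and leaves the choice of DNS resolver to the implementation.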

### PubSub Router Topic (future, TBD)

- This one is for the future, needs additional design analysis, but we already have PoC for [IPNS over PubSub](https:/ipfs/go-ipfs/blob/master/docs/experimental-features.md#ipns-pubsub) and a “Generic” router is implemented in https:/libp2p/go-libp2p-pubsub-router
- We could come up with an implicit or explicit protocol for joining a specific pubsub topic for requested content.
- The implicit topic name could be based on the root CID of the requested path (allowing peers browsing the same DAG to participate in the same topic)
- This could happen even without `?providers=` being present, but needs analysis how feasible it is to do this by default.
- Even if this type of router is disabled by default, we could leverage the fact that `?providers=/dnsaddr/{domain}` is passed and create one.
- Nodes could join a topic based on the DNS name from DNSaddr, allowing peers interested in the content from the same provider to exchange data directly over PubSub, skipping DHT or centralized Reframe endpoint.
- A variant of this is especially powerful when browsing a DNSLink website or IPNS name. A mutable pointer would ensure people having old and new version of

## Gateway Requests

We would add support for an optional `?providers=` URL parameter ([percent-encoded](https://en.wikipedia.org/wiki/Percent-encoding), comma-separated) or HTTP header `X-Ipfs-Providers` sent with HTTP request to a gateway.

- `/ipfs/{cid}?providers=url,multiaddr,somethingelse?`
  - Example: `https://dweb.link/ipfs/bafy..acbd?providers=/dnsaddr/storage-provider1.com`
Comment on lines +67 to +68
Contributor

In designing both delegated routing and the content routing setup, I thought the goal was to try to maintain the property we have today of content addressed data.

if we have to specify the origin for the data, we've lost some of this property.

I guess there's a benefit of discovery of content routers through this mechanism, but I would hope that's not the only way we learn about content routers in kubo.

Member

> if we have to specify the origin for the data, we've lost some of this property.

We share the concern, and that is why this proposal exists.

We will lose it for sure if a public gateway decides to use a specific set of indexers, and to not use others.
A user may know that their data is in place X, but will be forced to use a specific indexer Y because that is the only thing the gateway speaks.

> would hope that's not the only way we learn about content routers in kubo.

This proposal is not replacing Routing.Routers, but complements it. Gateways should not be gatekeepers. We want to see multiple indexers, and each indexer should not have to lobby for being included on some default lists to provide utility to users.

By giving users the ability to pass additional routing hints, we remove the surface for undesired storage and content routing lock-in caused by the tyranny of the default.

Gateway operators will be in control to set any custom peering and routing they wish, but the users should still be able to improve routing even further by providing an optional hint that the gateway/node can leverage for finding providers when the CID can't be found using conventional methods. This will be course-correcting any routing gaps that may occur.

(Note to self to incorporate this into the spec)

- `X-Ipfs-Providers: url,multiaddr,somethingelse`
  - Example: `X-Ipfs-Providers: /dnsaddr/storage-provider1.com`
- Gateways will be free to leverage this hint to speed up content routing, or ignore it.
- Allows public gateways to load content from services that do not announce CIDs on DHT (e.g., Pinata).
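
A gateway that chooses to honor these hints could collect them from the request roughly as follows. This is a sketch only: `parse_provider_hints` is a hypothetical helper, and it assumes that the `providers` value is percent-decoded before being split on commas (so individual hints must not contain commas) and that hints from the query parameter and the header are simply combined.

```python
from typing import Dict, List
from urllib.parse import parse_qs, urlsplit


def parse_provider_hints(url: str, headers: Dict[str, str]) -> List[str]:
    """Collect routing hints from ?providers= and the X-Ipfs-Providers header."""
    # parse_qs percent-decodes each value of the providers parameter.
    raw_values = parse_qs(urlsplit(url).query).get("providers", [])
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    if "x-ipfs-providers" in lowered:
        raw_values.append(lowered["x-ipfs-providers"])
    hints: List[str] = []
    for raw in raw_values:
        for item in raw.split(","):
            item = item.strip()
            if item and item not in hints:
                hints.append(item)  # keep first-seen order, drop duplicates
    return hints
```

Because hints are optional, a request that carries neither the parameter nor the header yields an empty list and the gateway falls back to its usual routing.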

## API Requests

We would add an optional `--providers` parameter that allows for passing ad-hoc hints that are scoped to a specific command.
Comment on lines +74 to +76
Contributor

This is specific to the kubo HTTP API, right?

Member

Yes, we will move this to "Notes for implementers" section as an example of CLI API.


### Prior art

- [https://en.wikipedia.org/wiki/Magnet_URI_scheme](https://en.wikipedia.org/wiki/Magnet_URI_scheme)
  - DHT hash + optional list of HTTP URLs with trackers (~indexer’s reframe endpoints)
    - [routing.delegate.ipfs.io/reframe](http://routing.delegate.ipfs.io/reframe)
  - Example:

```jsx
magnet:?xt=urn:ipfs:[IPFS_CID]
&dn=file_name.mp4
&x.ref=[REFRAME_URL_1]
&x.ref=[REFRAME_URL_2]
```


- In IPFS ecosystem
  - [Content routing hint via DNS records #6516](https:/ipfs/kubo/issues/6516)
  - [Content routing hint via HTTP headers #6515](https:/ipfs/kubo/issues/6515)
  - [https://discuss.ipfs.tech/t/proposal-peer-hint-uri-scheme/4649/21?u=lidel](https://discuss.ipfs.tech/t/proposal-peer-hint-uri-scheme/4649/21?u=lidel)
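
For illustration, the magnet-style link shown in the prior-art example could be assembled like this. The `xt=urn:ipfs:` and `x.ref` parameters are taken from that example and are not an established standard, and `build_magnet` is a hypothetical helper written for this sketch.

```python
from typing import List
from urllib.parse import quote


def build_magnet(cid: str, file_name: str, reframe_urls: List[str]) -> str:
    """Build a magnet-style URI carrying a CID and Reframe endpoints as hints."""
    params = ["xt=urn:ipfs:" + cid, "dn=" + quote(file_name, safe="")]
    # One x.ref parameter per Reframe endpoint, percent-encoded so that
    # '&' or '=' inside a URL cannot break the query string.
    params += ["x.ref=" + quote(url, safe="") for url in reframe_urls]
    return "magnet:?" + "&".join(params)
```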