
RFC: magic-cookie-discovery: a zero-config, DHT-based discovery mechanism for libp2p #551

thomaseizinger opened this issue Jun 4, 2023 · 17 comments

@thomaseizinger
Contributor

The following is the result of an idea I've iterated on with @mxinden and @dhuseby after IPFS Thing. I am not aware of an immediate user asking for this, but I think it has the potential to be a "wow" effect for new libp2p users, and I wanted to gather some feedback from the wider community. Coming from a user perspective myself and only now being a maintainer, I would have wanted this to exist 3 years ago.

This idea is very much early stage and I am curious what other people think.

Description

Our DHT spec allows for a concept of "provider records". They are basically sets where nodes can add and remove themselves to/from a certain key. The gist of the idea is that we specify a function that deterministically computes an identifier from a node's supported protocols and we automatically add ourselves as a "provider" for this identifier. Thus, a node that supports the same set of protocols (or a subset based on configuration) can connect to an existing libp2p-DHT network (like IPFS) and find nodes that support the same set simply by looking up providers for its "identifier".

Example:

  • Assume a libp2p network with a DHT
  • Node A boots and only knows about a bootstrap node of this network (like an IPFS node)
  • Node A looks at its protocols and deterministically computes an identifier X from it
  • Node A adds itself as a provider for X

Note that being a "provider for X" doesn't have any further meaning beyond supporting the protocols being used to compute X. It doesn't mean we support BitSwap or any other protocol to actually fetch a value for X because there is no value for X.

  • Node B with the same set of protocols boots and connects to the same libp2p DHT
  • Node B computes the same identifier and lists all providers for X
  • Node B discovers node A through this mechanism

Details

A few details on how we could implement this (a rough sketch follows the list):

  • Allow nodes to filter which protocols contribute to the identifier, i.e. exclude irrelevant ones (the actual DHT protocols, identify, ping, etc.)
  • Sort the remaining protocols and hash them
  • Use a TOTP-based approach where the final identifier is time-based, meaning the location of the provider records moves around the network over time, rotating the "burden" of storing them across nodes
  • Make sure these nodes are still good citizens of the DHT network and not parasites
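
Here is a minimal sketch of such a derivation in Rust (using the `sha2` crate). The filter list, the epoch length, and the separator byte are illustrative assumptions, not part of any spec:

```rust
use sha2::{Digest, Sha256};
use std::time::{SystemTime, UNIX_EPOCH};

/// Illustrative only: protocols that should not contribute to the identifier.
const IGNORED: &[&str] = &["/ipfs/id/1.0.0", "/ipfs/ping/1.0.0", "/ipfs/kad/1.0.0"];

/// Assumed rotation interval, aligned with the 22h reprovide interval.
const EPOCH_SECS: u64 = 22 * 60 * 60;

fn discovery_key(protocols: &[&str]) -> [u8; 32] {
    // 1. Filter out protocols that don't identify the application.
    let mut relevant: Vec<&str> = protocols
        .iter()
        .copied()
        .filter(|p| !IGNORED.contains(p))
        .collect();

    // 2. Sort so every node hashes the list in the same order.
    relevant.sort_unstable();

    // 3. Mix in a TOTP-style epoch so the key (and thus the set of nodes
    //    storing the provider records) rotates over time.
    let epoch = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("clock is after the unix epoch")
        .as_secs()
        / EPOCH_SECS;

    let mut hasher = Sha256::new();
    for p in &relevant {
        hasher.update(p.as_bytes());
        hasher.update([0u8]); // separator to avoid ambiguous concatenation
    }
    hasher.update(epoch.to_be_bytes());
    hasher.finalize().into()
}
```

Two nodes configured with the same protocol set (and the same filter) arrive at the same key without any coordination.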

Alternatives

We could let the user decide the identifier by which they want to discover their network. This is easily possible but takes away the appeal of the idea a bit. Having a literally zero-config approach (defaulting to the IPFS network) for discovering other libp2p nodes that speak the same set of protocols is quite appealing IMO.

@marten-seemann
Contributor

Not sure if this is the right place to post the proposal, the IPFS stewards will probably have opinions about this.

One problem that immediately comes to mind is that the nodes close to some really popular protocol (one that is supported by the majority of the network) will now have to store O(N) "provider" records.

@thomaseizinger
Contributor Author

thomaseizinger commented Jun 5, 2023

Not sure if this is the right place to post the proposal, the IPFS stewards will probably have opinions about this.

Yes, can you tag someone? I think it'd be good to get their expertise. Note that from an abstraction PoV, it doesn't really concern IPFS. We only depend on features of the libp2p DHT protocol, so this should work equally well on other DHTs like Polkadot's.

Consequently, I think libp2p/specs IS the right place to discuss / introduce this.

One problem that immediately comes to mind is that the nodes close to some really popular protocol (one that is supported by the majority of the network) will now have to store O(N) "provider" records.

Note that it would be about popular protocol combinations, not individual protocols.

The time-based aspect would ensure that this record will move around the network and thus not stay with one node. I think aligning this with the re-provide interval would make sense.

Also note that IPFS already has this "problem" for popular files like the IPFS project README or certain demo files / videos. I am not aware that this is a problem today, but if it becomes one, it would make sense to solve it separately and not block proposals like this one on it.

Unless I am missing something, this isn't a "misuse" but just another application of the provider records feature. To be fully compliant (please correct me if I'm wrong), we could also specify serving some kind of (useful?) content at the given key. Perhaps the list of protocols itself?

@guillaumemichel
Contributor

@thomaseizinger I think this proposal could be of great use to discover unpopular features in the Composable DHT, solving @dhuseby's isolation threshold problem.

Note that it would be about popular protocol combinations

However, I don't think that it would be wise to advertise popular protocols in the current DHT. By definition, popular protocols are easy to find: you can just perform a random walk (look up random Kademlia keys) and you will easily stumble upon a peer running the protocol you are looking for.

As you mentioned, the IPFS DHT (Banana?) has a problem with storing provider records for popular content. Some nodes end up storing a provider record for 10K–100K peers, which is a huge overhead. We need to tackle this issue (e.g. with load balancing), but it isn't our priority at the moment. Adding this extra load to the IPFS DHT seems to be a significant overhead, especially if the main goal is to have a "wow" effect for libp2p.

I think it is a great design that could be useful for discovering unpopular features (limiting the size of the provider records), or as part of the Composable DHT.

Use a TOTP-based approach where the final identifier is time-based, meaning who stores the providers in the network moves around with time, rotating who in the network has to put up with the "burden" of storing these provider records

This means that it wouldn't always be the same peers paying the cost of storing all these provider records, which is good. However, it could DoS a node with modest hardware for the time it has to store the record, and it could have the effect of peers rotating their peer IDs to avoid storing these records. Most of the cost wouldn't come from disk usage or from serving the provider records (unless it becomes a very popular feature), but rather from handling all the PROVIDE requests (e.g. in the IPFS network, all IPFS nodes would send you a provide request within the 22h reprovide interval, as they all run the same set of protocols).

A client looking up a set of protocols would have to look up 2 Kademlia keys to be sure to find all providers: the one derived from the current TOTP epoch and the one from the previous TOTP epoch. But this is only relevant for unpopular sets of protocols.
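
For illustration, such a two-epoch lookup could look like this (a sketch; `discovery_key_at` is a hypothetical epoch-parameterized variant of the derivation sketched above):

```rust
use libp2p::kad::{self, store::MemoryStore, RecordKey};
use std::time::{SystemTime, UNIX_EPOCH};

/// Assumed rotation interval, same as in the derivation sketch above.
const EPOCH_SECS: u64 = 22 * 60 * 60;

// Query both the current and the previous epoch key so that records
// published shortly before a rotation are still found.
fn lookup_both_epochs(kademlia: &mut kad::Behaviour<MemoryStore>, protocols: &[&str]) {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("clock is after the unix epoch")
        .as_secs();
    let current_epoch = now / EPOCH_SECS;

    for epoch in [current_epoch, current_epoch - 1] {
        // `discovery_key_at` is hypothetical: the key derivation with an explicit epoch.
        let key = discovery_key_at(protocols, epoch);
        kademlia.get_providers(RecordKey::new(&key));
    }
}
```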

@thomaseizinger
Contributor Author

thomaseizinger commented Jun 5, 2023

Note that it would be about popular protocol combinations

However, I don't think that it would be wise to advertise popular protocols in the current DHT. By definition a popular protocols are easy to find, you can just perform a random walk (lookup random kademlia keys), and you will easily stumble upon a peer running the protocol you are looking for.

You are misunderstanding me. I was saying that the problem that @marten-seemann was pointing out is only an issue for popular protocol combinations. The thing I was emphasizing is the combinations bit. This isn't a proposal to answer questions like: "Which nodes run the ping protocol?" or "Which nodes run my custom chat protocol?".

This is a proposal for finding nodes that support the exact same set of protocols whilst being a guest in a DHT of a different application. In general, I see this feature primarily as a way for new networks to bootstrap themselves. If you already have bootnodes (like IPFS does), there is no point in advertising yourself via such a "magic cookie".

Instead, the main beneficiaries of this protocol are applications that don't (yet) have their own boot nodes. They would use the IPFS or any other libp2p DHT to bootstrap themselves in exchange for helping with peer routing in that network. This is important. If somebody decides to build this, I think it is crucial that these nodes are well-behaved guests in that DHT and help with content and peer routing. Question for the IPFS folks: How useful is a node in the IPFS DHT that does not support bitswap?

Some nodes end up storing a provider record for 10K-100K's of peers, which is a huge overhead. We need to tackle this issue (e.g with load balancing), but it isn't our priority at the moment. Adding this extra load to the IPFS DHT seems to be a significant overhead, especially if the main goal is to have a "wow"-effect for libp2p.

Why would it be significant overhead? I'd expect the number of provider records for a new network to be in the order of 10-100 nodes. It is also easy to imagine that - before adding oneself as a provider - a node looks up all providers and if it finds > 100, it doesn't add itself but connects to the existing nodes. Once this initial bootstrapping is done, the nodes can form their own DHT network and thus discover more of their own nodes.

Such a limit can be applied as a band-aid until the general load-balancing problem is solved.
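
As a sketch of that band-aid (all names hypothetical, building on the derivation above):

```rust
/// Hypothetical cap on how many providers a discovery record should reference.
const MAX_PROVIDERS: usize = 100;

/// Sketch of a bootstrap policy: only become a provider while the record is
/// small; once enough peers exist, just join the network through them.
async fn bootstrap(kademlia: &mut Kademlia, key: RecordKey) {
    // `lookup_providers` is a hypothetical helper wrapping a get_providers query.
    let existing = lookup_providers(kademlia, &key).await;
    if existing.len() < MAX_PROVIDERS {
        let _ = kademlia.start_providing(key);
    }
    // Either way, dial the peers we found and continue discovery over our
    // own network from here on.
    dial_all(&existing).await;
}
```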

@guillaumemichel
Contributor

Question for the IPFS folks: How useful is a node in the IPFS DHT that does not support bitswap?

It doesn't matter whether the node supports Bitswap or not for the IPFS DHT. The requirements are simply to support peer routing (peer records) and content routing (provider records + IPNS). Nodes advertising their protocols wouldn't even need to be DHT servers, they could simply "provide" without participating in any routing.

I'd expect the number of provider records for a new network to be in the order of 10-100 nodes. It is also easy to imagine that - before adding oneself as a provider - a node looks up all providers and if it finds > 100, it doesn't add itself but connects to the existing nodes. Once this initial bootstrapping is done, the nodes can form their own DHT network and thus discover more of their own nodes.

That sounds pretty reasonable to me, I don't see any problem with this. I just don't want to see one big provider record referencing all IPFS nodes 😄

IMO this proposal would be a good addition 👍🏻

@thomaseizinger
Contributor Author

Question for the IPFS folks: How useful is a node in the IPFS DHT that does not support bitswap?

It doesn't matter whether the node supports Bitswap or not for the IPFS DHT. The requirements are simply to support peer routing (peer records) and content routing (provider records + IPNS).

Great! To paraphrase: This means a node utilizing this proposal is just another node in the DHT and actually adds value.

Nodes advertising their protocols wouldn't even need to be DHT servers, they could simply "provide" without participating in any routing.

They could, but that would make them parasites, wouldn't it? I think it would be good if those nodes gave back to the DHT in return for its bootstrapping service.

@thomaseizinger
Contributor Author

thomaseizinger commented Jun 5, 2023

I'd expect the number of provider records for a new network to be in the order of 10-100 nodes. It is also easy to imagine that - before adding oneself as a provider - a node looks up all providers and if it finds > 100, it doesn't add itself but connects to the existing nodes. Once this initial bootstrapping is done, the nodes can form their own DHT network and thus discover more of their own nodes.

That sounds pretty reasonable to me, I don't see any problem with this. I just don't want to see one big provider record referencing all IPFS nodes 😄

One could even argue that it might be a good recommendation in general for libp2p DHT nodes to refuse to add themselves to a provider record if it exceeds a specified size.

@marten-seemann
Contributor

marten-seemann commented Jun 5, 2023

Thanks to @guillaumemichel for elaborating on the problem I described.

I don’t think that the distinction between “protocol” and “protocol combination” is helpful here. To be useful, the list of protocols needs to be well-defined and predictable by other nodes (e.g. [my proto 1, my proto 2]), so you can’t just mix random protocols into that list (which would spread the load). Otherwise other nodes won’t know which key to look up to find you.

I can see how this would be useful for bootstrapping a new network, but you’re inherently building a solution that doesn’t scale, and, worst case, brings down the very network you’re relying on once your application becomes popular.

This proposal sounds like a useful mechanism for protocols that you know will remain a fringe use case.
It’s not novel by the way, we did this for circuit v1, because we knew that running a circuit v1 node is so expensive (and doesn’t bring you any benefits) that only a few operators would ever do that. In practice, it was mostly PL-run nodes. There was never a risk that the majority of IPFS nodes would suddenly become circuit v1 nodes and bring down the network. Incidentally, this is one of the reasons why circuit v2 doesn’t advertise to the DHT.

I know this will be an unpopular opinion in certain circles, but if all you’re trying to solve is the bootstrapping problem, you might be better off running a bootstrap server. Yes, you’ll have to run one (or a few) servers, but you’ll get 1. vastly improved scalability, 2. better latency (probably 10x if you make use of edge networks) and 3. greatly improved censorship resistance.

@thomaseizinger
Contributor Author

thomaseizinger commented Jun 5, 2023

I don’t think that the distinction between “protocol” and “protocol combination” is helpful here. To be useful, the list of protocols needs to be well-defined and predictable by other nodes (e.g. [my proto 1, my proto 2]), so you can’t just mix random protocols into that list (which would spread the load). Otherwise other nodes won’t know which key to look up to find you.

If you are building an application, your set of protocols is well-defined. Likely, you want to filter protocols like ping from this but that is an unimportant implementation decision in my opinion.

I can see how this would be useful for bootstrapping a new network, but you’re inherently building a solution that doesn’t scale, and, worst case, brings down the very network you’re relying on once your application becomes popular.

I am curious why you think it doesn't scale. If this doesn't scale, doesn't that mean that, by extension, IPFS doesn't scale?

As described above, we can easily limit the size of the provider records. I do think that hardcoding such a limit is a hack, but really it is a shortcoming of our DHT protocol and has nothing to do with this idea.

This proposal sounds like a useful mechanism for protocols that you know will remain a fringe use case.

Why do they have to remain a fringe use case? It seems to me that we are circling around one criticism point: Large provider records. That is a problem of the current DHT design, not this proposal, so it should be solved at that level. In the meantime, we can limit the impact by manually keeping the size small.

It’s not novel by the way, we did this for circuit v1, because we knew that running a circuit v1 node is so expensive (and doesn’t bring you any benefits) that only a few operators would ever do that. In practice, it was mostly PL-run nodes. There was never a risk that the majority of IPFS nodes would suddenly become circuit v1 nodes and bring down the network. Incidentally, this is one of the reasons why circuit v2 doesn’t advertise to the DHT.

Good to know that there is prior art!

I know this will be an unpopular opinion in certain circles, but if all you’re trying to solve is the bootstrapping problem, you might be better off running a bootstrap server. Yes, you’ll have to run one (or a few) servers, but you’ll get 1. vastly improved scalability, 2. better latency (probably 10x if you make use of edge networks) and 3. greatly improved censorship resistance.

Ironically, I think the opposite is true for censorship resistance. With this proposal, any IPFS node can be used for discovery, and they can be oblivious to this proposal. Unless I am missing something, this proposal just uses existing functionality and it should be indistinguishable from other traffic.

Compared to that, the composable DHT is much easier to censor because nodes can just drop "features" they don't want to support.

@guillaumemichel
Contributor

It seems to me that we are circling around one criticism point: Large provider records. That is a problem of the current DHT design, not this proposal, so it should be solved at that level. In the meantime, we can limit the impact by manually keeping the size small.

It is a weakness that the IPFS DHT currently has, and we cannot fix it for now. Using large provider records sounds like exploiting this vulnerability that we are aware of. Hence I would like to avoid adding more large provider records if possible.

Compared to that, the composable DHT is much easier to censor because nodes can just drop "features" they don't want to support.

Nodes storing a provider record associated with a (set of) protocol could simply decide to drop it in order to censor it. Moreover, this feature could be implemented on the Composable DHT.

@thomaseizinger
Contributor Author

It seems to me that we are circling around one criticism point: Large provider records. That is a problem of the current DHT design, not this proposal, so it should be solved at that level. In the meantime, we can limit the impact by manually keeping the size small.

It is a weakness that the IPFS DHT currently has, and we cannot fix it for now. Using large provider records sounds like exploiting this vulnerability that we are aware of. Hence I would like to avoid adding more large provider records if possible.

100% agree. Do we agree that a behaviour of "check size of provider record before adding yourself" would be a sufficient fix? This can be relaxed once the problem is fixed in the DHT design, if that ever happens.

Compared to that, the composable DHT is much easier to censor because nodes can just drop "features" they don't want to support.

Nodes storing a provider record associated with a (set of) protocol could simply decide to drop it in order to censor it. Moreover, this feature could be implemented on the Composable DHT.

How do you know that the record is associated with a set of protocols? Hash functions are not reversible, meaning you'd have to have a blocklist of specific record keys that you don't want to forward or attempt to compute several hashes based on a list of protocols. On top of that, the time-based component means that the record keys will not be static. Additionally, it is always a set of protocols that gets hashed into such a "magic cookie", meaning you also need to get the order and exact composition right.

If we add an additional, user-configurable salt (per application) into it, this becomes an even bigger burden.
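
For illustration, a salted variant of the derivation sketched earlier would just mix the application-chosen salt into the hash (a hypothetical extension, not part of the proposal as written):

```rust
use sha2::{Digest, Sha256};

fn salted_discovery_key(protocols: &[&str], salt: &[u8]) -> [u8; 32] {
    // Same derivation as `discovery_key` above, but with an application-chosen
    // salt mixed in, so a would-be censor must also know the salt to compute the key.
    let mut hasher = Sha256::new();
    hasher.update(salt);
    hasher.update(discovery_key(protocols)); // reuses the earlier sketch
    hasher.finalize().into()
}
```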

Finally, you have to roll out this kind of blocking to every IPFS node out there. It is not sufficient to just update one or a few of them.

I am not saying it is impossible to censor but it is also far from trivial, esp. because the entire IPFS network needs to agree that a certain key should be censored.

@marten-seemann
Contributor

I don't understand the motivation behind this proposal. It's not clear to me what problem it's trying to solve, and feels more like a solution in search of a problem.

I also don't really understand what the intended outcome of this proposal is. This has already been implemented and used in production (circuit v1, see above), so clearly anybody who wishes to do so can do so already (and could a few years ago). Is the proposal to add a dedicated convenience wrapper API for this to existing libp2p implementations?


It seems to me that we are circling around one criticism point: Large provider records. That is a problem of the current DHT design, not this proposal, so it should be solved at that level. In the meantime, we can limit the impact by manually keeping the size small.

It is a weakness that the IPFS DHT currently has, and we cannot fix it for now. Using large provider records sounds like exploiting this vulnerability that we are aware of. Hence I would like to avoid adding more large provider records if possible.

100% agree. Do we agree that a behaviour of "check size of provider record before adding yourself" would be a sufficient fix? This can be relaxed once the problem is fixed in the DHT design, if that ever happens.

That's a fairly big change (since you now have to check what the other providers are) that comes with its own set of security problems. There might also be better solutions, but I think ProbeLab and the IPFS stewards have more expertise here. If I remember correctly, I once came across a proposal to store your record with nodes that are further away from the target, which would have some interesting (security, performance, replication, load balancing) properties.

If we add an additional, user-configurable salt (per application) into it, this becomes an even bigger burden.

Finally, you have to roll out this kind of blocking to every IPFS node out there. It is not sufficient to just update one or a few of them.

I am not saying it is impossible to censor but it is also far from trivial, esp. because the entire IPFS network needs to agree that a certain key should be censored.

This again comes down to the motivation for this proposal being unclear: it's also not clear what the threat model is. I never claimed that the risk is that all nodes on the DHT conspire to censor your record (of course that's not going to happen!).
However, it is rather easy to eclipse a certain key in the DHT. A salt doesn't help here at all, neither does the TOTP. On the other hand, try censoring an HTTPS-enabled bootstrap server behind an ECH-enabled CDN...

@thomaseizinger
Contributor Author

I don't understand the motivation behind this proposal. It's not clear to me what problem it's trying to solve, and feels more like a solution in search of a problem.

I also don't really understand what the intended outcome of this proposal is. This has already been implemented and used in production (circuit v1, see above), so clearly anybody who wishes to do so can do so already (and could a few years ago). Is the proposal to add a dedicated convenience wrapper API for this to existing libp2p implementations?

The intent of this proposal is to provide visibility for this solution and perhaps provide a standard way of doing this. I've been part of the libp2p project in one form or another for the last 5 years and never came across this idea anywhere, yet it would have been exactly what I needed in the past. One could argue that "users should just read and understand all specs" and come to the conclusion themselves that this is possible. It is just unnecessarily hard and leads to people re-inventing this. Also, if people build this themselves, they might not consider aspects such as large provider records.

Thanks for the feedback to both of you at this point.

I'll pause this here and give it some more thought. I think there is value in documenting this in some form. If not as a spec, perhaps this is just a "pattern" on what you can do with IPFS / libp2p? After all, the key doesn't have to be computed from your list of protocols but could be chosen by the user. Computing it deterministically just gives it a nice zero-config aspect.

@DougAnderson444
Contributor

Some community input here: As I dive deeper into libp2p, I've been looking for a zero-config, DHT-based discovery mechanism for libp2p peer networks as well.

Say I start up a little network for sharing, say, web3 identities in a name-system kind of way. Maybe I want it to be isolated so my peers aren't bombarded with pubsub traffic I would never be interested in. I want to be able to have a network identity for my protocol codecs where we can just say "find us on libp2p", so I don't need to set up, maintain, and advertise a bootstrap server of our own or pass out-of-band connection details.

@dhuseby mentioned "Last Known Whereabouts" at a recent community call, which could translate to a magic cookie for a list of recent peers on the network. I envision only having to run a small service on IPFS (or Polkadot, or wherever) to point any requests to the latest LKW list of peers. Once a peer has an initial LKW list cached, if there are >200 peers in my network, odds are they can bootstrap directly from their LKW list and not need to check the main network at all.

But none of this is built as far as I can find.

I've read through the discussion, and it's still not clear to me how we would do this with the current toolchain. If it's possible for those "in the know", a simple demo implementation would go a long way.

If we want libp2p to be the networking base layer of choice outside IPFS, Polkadot and ETH2, making discovery an easy, frictionless process would go a long way toward that.

@thomaseizinger
Contributor Author

I've read through the discussion, and it's still not clear to me how we would do this with the current toolchain.

The essence is (referencing rust-libp2p APIs, other implementations might differ; a sketch putting the pieces together follows the list):

  1. Define some unique key that is shared between all your implementations. Can be predefined or computed from other information like your supported protocols.
  2. Connect to the existing DHT network
  3. If your node should be found by others, call start_providing with the computed value: https://github.com/libp2p/rust-libp2p/blob/master/protocols/kad/src/behaviour.rs#L911
  4. To find other nodes, call get_providers with the same key: https://github.com/libp2p/rust-libp2p/blob/master/protocols/kad/src/behaviour.rs#L948
  5. Wait for the query results to come in: https://github.com/libp2p/rust-libp2p/blob/68af6669abccfbf3f6a27868184d7cc20f684409/protocols/kad/src/behaviour.rs#L2704C1-L2705
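
Putting those steps together, a minimal sketch (against a recent rust-libp2p; exact module paths and event names vary between versions, and `magic_cookie` is whatever key you derived in step 1):

```rust
use libp2p::kad::{self, store::MemoryStore, RecordKey};
use libp2p::Swarm;

// Assumes `swarm` is already configured with a kad::Behaviour and
// bootstrapped into the target DHT (steps 1 and 2).
fn announce_and_discover(
    swarm: &mut Swarm<kad::Behaviour<MemoryStore>>,
    magic_cookie: [u8; 32],
) {
    let key = RecordKey::new(&magic_cookie);

    // Step 3: make ourselves discoverable under the shared key.
    swarm
        .behaviour_mut()
        .start_providing(key.clone())
        .expect("record store accepts the key");

    // Step 4: look up everyone else who announced under the same key.
    swarm.behaviour_mut().get_providers(key);
}

// Step 5: in the event loop, collect providers as the query progresses:
//
// if let kad::Event::OutboundQueryProgressed {
//     result: kad::QueryResult::GetProviders(Ok(
//         kad::GetProvidersOk::FoundProviders { providers, .. },
//     )),
//     ..
// } = event
// {
//     // dial `providers`
// }
```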

@DougAnderson444
Contributor

The essence is (referencing rust-libp2p APIs, other implementations might differ):

This is excellent essence, thank you for summarizing 🙏

  1. If your node should be found by others, call start_providing with the computed value: https://github.com/libp2p/rust-libp2p/blob/master/protocols/kad/src/behaviour.rs#L911

I suppose this ^^^ is the crux of this spec -- what does that "computed value" look like, and what is the agreed-upon codec, so that we all know what we're looking for when in discovery mode.

This may tie into my concept of libp2p plugins, where as long as you are all using the same plugin, the codec is transparent to the users/implementers and could even be some random bytes or a hash. I'll give this some thought and put together some experiments to see if it makes sense.

@thomaseizinger
Contributor Author

thomaseizinger commented Jun 5, 2023

3. If your node should be found by others, call start_providing with the computed value: https://github.com/libp2p/rust-libp2p/blob/master/protocols/kad/src/behaviour.rs#L911

I suppose this ^^^ is the crux of this spec -- what does that "computed value" look like, and agreed up codec so that we all know what we're looking for when in discovery mode.

It is actually not as critical as it may initially seem. Suppose I write an application: Thomas' p2p chat. As long as my software uses the same key for providing and lookup, I will be able to discover my nodes. Doug's p2p chat can use a different strategy for coming up with this key and it will still work. Because our two applications are unlikely to be compatible anyway, we don't need to be able to find each other via the DHT.

Computing the identifier from the list of protocols is nice because libp2p already has the assumption that protocol names MUST be unique. If we use the same protocol name for two semantically different protocols, then bad things will happen already if we connect. Thus, we might as well build on top of that and discover all nodes that speak the same protocol suite.

I think the main benefits of standardizing something here are:

  • Visibility: Make users aware of the feature.
  • Avoiding footguns like the above discussed "large provider records".
  • Ease of use: If it is standardized / specified, we can implement it as a module in the various languages and people can just use it.

Interoperability between different implementations is also nice but actually not that important because new applications (Thomas' p2p chat etc) are by design not compatible with the nodes (i.e. IPFS) in the DHT, otherwise it wouldn't be a new application but just another node.
