Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cargo should understand distinctions between public/private dependencies #2064

Closed
sfackler opened this issue Oct 20, 2015 · 51 comments
Closed
Labels
A-dependency-resolution Area: dependency resolution and the resolver C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted`

Comments

@sfackler
Copy link
Member

sfackler commented Oct 20, 2015

Current report

Dependencies for crates today can be separated into two categories: those that are implementation details and those whose types are present in the public API of the crate. Cargo does not support distinguishing these two cases today, but it can often be crucial for dependency management.

If a dependency is an implementation detail (i.e. no types are exposed) then it can be easily duplicated in the dependency graph without worry (other than concerns like binary size). In other words, Cargo's safe to have multiple semver-incompatible versions of this dependency because they won't talk with one another.

If a dependency is part of the public API, however, then it's critical that any crate which can share this type it needs to all be of the same version. In other words, Cargo's logic of allowing multiple semver-incompatible versions is almost always guaranteed to go awry.

If Cargo were to understand a distinction between these private/public (or internal/exposed) dependencies then it could solve problems like:

  • Provide much nicer resolution error messages rather than relying on the compiler to figure this out.
  • Restructure the crate DAG to ensure that dependencies which work with multiple major versions end up working with other crates that only require one major version.
  • Ensure that crate resolution never picks too many versions of a crate and instead forces common dependencies to all have the same version.

Original report

Cargo is overeager to pull in multiple copies of a crate

Say we have a crate with the following Cargo.toml:

[package]
name = "foo"
version = "0.1.0"
authors = ["Steven Fackler <[email protected]>"]

[dependencies]
postgres = "0.9"
r2d2_postgres = "0.9"

r2d2_postgres version 0.9.3 depends on postgres version 0.10, while version 0.9.2 and lower depend on postgres version 0.9. Cargo would ideally pick r2d2_postgres version 0.9.2 and postgres version 0.9.6, but it instead picks r2d2_postgres version 0.9.3, which pulls in postgres version 0.10, as well as postgres version 0.9.6. This ends up totally blocking any use of these dependencies.

This issue rather significantly impedes crates' abilities to upgrade their dependencies.

It seems like Cargo's resolution logic should have the property that if there exists a resolved dependency graph with no more than one copy of each dependency, it should never pull in more than one copy of a dependency. Unfortunately, questions like "does such a resolved dependency graph exist" are NP-complete, but I suspect that it's tractable in practice (we don't expect crates to be randomly permuting their dependencies every minor version, for example).

In the meantime before someone has a chance to integrate a SAT solver into Cargo, I think there are some things we can do to help users correct Cargo when it fails to resolve things "properly". For example, cargo update could warn whenever it has to pull in multiple copies of a crate, identifying the portion of the object graph that forced that to happen. This should provide users an easier starting point to poking the graph into shape with cargo update --precise.

@sfackler
Copy link
Member Author

The SAT solver reference was not a joke btw - the yum package manager uses a SAT solver to do its job.

@alexcrichton
Copy link
Member

Yeah I've been worried about this in the past as well. Right now the resolver is kinda just a pile of heuristics and then it returns the first solution it always finds (despite there being many, in situations such as this).

It may be the case that we could avoid a full SAT solver by applying smarter heuristics for the time being (e.g. we'd need some sort of "scoring system" for any solution from a SAT solver anyway), and that may also keep the resolution stage speedy.

@alexcrichton
Copy link
Member

cc @wycats, curious on your thoughts about this

@sfackler
Copy link
Member Author

Yeah, the scoring system is what I'm most unsure about. It should prefer newer versions of libraries to older versions and try to minimize the number of copies of dependencies, but how those should interplay and factor into the solver seem like pretty nontrivial questions.

@SSheldon
Copy link

SSheldon commented Nov 5, 2015

I also noticed that, with a setup like:

[package]
name = "foo"

[dependencies]
bar = "*"
libc = ">= 0.1, < 0.3"
[package]
name = "bar"

[dependencies]
libc = "0.1"

cargo will pull in multiple copies of libc at version 0.1 and 0.2, even though the requirements can be satisfied with just 0.1. (Same issue happens with a wildcard dependency.)

Is that the same issue as here? It seems related but like a fix might be different.

@alexcrichton
Copy link
Member

Yeah I think it's basically the same issue as this.

@SSheldon
Copy link

SSheldon commented Nov 8, 2015

I think this issue really limits the usefulness of dependencies in crates and makes life hard for library authors. Particularly, it makes it dangerous to use any types in your public API that are defined by another library.

To use these types in your public API, users must add the other library as one of their dependencies. The hard part, however, is that if these dependencies are at different versions it won't compile.

For example, my objc crate works fine with either libc 0.1 or 0.2. I described the dependency as such in my Cargo.toml and assumed that being more widely compliant would be best so that the dependency doesn't conflict with other libraries that require a specific version. However, what's happened is that cargo always chooses to satisfy that dependency with 0.2, even when the user is using libc 0.1, resulting in a conflict and compiler error.

The result is that, currently, you're forced to choose a specific version for your crate and make the user use it, as well. If a dependency releases a new version, allowing use of it requires dropping support for the old version and publishing a breaking change, splitting your users and meaning the older versions stop receiving updates. This is frustrating for users, because they must ensure all their dependencies use the same version of a crate and they will have to update their dependencies more often as there are more frequent breaking changes.

Not to get melodramatic, but I think easing these multiple version issues will be important for rust and cargo to be pleasant to use in the future. I'm hoping this issue is on the radar of folks smarter than me and is being thought about, sorry if this turned into a bit of a rant.

@alexcrichton alexcrichton added the P-high Priority: High label Nov 9, 2015
@huonw
Copy link
Member

huonw commented Nov 10, 2015

@SSheldon I believe people should be able to force the use of specific versions with cargo update -p NAME --precise VERSION, fwiw. (Not as nice as changing cargo's behaviour, but it does mean things shouldn't be fundamentally breaking/require dropping support.)

@SSheldon
Copy link

@huonw yeah, it can be avoided if you're careful with your Cargo.lock. Though libraries are recommended to not commit that, so each new user would have to be required to do this.

Maybe another that'd help here would be a way to say in your Cargo.toml, for example, "I want to use the objc crate and I want it to use libc 0.1"? Currently I don't know of any way to do that besides overriding the Cargo file of your dependency.

blaenk added a commit to blaenk/sass-rs that referenced this issue Dec 20, 2015
sass-sys specifies a constraint of "0.1", but sass-rs specifies "0",
which allows version 0.2 and above

this shouldn't be a problem, because a "0.1" version would satisfy both
constraints, but I think the reason this isn't working is because of
rust-lang/cargo#2064 which might not get resolved anytime soon

without this fix, building sass-rs fails because it tries to use
libc 0.2 which has a different interface from the 0.1 that sass-sys used

this gets things building again for now, but we should look into
updating to the 0.2 interface when we get a chance
@graydon
Copy link

graydon commented Jan 3, 2016

I've been playing with hooking the Z3 SMT solver up to rust recently and realized this bug exists in cargo, so I sketched out a little semver-solver using it. I know it's a bit like breaking a walnut with a tank (Z3 is a sort of intense 14MB C++ library) but since Z3 (as of v4.4.1) also contains a numerical optimizer it's really trivial to express a mixed constraint-solving / optimization problem with it. It's just a bit of plumbing, maintaining a mapping from packages to Z3 terms, package-versions to integers, and expressing version constraints as integer inequalities on the terms.

Code's here: https:/graydon/z3-rs/blob/master/tests/semver_tests.rs#L158-L267
Crate's here: https://crates.io/crates/z3/

@sfackler
Copy link
Member Author

sfackler commented Jan 4, 2016

That's awesome!

@erickt
Copy link
Contributor

erickt commented Jan 7, 2016

SUSE's ZYpp uses libsolv. Yum might also switch to it in DNF. It's only 1.2MB of C, so it's smaller than Z3 :)

@wycats
Copy link
Contributor

wycats commented Jan 7, 2016

For what it's worth, the problem of supporting duplicates makes the exact constraints (and their weights) a little bit more complicated than the usual approach to this problem.

In most versions of this problem, dependency "conflicts" simply fail to resolve, but we have another option available to us. That said, over time, I've increasingly come to believe that we were too willing to use that solution in Cargo, and we'll probably ratchet down our usage of allowed-duplicates somewhat.

@graydon
Copy link

graydon commented Jan 7, 2016

Yeah, I think cargo casually permitting multiple copies of a dependency in a project is ... a dangerous default. It needs to be permitted (and we engineered the symbol mangling scheme to explicitly support it) but it's also relatively likely that any given instance of it is a bug. A potentially-very-subtle bug, like "two instances of a library that both think they are in charge of talking to OS subsystem / native library foo"

I think it should probably default to not-working, and require opting in. Though it might also break the ecosystem today if you set that to be the case. I'd be interested to run the experiment on the existing package set, see how many of them can't build with their existing dependency declarations, and if so how much work it'd take / how many duplicate-opt-ins or dependency-range-adjustments to fix.

@wycats
Copy link
Contributor

wycats commented Jan 7, 2016

@graydon I think it's a little more subtle than that.

"Shared" dependencies (dependencies whose types are used in other pub types, or which are transported to other crates through type inference) must not be duplicated.

"Internal" dependencies (dependencies whose types are only used internally) can be safely duplicated (but you might not want to for other reasons, like binary size).

We've talked some about how packages could say that a dependency was "internal", which would then permit duplication and error out if the type was actually made public. Even if we could perfectly detect this, it would still be important to explicitly opt in, because switching from an "internal" to "shared" dependency would be a breaking change to downstream crates.

@graydon
Copy link

graydon commented Jan 7, 2016

Hm. I think either we're talking past each other, or I disagree with your interpretation here, or something's changed I don't understand in the problem space. As far as I know (and as it was initially designed), if crate X exports type T, two copies of X (with the same or different versions, doesn't matter) can both be imported independently in the same composite crate w/o any linkage or type system problems. Their types should be given independent, disjoint identities in any composition of them.

The risk is only, as you say, in binary-size and (imo more seriously) violating uniqueness assumptions each copy of X might assume, such as "I'm the one that calls the init routine on a C library I'm linked against" or "I'm in charge of handling sigusr1" or whatever.

If I'm wrong about this, something has .. significantly changed in the design. What part do you think goes wrong?

@wycats
Copy link
Contributor

wycats commented Jan 7, 2016

Here's a simple example of the problem I'm talking about:

  • a time crate exports pub struct Time { ... }
  • a format_time crate depends on time and uses time::Time in its signatures
  • a fancy_time crate provides helpers for producing a time::Time

Now let's say another crate (datetime) depends on format_time and fancy_time. If it tries to pass a time::Time provided by fancy_time into format_time, they better be talking about the identical type.

It's possible for datetime to see two different time::Times and that part will work fine, but one of the primary goals of package naming and versioning (perhaps the primary goal) is to make it possible for humans to compose packages like this and have any hope of things working.

In this case, both fancy_time and format_time are treating the time dependency as a shared dependency, which is the most direct way I know of expressing this constraint ("they better both be talking about identical types").

@graydon
Copy link

graydon commented Jan 7, 2016

Ok, that's .. definitely an instance where you'd want non-duplication. And I can think of others! I mentioned some above. But it's not a hard requirement of any of the parts in the system that "public types" implies "can't duplicate", right? I'm just trying to get clear on what you're arguing. Last time I discussed this (in the context of linkage), I was assured that cargo passed a disambiguating sort of name/ver/source triple of metadata to rustc for each crate in a project, such that two copies of the same symbolic-named crate wouldn't clash at metadata-load or link time. I hope that's still true, even if I think it should not be the default behavior.

To be clear on what I'm arguing:

  • I think probably most users think shared resolution is already happening, consider duplication a bug
  • I think you could run the experiment on crates.io to see how badly things break defaulting to shared
  • If there's not much breakage, consider making it default
  • If there's a lot of breakage:
    • I suspect it's because of semver misuse as much as anything
    • I suspect downstream users will also be sympathetic to fixing semvers / opting in to duplication in cases they can't work around it
    • If it's too painful to set as a default, I think it should at very least become a noisy warning. Binary size alone makes it worth mentioning, and I think there are serious and subtle bugs that come along with it.

@sfackler
Copy link
Member Author

sfackler commented Jan 7, 2016

I was assured that cargo passed a disambiguating sort of name/ver/source triple of metadata to rustc for each crate in a project, such that two copies of the same symbolic-named crate wouldn't clash at metadata-load or link time.

Yep. Cargo will do that so there won't be any linker errors, but you will run into compile errors like "Expected type foo::Bar but got type foo::Bar" when the two versions of libfoo end up interacting.

@alexcrichton
Copy link
Member

@pornel I do believe so, yes, but it may be pretty non-trivial to implement (haven't thought too deeply)

@gwillen
Copy link

gwillen commented Aug 26, 2016

I'm just chiming in because I recently discovered that cargo will silently compile multiple versions of the same crate into the same binary, and this feels weird and scary to me, and I see that a warning was proposed but rejected.

I'm concerned that it doesn't seem like people are necessarily all on the same page with respect to @graydon's objection, which I (a total Rust outsider, I grant you) completely agree with -- it's not necessarily safe to link a crate twice EVEN if the linkage is private, and if it's not safe, the resulting bugs will be very subtle at best.

I like @graydon 's example of "two instances of a library that both think they are in charge of talking to OS subsystem / native library foo", which is disastrous EVEN if there's no type clash for the linker to detect because both copies are private. The best example I could come up with was a logging crate -- suppose that it by default opens a logfile whose name is, say, the name of the binary it's in, plus some other stuff (the string ".log", the current time, the pid, etc.) Then two private copies of the crate will open the same file, and the resulting behavior will be the worst kind of totally inexplicable, and probably someone will tear their hair out for weeks trying to debug it.

The pull request for the rejected warning mentioned that the sys crates are "highlanders" -- they enforce non-duplication -- and that servo has a tool for catching duplicate crates. This seems like evidence to me that some important cases require avoiding this currently-unavoidable behavior.

@jethrogb
Copy link
Contributor

jethrogb commented Nov 7, 2016

So there's been some breakage this weekend with the release of openssl-sys 0.9.0, due to the requirement that native libraries are linked by only one crate. Any package that depends on git2 ^0.4 no longer builds out of the box due to two versions of openssl-sys being pulled in. Can we get a SAT solver already pretty please? :)

@sfackler
Copy link
Member Author

Some interesting discussion of these issues: https://research.swtch.com/version-sat

@gwillen
Copy link

gwillen commented Dec 13, 2016

A question about the openssl-sys case -- do we actually need a SAT solver to fix it, or would ad-hoc adjustment to use the highest version of the dependency have been sufficient? (That is, did the package depending on the lower version specifically exclude the higher?) It sounds from @sfackler's link like other systems may perform this ad-hoc adjustment (and that it is in fact necessary to solve the non-NP-complete case correctly.)

@jimmycuadra
Copy link
Contributor

Any progress on this in 2017? It just came up again for one of Ruma's users who ended up with three different versions of a public dependency in their program and resulted in confusing error messages. Does this issue fit into any of the 2017 roadmap issues? Better error messages from the compiler, which make it clearer that there are conflicts between versions, could go a long way in helping new users figure out what the problem is before there is a more formal solution to public/private dep management with Cargo.

@Mark-Simulacrum
Copy link
Member

I believe rust-lang/rfcs#1977 is being relatively actively worked on.

@carols10cents carols10cents added A-dependency-resolution Area: dependency resolution and the resolver C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` labels Sep 10, 2017
@carols10cents
Copy link
Member

rust-lang/rfcs#1977 has been accepted, and there is now a tracking issue: rust-lang/rust#44663 which IMO replaces this issue, so I'm going to close this, but note that it will be implemented!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-dependency-resolution Area: dependency resolution and the resolver C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted`
Projects
None yet
Development

No branches or pull requests