
new dependency resolver runs "forever" due to incompatible package versions #8922

Closed

bkurtz opened this issue Sep 26, 2020 · 14 comments

bkurtz commented Sep 26, 2020

The new resolver in pip iterates over all versions of all packages, even if a small number (e.g. 2) of the specifications are completely incompatible.

Tested with latest pip from master.

What did you want to do?
Something like this (just a toy example; our application uses different packages):
pip install --use-feature=2020-resolver marshmallow-sqlalchemy sqlalchemy==0.6.0 psycopg2-binary scipy

All available versions of marshmallow-sqlalchemy require sqlalchemy of at least 0.7, so the first two specifications are incompatible and will reasonably quickly (~11 seconds on my machine) give a failure if specified by themselves. However, adding additional packages with un-pinned versions (e.g. scipy, which depends on numpy) introduces an exponentially increasing number of options. The new dependency resolver apparently feels the need to test all of these, even though it could reasonably fail after discovering that the first two specifications were incompatible. I didn't let it run to completion, but I suspect it would literally take days.

Output

# time pip install --use-feature=2020-resolver marshmallow-sqlalchemy sqlalchemy==0.6.0 psycopg2-binary scipy
Collecting sqlalchemy==0.6.0
  Using cached SQLAlchemy-0.6.0.tar.gz (1.8 MB)
Collecting psycopg2-binary
  Using cached psycopg2_binary-2.8.6-cp36-cp36m-manylinux1_x86_64.whl (3.0 MB)
Collecting scipy
  Using cached scipy-1.5.2-cp36-cp36m-manylinux1_x86_64.whl (25.9 MB)
Collecting numpy>=1.14.5
  Using cached numpy-1.19.2-cp36-cp36m-manylinux2010_x86_64.whl (14.5 MB)
Collecting marshmallow-sqlalchemy
  Using cached marshmallow_sqlalchemy-0.23.1-py2.py3-none-any.whl (18 kB)
  Using cached marshmallow_sqlalchemy-0.23.0-py2.py3-none-any.whl (18 kB)
  Using cached marshmallow_sqlalchemy-0.22.3-py2.py3-none-any.whl (18 kB)
  Using cached marshmallow_sqlalchemy-0.22.2-py2.py3-none-any.whl (18 kB)
  Using cached marshmallow_sqlalchemy-0.22.1-py2.py3-none-any.whl (18 kB)
  Using cached marshmallow_sqlalchemy-0.22.0-py2.py3-none-any.whl (18 kB)
  Using cached marshmallow_sqlalchemy-0.21.0-py2.py3-none-any.whl (12 kB)
  Using cached marshmallow_sqlalchemy-0.20.0-py2.py3-none-any.whl (12 kB)
  Using cached marshmallow_sqlalchemy-0.19.0-py2.py3-none-any.whl (12 kB)
  Using cached marshmallow_sqlalchemy-0.18.0-py2.py3-none-any.whl (13 kB)
  Using cached marshmallow_sqlalchemy-0.17.2-py2.py3-none-any.whl (13 kB)
  Using cached marshmallow_sqlalchemy-0.17.1-py2.py3-none-any.whl (13 kB)
  Using cached marshmallow_sqlalchemy-0.17.0-py2.py3-none-any.whl (13 kB)
  Using cached marshmallow_sqlalchemy-0.16.4-py2.py3-none-any.whl (13 kB)
  Using cached marshmallow_sqlalchemy-0.16.3-py2.py3-none-any.whl (13 kB)
  Using cached marshmallow_sqlalchemy-0.16.2-py2.py3-none-any.whl (12 kB)
  Using cached marshmallow_sqlalchemy-0.16.1-py2.py3-none-any.whl (12 kB)
  Using cached marshmallow_sqlalchemy-0.16.0-py2.py3-none-any.whl (12 kB)
  Using cached marshmallow_sqlalchemy-0.15.0-py2.py3-none-any.whl (11 kB)
  Using cached marshmallow_sqlalchemy-0.14.2-py2.py3-none-any.whl (11 kB)
  Using cached marshmallow_sqlalchemy-0.14.1-py2.py3-none-any.whl (9.6 kB)
  Using cached marshmallow_sqlalchemy-0.14.0-py2.py3-none-any.whl (9.6 kB)
  Using cached marshmallow_sqlalchemy-0.13.2-py2.py3-none-any.whl (11 kB)
  Using cached marshmallow_sqlalchemy-0.13.1-py2.py3-none-any.whl (11 kB)
  Using cached marshmallow_sqlalchemy-0.13.0-py2.py3-none-any.whl (12 kB)
  Using cached marshmallow_sqlalchemy-0.12.1-py2.py3-none-any.whl (12 kB)
  Using cached marshmallow_sqlalchemy-0.12.0-py2.py3-none-any.whl (12 kB)
  Using cached marshmallow_sqlalchemy-0.11.0-py2.py3-none-any.whl (12 kB)
  Using cached marshmallow_sqlalchemy-0.10.0-py2.py3-none-any.whl (12 kB)
  Using cached marshmallow_sqlalchemy-0.9.0-py2.py3-none-any.whl (12 kB)
  Using cached marshmallow_sqlalchemy-0.8.1-py2.py3-none-any.whl (12 kB)
  Using cached marshmallow_sqlalchemy-0.8.0-py2.py3-none-any.whl (12 kB)
  Using cached marshmallow_sqlalchemy-0.7.1-py2.py3-none-any.whl (12 kB)
  Using cached marshmallow_sqlalchemy-0.7.0-py2.py3-none-any.whl (12 kB)
  Using cached marshmallow_sqlalchemy-0.6.0-py2.py3-none-any.whl (11 kB)
  Using cached marshmallow_sqlalchemy-0.5.0-py2.py3-none-any.whl (11 kB)
  Using cached marshmallow_sqlalchemy-0.4.1-py2.py3-none-any.whl (11 kB)
  Using cached marshmallow_sqlalchemy-0.4.0-py2.py3-none-any.whl (11 kB)
  Using cached marshmallow_sqlalchemy-0.3.0-py2.py3-none-any.whl (10 kB)
  Using cached marshmallow_sqlalchemy-0.2.0-py2.py3-none-any.whl (10 kB)
  Using cached marshmallow_sqlalchemy-0.1.1-py2.py3-none-any.whl (10 kB)
  Using cached marshmallow_sqlalchemy-0.1.0-py2.py3-none-any.whl (10 kB)
Collecting numpy>=1.14.5
  Using cached numpy-1.19.1-cp36-cp36m-manylinux2010_x86_64.whl (14.5 MB)
  Using cached numpy-1.19.0-cp36-cp36m-manylinux2010_x86_64.whl (14.6 MB)
  Downloading numpy-1.18.5-cp36-cp36m-manylinux1_x86_64.whl (20.1 MB)
     |████████████████████████████████| 20.1 MB 8.7 MB/s 
  Downloading numpy-1.18.4-cp36-cp36m-manylinux1_x86_64.whl (20.2 MB)
     |████████████████████████████████| 20.2 MB 3.3 MB/s 
  Downloading numpy-1.18.3-cp36-cp36m-manylinux1_x86_64.whl (20.2 MB)
     |████████████████████████████████| 20.2 MB 3.3 MB/s 
  Downloading numpy-1.18.2-cp36-cp36m-manylinux1_x86_64.whl (20.2 MB)
     |████████████████████████████████| 20.2 MB 4.6 MB/s 
^CERROR: Operation cancelled by user

real	1m25.119s
user	1m2.579s
sys	0m3.083s

Additional information

We had something similar to this in our application (a pinned version of sqlalchemy, plus an unpinned version of another package that depended on it and had ~200 published versions), and pip literally ran for 45 minutes with no output to the console (verbose mode did help). I ended up having to debug by iteratively removing things until I got down to the incompatible packages.

@uranusjr (Member)

Looks similar to #8893.

@uranusjr (Member)

OK no, this is a separate problem. I'm not sure why the resolver doesn't quit right after it fails to find a marshmallow-sqlalchemy version that works.

@uranusjr (Member)

Ah, I know why now. The resolver is not smart enough to tell that psycopg2-binary and scipy are irrelevant to the marshmallow-sqlalchemy conflict, and goes on to try them anyway, since it is theoretically possible they provide a solution to the conflict. The direct URL requirement feature (PEP 508 package @ URL syntax) means that any package may introduce a previously unknown version of marshmallow-sqlalchemy that works with sqlalchemy 0.6.0, so it can't quit yet when all currently-known marshmallow-sqlalchemy choices fail. It must traverse everything to make sure.
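
To make that concrete, here is a minimal illustration of the `package @ URL` form using the packaging library; the URL is a placeholder, not a real distribution:

```python
from packaging.requirements import Requirement

# Hypothetical example of the PEP 508 "name @ URL" form; the URL is a
# placeholder. A dependency declared like this can point at a
# marshmallow-sqlalchemy build that no index has ever listed, which is why
# the resolver cannot rule out a fix arriving from a package it has not
# visited yet.
req = Requirement(
    "marshmallow-sqlalchemy @ https://example.com/marshmallow_sqlalchemy-999.0-py3-none-any.whl"
)
print(req.name)  # marshmallow-sqlalchemy
print(req.url)   # https://example.com/marshmallow_sqlalchemy-999.0-py3-none-any.whl
```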

I don’t have a good solution to this problem, given how Python packaging works, unfortunately 😞

pfmoore (Member) commented Sep 26, 2020

Good analysis. I agree this is a fundamental problem with Python packaging, and I also don't see an obvious way of fixing it. The frustration here is that the cases that go wrong are rare, but we have to assume the worst - so everyone pays a cost for a minority feature.

Hmm, one possible way we might be able to address this - if we merge constraints like sqlalchemy>=0.7 and sqlalchemy==0.6.0 using a sufficiently clever symbolic computation engine, we could prove that there's no possible version that satisfies both constraints. Unfortunately we don't have a library that does that sort of reduction (so we currently just check each constraint independently).
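
To sketch what that merging might look like (using the packaging library, which can only filter versions it is given; proving that no possible version exists is exactly the missing symbolic step):

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

# Intersect the two conflicting constraints from the example above.
combined = SpecifierSet(">=0.7") & SpecifierSet("==0.6.0")

# Checking known versions is easy; proving that *no possible* version can
# satisfy the combined set is the step pip currently has no library for.
known = [Version(v) for v in ("0.6.0", "0.7.0", "1.3.19")]
print(list(combined.filter(known)))  # [] -- nothing satisfies both
```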

bkurtz (Author) commented Sep 26, 2020

That seems like a sneaky feature of the packaging system - I'm learning all kinds of new things about it today.

@pfmoore, I'm not sure your suggestion quite works (at least not by itself), because the issue is that another repository might provide a URL-keyed version of marshmallow-sqlalchemy that doesn't have a constraint of sqlalchemy>=0.7 (but maybe you're ahead of me; see item 3 below).

Three thoughts on things that pip might do:

  1. Surely this is a fairly rare case, and most users just need a message telling them that they need to update dependency versions. Even if pip needs to continue searching all the remaining packages, it would go a long way if it could detect this situation and print a big red warning message before it sits and chugs through the rest of the search. The setup I showed above doesn't have this property, but for our production app's requirements, pip literally sat at a console without printing any output for 45 minutes; a warning along the lines of "your packages are probably incompatible, but I'm going to do an exhaustive search anyway" would have made fixing the issue quite a lot easier.
  2. I wonder if this is enough of an edge case that pip could either disable the extended search by default (i.e. users who want to use it need to run with a flag) or could at least not do an exhaustive search through other packages - i.e. "if one of your other packages specifies a packageB @ URL in its currently selected version that resolves this, we'll accept that, but otherwise we won't do the exhaustive search". <- I could definitely see how this would be an annoying option for pip maintainers though, so no offense if you don't like it.
  3. I had the impression (haven't tested rigorously) that pip was iteratively trying the full matrix of options here (e.g. it might be 30 versions of marshmallow-sqlalchemy * 30 versions of scipy * 30 versions of numpy * 30 versions of psycopg2-binary = 810,000 combinations) to see if dependencies were satisfied. If that's the case, even if we have to search all available packages to see if any of them provides an alternative, it would be way faster if you could just search the packages directly, and once you see none of them provides a workable alternative you can bail out early.

There might be some other things that could be cut as well; I didn't check on the combination of packages in this issue, but in our production app, when I ran in verbose mode, pip seemed to be repeatedly re-downloading the index of available versions of sqlalchemy for each combination of installs it wanted to try (and I think it was trying to get a list from pypi as well as a couple of our internal package servers), so cutting out that sort of thing could probably save some resources too if pip does have to go through the whole process.

@uranusjr (Member)

You hit the nail right on the head here 🙂

I think one simple thing pip could do here is ask for help eagerly, even if it hasn't exhausted all possibilities. It is very rare that the user actually wants marshmallow-sqlalchemy 0.1.0 when they specify a large version range, so pip could stop after trying a few times and just abort, saying "hey, I haven't tried everything, but marshmallow-sqlalchemy is taking too much time; please tell me more specifically what you need". This goes back to the ResolutionTooDeep discussion in #8683; what would be a good metric to abort here? Times the resolver backtracked? Or maybe times the resolver backtracked for a specific package?
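
As a rough illustration, a per-package backtrack counter with an arbitrary cutoff might look something like this (the names and the threshold are illustrative only, not pip internals):

```python
from collections import Counter

# Illustrative threshold; not a value pip actually uses.
MAX_BACKTRACKS_PER_PACKAGE = 20

backtracks = Counter()

def on_backtrack(package_name: str) -> None:
    """Record one backtrack for a package and give up early if it keeps failing."""
    backtracks[package_name] += 1
    if backtracks[package_name] > MAX_BACKTRACKS_PER_PACKAGE:
        raise RuntimeError(
            f"Gave up after {backtracks[package_name]} candidates for "
            f"{package_name!r}; please pin or loosen its requirement."
        )
```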

pradyunsg (Member) commented Sep 27, 2020

We can trim the search tree by remembering the cause of conflicts we've seen, and refusing to re-explore the same subtree.

If alpha 1.0 conflicts with beta 3.0, then once we figure that out we can remember the conflict and avoid rediscovering it every time (basically CDCL).
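
A toy sketch of that conflict-recording idea (illustrative only; not how pip or resolvelib actually store state):

```python
# Remember pairs of pins that have already been proven incompatible, so the
# resolver can refuse to re-enter a subtree containing the same pair.
seen_conflicts = set()

def record_conflict(pin_a, pin_b):
    seen_conflicts.add(frozenset([pin_a, pin_b]))

def is_known_conflict(pin_a, pin_b):
    return frozenset([pin_a, pin_b]) in seen_conflicts

record_conflict(("alpha", "1.0"), ("beta", "3.0"))
# Later, before exploring a branch that would pin both again:
assert is_known_conflict(("beta", "3.0"), ("alpha", "1.0"))
```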

pfmoore (Member) commented Sep 27, 2020

Maybe we could add some logging that just says "pip has tested N possibilities" that gets updated as the work goes on (like the progress bars, but without a total known in advance). Being able to collect some feedback from users (or from example test cases) on what typical numbers look like might help us work out a more reasonable metric...
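
For illustration, even something as simple as a periodically updated counter would do (the wording and reporting interval here are made up):

```python
import sys

# Minimal sketch of a running "pip has tested N possibilities" counter with
# no total known in advance.
tested = 0

def note_candidate_tried() -> None:
    """Bump the counter and emit a status line every 100 candidates."""
    global tested
    tested += 1
    if tested % 100 == 0:
        print(f"INFO: pip has tested {tested} candidate combinations so far...",
              file=sys.stderr)
```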

bkurtz (Author) commented Sep 27, 2020

Certainly anything that can be done to trim the search tree helps - and more generically than just this issue, if done correctly.

I think adding logging about the number of tested possibilities would be helpful, but I think it's more useful for the user to know what pip's having trouble with - in this case, knowing that pip was having trouble finding an acceptable version of sqlalchemy and that packages A, B, and C have dependencies on that would be super useful in troubleshooting.

If you were looking for a metric for when to give up (or whatever), I'd suggest basing it on time spent, excluding downloads. I'm not sure whether the threshold should be 1 minute or 5 (I'd guess closer to 1), but by the time pip has spent 5 minutes of CPU time running the resolver (not counting the time spent downloading files), your average user is going to get frustrated and give up.

One other thing I'd point out is that this seems like something (like many other issues with the new resolver) where it would actually be super helpful to have some sort of cache of package requirements, at least at the PyPI level - if pip could download a list of all versions of a package and their requirements, that could save quite a bit of time in this process, especially when you consider packages like numpy that can run 20 MB per version. At that point, one could even start pre-combining requirements (e.g. maybe marshmallow_sqlalchemy versions 0.1 through 0.14 require sqlalchemy>=0.7 and versions 0.15+ require sqlalchemy>=0.8; then instead of the resolver needing to consider 42 different specs, it can just consider 2). Given that most packages probably change requirements for a fairly small fraction of their releases, that sort of thing could help drastically reduce the search space.
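
A quick sketch of that pre-combining idea, using made-up metadata rather than the real marshmallow_sqlalchemy requirement history:

```python
from collections import defaultdict

# Made-up example metadata: release -> tuple of requirement strings.
release_requires = {
    "0.13.0": ("sqlalchemy>=0.7",),
    "0.14.0": ("sqlalchemy>=0.7",),
    "0.15.0": ("sqlalchemy>=0.8",),
    "0.16.0": ("sqlalchemy>=0.8",),
}

# Group releases that declare identical requirements, so the resolver would
# only need to consider one representative per group instead of every release.
buckets = defaultdict(list)
for version, requires in release_requires.items():
    buckets[requires].append(version)

for requires, versions in buckets.items():
    print(requires, "->", versions)
# ('sqlalchemy>=0.7',) -> ['0.13.0', '0.14.0']
# ('sqlalchemy>=0.8',) -> ['0.15.0', '0.16.0']
```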

pfmoore (Member) commented Sep 27, 2020

One other thing I'd point out is that this seems like something (like many other issues with the new resolver) where it would actually be super helpful to have some sort of cache of package requirements

100% true. The problem is that we already have a cache of wheels, which basically means that getting wheel metadata is already done. And we have a cache of wheels built from sdists, which covers a lot of the rest. What's not obvious (to me, at least!) is why that's not enough, or to put it another way, given that we have those caches, where's the time being spent? We really need to instrument the code, and then fire some of the bigger/more problematic examples at it, but so far, no-one's had the time to do that 🙁

pfmoore (Member) commented Sep 27, 2020

In fact, the caches do seem to be working. We get a distinct improvement from #8912 which basically eliminates a lot of unnecessary network calls to PyPI. I suspect that with #8912 included, what's left in this issue is just the fundamental problem that pip doesn't know it can give up without checking everything...

@pradyunsg (Member)

given that we have those caches, where's the time being spent?

As far as I can tell, it's the network overhead of reaching out and getting back a "you have it in the cache already" response. :(

@brainwane (Contributor)

As I just noted in #8683 (comment), the team discussed this problem in last week's meeting and work is now in progress so that pip will inform the user when the resolver is doing a lot of backtracking. We'll probably be giving the user a short in-terminal error message and pointing to longer documentation, as we did with this conflict resolution documentation.

@brainwane (Contributor)

@bkurtz Thank you for the issue report! In #8683, #8346, #8975, and some related issues and pull requests, we addressed the fundamental problem of how much, and in which cases, pip backtracks, along with a user-visible part of that: the informational and error messages pip outputs while backtracking.

In today's team meeting we decided that this is sufficient to close this issue. @pradyunsg will file a followup issue about the idea of printing package names better in error output.

Thanks @bkurtz and I hope you can also test the pip 20.3 beta release in case there are other issues for you to report!

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 7, 2021