Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize pip download --no-binary for packages using PEP-517 by ignoring --no-binary when building isolated build environment for purposes of verifying downloaded sdist metadata #10589

Open
1 task done
tvalentyn opened this issue Oct 14, 2021 · 14 comments
Labels
S: needs triage Issues/PRs that need to be triaged type: feature request Request for a new feature

Comments

@tvalentyn
Copy link

What's the problem this feature will solve?

pip download --no-binary :all: is sometimes too slow for packages that switched to use PEP-517, or have packages using PEP-517 in their dependency chain.

To observe this issue, run pip download in verbose mode:

 pip -v download --dest /tmp numpy==1.21.2 --no-deps --no-binary :all:

Users have brought up this issue couple of times, however there seems to be some confusion as to actual reasons of this behavior.

Possible explanations that are likely incorrect:

  • It is not a pip issue, but a setuptools issue.
    • We can see that this issue also happens for packages using poetry. For example, see: pip -v download --dest /tmp poetry --no-deps --no-binary :all:
  • It is not a pip issue, but it happens to packages using a legacy / convoluted build process.
    • I believe this can happen to any package that declares build dependencies via PEP-517. Even installing Cython, which a common build dependency, takes significant time when it is installed from sources.
  • It is a known issue, this issue is yet-another-duplicate of: Avoid generating metadata in pip download --no-deps ... #1884
    • Yes and no.
    • Yes, it is known issue that pip needs to verify dependency metadata after downloading an sdist, so pip is building this metadata after downloading sources.
    • However, there is a new aspect of the issue that gained higher severity after packages started to adopt PEP-517, because pip download --no-binary usability has decreased to the point of becoming unusable, example: pip download --no-deps tries to use PEP517 so badly it is not usable to download stuff #9701.

Describe the solution you'd like

A user of pip download --no-binary cares about sdists of the package and its dependencies. However a user likely doesn't care whether sdists or bdists of build deps are used for package metadata verification purposes.

After pip download --no-binary has downloaded an sdist, and starts verifying package metadata via prepare_metadata_for_build_wheel hook, pip should ignore --no-binary flag when creating an isolated build environment for the purpose of metadata verification.

Also if users don't specify --no-deps, and specify --no-binary, chances are they may still want to download PEP-517 build dependencies as well, not just runtime dependencies, and if they specify --no-binary :all:, then it would make sense to download and save sdists of build dependencies in target download folder (this doesn't happen right now but may be worth a separate issue?). But in any case, we don't have to install build deps from sdists for metadata verification, and can use bdists.

Alternative Solutions

  1. Wait until Avoid generating metadata in pip download --no-deps ... #1884 is addressed. However that issue looks like it may be a harder problem to solve, based on the history of the issue and reasons for why metadata checks are done currently.
  2. Add a flag making the optimized behavior optional, smth like --prefer-binary-in-pep-517-build-environment
  3. There are some workarounds that suffice for downloading a single package without it's dependencies, but they do not work for downloading package and its runtime dependencies :

Additional context

Aspects of this issue have been discussed at various points in

Code of Conduct

@tvalentyn tvalentyn added S: needs triage Issues/PRs that need to be triaged type: feature request Request for a new feature labels Oct 14, 2021
@tvalentyn
Copy link
Author

cc: @pfmoore @pradyunsg @uranusjr

@pradyunsg pradyunsg changed the title Optimize pip download --no-binary for packages using PEP-517 by ignoring --no-binary when building isolated build environment for purposes of verifying downloaded sdist metadata. Optimize pip download --no-binary for packages using PEP-517 by ignoring --no-binary when building isolated build environment for purposes of verifying downloaded sdist metadata Oct 15, 2021
@uranusjr
Copy link
Member

One immediate idea is to add an :runtime: wildcard the mean "all runtime requirements but not build requirements.

@tvalentyn
Copy link
Author

I think one question to answer is, what do we want pip download --no-binary behavior to be for packages that have PEP-517 build dependencies. It seems that historically the intent of this command was to download all package dependencies. Now we have runtime dependencies and build dependencies. Do users care about build dependencies when they request sources of all deps in their package dependency chain or only runtime dependencies?

If they don't care about downloading build dependencies, then :runtime: wildcard may work. If they do, then:
a) pip should modify the pip download command behavior for PEP-517 packages to also download build deps.
b) do users want to download build deps as sdists or bdists? If they originally passed --no-binary :all:, there is a reason to believe users might want sources only including for runtime deps, so :runtime: wildcard would get in the way.

If we think that pip download command will never have to download build deps, then :runtime: wildcard may work here.

@pfmoore
Copy link
Member

pfmoore commented Oct 15, 2021

I don't know the answer to the question "what do people want", but I do know that in the past there has been a strong insistence from some areas that downloading everything as source is an important use case. This came up when backend-path was added to PEP 517 (the discussions were on Discourse, in the packaging category), where people were insisting that even the build backend (setuptools, flit or whatever) must be built from source.

So I agree with you that it's important to find out what people want in this area, and hopefully, if you hunt out those (and other) prior discussions, that might help clarify that question. I don't have any insights personally - I have no interest in --no-binary, if anything I'm far more likely to use --only-binary in the sorts of situation I work in.

@uranusjr
Copy link
Member

In a sense, we are having this conversation exactly because we listened to use cases and actually implemented them. As Paul said, --no-binary :all: exists because there is a concrete use case where people want pip to install exactly all things from source due to organisation policies, and it is both cumbersome and even error-prone to having to write down every package individually (for example if you pip install foo --no-binary=foo and a new version of foo adds a new dependency bar you'd be violating the source-only policy unknowingly).

Since the people requesting this new "packages that I consider a part of :all:" are the newcomers presenting a different use case, I don't think it's a good idea to hijack :all: to mean something else. You need a new feature, and we are willing to listen, but that doesn't allow you to take away a feature not designed for you.

@tvalentyn
Copy link
Author

tvalentyn commented Oct 18, 2021

there is a concrete use case where people want pip to install exactly all things from source due to organisation policies

It sounds like for this use case, build dependencies would need to be installed (or preinstalled) from sources as well.

I don't think it's a good idea to hijack :all: to mean something else.

I may be misundestanding this point, but I think the change here is that now we have PEP-517, which can specify build dependencies, and the question is, should :all include build dependencies or not. For dependency graphs that do not include PEP-517 packages, the interpretation of :all: did not change, so meaning stays the same.

You need a new feature, and we are willing to listen, but that doesn't allow you to take away a feature not designed for you.

Great point, and thanks for listening to users. As a user, I opened this issue because of a noticeable regression in pip download that was difficult to understand. Feature-wise, I think for the project I am working on, the right thing to do would be to switch to --prefer-binary, which still works well and would have other benefits, so I don't have a strong opinion on how resolve the regression in --no-binary use-case.

@pradyunsg
Copy link
Member

the question is, should :all include build dependencies or not.

Here's an explicit answer: Yes, and people already depend on this being the case so this behaviour likely will not change.

I think for the project I am working on, the right thing to do would be to switch to --prefer-binary, which still works well

Given that your usecase is covered by a different flag, is there anything actionable here?

@pradyunsg
Copy link
Member

If the point here is "PEP 517 builds are slow and, hence, a regression" , then... well, there's already many open issues for that and I'd prefer that we close this one in favour of those.

@tvalentyn
Copy link
Author

the question is, should :all include build dependencies or not.

Here's an explicit answer: Yes, and people already depend on this being the case so this behaviour likely will not change.

I don't think this is the case. For example, for numpy:

$ python3 -m pip download --dest /tmp/numpy_all numpy==1.21.2 --no-binary :all: 
Collecting numpy==1.21.2
  Downloading numpy-1.21.2.zip (10.3 MB)
     |████████████████████████████████| 10.3 MB 1.6 MB/s 
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
Saved /tmp/numpy_all/numpy-1.21.2.zip
Successfully downloaded numpy
WARNING: You are using pip version 21.2.4; however, version 21.3 is available.

$ ls /tmp/numpy_all
numpy-1.21.2.zip

As we see, pip downloaded only the source tarball of numpy, but not sources of PEP-517 build dependencies of numpy (setuptools, wheel, Cython). As I mentioned, this can be a separate issue of it's own, leaving it up to pip maintainers if we need to open one.

I think for the project I am working on, the right thing to do would be to switch to --prefer-binary, which still works well

Given that your usecase is covered by a different flag, is there anything actionable here?

If you think that pip download --no-binary is an important use case to support, consider making this use case fast again. My suggestion is in Describe the solution you'd like (first two paragraphs).
If the decision is "We know this is slow now, and that's ok/low priority issue", feel free to close this, but please leave the issue available for community to comment on later, in case more users chime in that the slowdown adversely affects them and they don't have a workaround.

@tvalentyn
Copy link
Author

I'd prefer that we close this one in favour of those.

I have not seen an issue that actually explained the rootcause of the slowdown (which I described in this issue, that's why I opened it), but if there is a duplicate, feel free to deduplicate.

@uranusjr
Copy link
Member

uranusjr commented Oct 19, 2021

I don't think this is the case. For example, for numpy:

You're misunderstanding the use case. The goal is to not run any downloaded binary, which :all: promises. Whether build dependencies is included as a download artifact has nothing to do with this.

@tvalentyn
Copy link
Author

tvalentyn commented Oct 19, 2021

You're misunderstanding the use case. The goal is to not run any unaudited binary, which :all: promises. Whether build dependencies is included as a download artifact has nothing to do with this.

@uranusjr I see, thanks for correction. In this case, this behavior sounds like "Working as intended". I thought the intent / primary use case of the pip download --no-binary :all: command was that users needed to download all the sources, and then build everything themselves, from scratch. However you are saying that the intent for --no-binary :all: is for pip to never use any binary distribution over the course of executing the download command (and still download the sources). This was not obvious to me.

@tvalentyn
Copy link
Author

Sounds like a :runtime: wildcard could be a solution here then.

@uranusjr
Copy link
Member

the intent for --no-binary :all: is for pip to never use any binary distribution over the course of executing the download command (and still download the sources). This was not obvious to me.

This semantic is shared between multiple pip commands (I think download, install, and wheel). The behaviour is more obvious for the latter two, because in those cases you'd expect pip to need the binaries (for installation), and --no-binary means "please don't download those binaries, always build from scratch". This semantic is less obvious for download, but it's also not a good idea to make it mean anything else (that'd just confuse people in another way).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S: needs triage Issues/PRs that need to be triaged type: feature request Request for a new feature
Projects
None yet
Development

No branches or pull requests

4 participants