
Implement PEP 518 support #3691

Closed
brettcannon opened this issue May 16, 2016 · 18 comments · Fixed by #4144
Labels
auto-locked (Outdated issues that have been locked by automation); type: enhancement (Improvements to functionality)

Comments

brettcannon (Member) commented May 16, 2016

https://www.python.org/dev/peps/pep-0518/

This is the pyproject.toml stuff.

takluyver (Member) commented:

I will attempt to implement this.

takluyver (Member) commented:

I made a start (branch), but I realised I'm not sure what to do about environments. Currently, it looks like wheels are built in the Python environment that's already active. PEP 517 says:

A build frontend SHOULD, by default, create an isolated environment for each build, containing only the standard library and any explicitly requested build-dependencies.

That's not a requirement, but it is a justified recommendation. Creating a new environment would probably break installation of existing packages which rely on something else being installed first (e.g. using numpy.distutils). We can probably get round that by creating a new environment only if pyproject.toml exists.
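
A minimal sketch of that opt-in check, assuming pip can read TOML (the toml module and the function name below are purely illustrative; pip would need to vendor a parser):

    import os
    import toml  # illustrative; pip would vendor a TOML parser

    def get_build_requirements(source_dir):
        """Return PEP 518 build requirements, or None for a legacy project."""
        path = os.path.join(source_dir, 'pyproject.toml')
        if not os.path.exists(path):
            return None  # legacy project: build in the current environment
        with open(path) as f:
            data = toml.load(f)
        # pyproject.toml declares: [build-system] requires = ["setuptools", ...]
        return data.get('build-system', {}).get('requires', [])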

However, how should pip create a new environment? On Python 3, it can use the stdlib venv module, but that's not present on Python 2. We could vendor virtualenv for Python 2, but that seems a bit ugly, since virtualenv in turn bundles pip. Options:

  1. Install build dependencies into the current environment when running pip wheel, or autobuilding wheels for pip install.
  2. Bundle virtualenv for Python 2, and use it or venv to create new environments to install build dependencies into. Put up with the circular bundling between pip and virtualenv.
  3. Bundle a backport of venv for Python 2 (would this only work on Python >= 2.7.9 with ensurepip?)
  4. Implement simple environment creation for the wheel building. From the environment requirements specified, I think it's sufficient to install to a temporary prefix and then modify PATH and PYTHONPATH to make the packages available to subprocesses (see the sketch after this list).
  5. Use venv on Python 3, but install build dependencies without creating an environment on Python 2 (this would give confusingly different behaviours).
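
For concreteness, a rough sketch of option 4 (all names hypothetical; a real implementation would also need to handle console scripts, cleanup, and platform differences, since the lib/pythonX.Y layout below is POSIX-only):

    import os
    import subprocess
    import sys
    import tempfile

    def build_wheel_isolated(source_dir, build_requirements):
        """Option 4: install build deps to a temp prefix, then expose
        them to the build subprocess via PATH and PYTHONPATH."""
        prefix = tempfile.mkdtemp(prefix='pip-build-env-')
        subprocess.check_call(
            [sys.executable, '-m', 'pip', 'install',
             '--prefix', prefix, '--ignore-installed']
            + list(build_requirements))
        env = os.environ.copy()
        lib = os.path.join(prefix, 'lib',
                           'python%d.%d' % sys.version_info[:2],
                           'site-packages')
        env['PYTHONPATH'] = lib + os.pathsep + env.get('PYTHONPATH', '')
        env['PATH'] = os.path.join(prefix, 'bin') + os.pathsep + env['PATH']
        subprocess.check_call([sys.executable, 'setup.py', 'bdist_wheel'],
                              cwd=source_dir, env=env)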

dstufft (Member) commented Nov 25, 2016 via email

pfmoore (Member) commented Nov 25, 2016

Agreed, option 4 seems reasonable. I don't think the additional complexities of virtualenv/venv are necessary for this use case, just manipulating paths seems like an entirely reasonable approach.

njsmith (Member) commented Nov 26, 2016

There are two potential advantages to using isolated installs:

  • We don't want to pollute the user's actual environment by dumping build requirements into it (esp. since different packages might have conflicting build requirements).
  • To minimize the number of broken packages that get uploaded to PyPI, we want to fail fast when there are build requirements that the user happens to have installed locally when testing, but that aren't actually listed in the build-requirements -- so ideally the build should only be able to "see" packages that it has listed.

Option 4 solves the first problem, but doesn't do anything for the second.

The second is a "would be nice" thing, so even partial solutions are helpful and it's okay if we're not perfect to start. But Option 4 + as much isolation as we can conveniently manage would be nice.

I think the most we can easily do is to invoke the build process with PYTHONNOUSERSITE=1 and -S. (This removes both user site-packages and system site-packages from the build process's sys.path.) It's not quite perfect because the -S won't be inherited if the build process in turn spawns another child python -- and unfortunately there doesn't seem to be any envvar equivalent to -S. But it should still be good enough to catch a lot of silly mistakes.
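
As a hedged illustration (not pip's actual code), the build step could be launched roughly like this, assuming the build dependencies are already exposed via PYTHONPATH:

    import os
    import subprocess
    import sys

    env = os.environ.copy()
    # Any non-empty value disables user site-packages, and the
    # environment variable *is* inherited by child processes.
    env['PYTHONNOUSERSITE'] = '1'
    # -S skips the site module, keeping system site-packages off sys.path;
    # as noted above, -S is not inherited if the build spawns another Python.
    subprocess.check_call([sys.executable, '-S', 'setup.py', 'bdist_wheel'],
                          env=env)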

And Rob will get grumpy at me if I don't point out that there should probably be an option the user can pass to pip to disable the isolation, for those cases where you know the package has a broken build-requirements and you want to override them. (Or maybe there should just be an option to override the build-requirements directly?)

njsmith (Member) commented Nov 26, 2016

Oh, and yeah, agreed that we'd have to default to no-isolation mode for legacy projects that don't have pyproject.toml.

brettcannon (Member, Author) commented:

My worry with option 4 is: what if Python changes what is required for a venv in some future version? One of the key reasons for creating the venv module was so that virtual environment creation could be tied to the interpreter as necessary. By going with option 4, you break the abstraction of virtual environments through venv/virtualenv.

We are talking about wheel building, which shouldn't be as common as installation, and having virtualenv installed is also very common. Would it be so bad to simply use venv if available, fall back to virtualenv, and, if neither is available, error out and ask the user to install it? Anyone with a newer version of Python 2.7 will have pip, so getting virtualenv installed shouldn't be much of a burden (even if it means a virtual environment to create a virtual environment 😄). Plus, if they're doing builds with pip, you can assume they know how to use pip to install something, so providing instructions to install virtualenv shouldn't be a burden.
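
A minimal sketch of that fallback, assuming it shells out to the stdlib venv on Python 3 and to virtualenv on Python 2 (function name and error wording are illustrative):

    import subprocess
    import sys

    def create_build_env(env_dir):
        """Prefer stdlib venv; fall back to virtualenv; else ask the user."""
        if sys.version_info[0] >= 3:
            subprocess.check_call([sys.executable, '-m', 'venv', env_dir])
            return
        try:
            import virtualenv  # noqa: F401 -- only checking availability
        except ImportError:
            raise RuntimeError('an isolated build environment is needed; '
                               'please "pip install virtualenv" and retry')
        subprocess.check_call([sys.executable, '-m', 'virtualenv', env_dir])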

pfmoore (Member) commented Nov 28, 2016

My biggest concern here is that we could significantly increase build times. As @brettcannon says, the whole build mechanism should only be invoked when there's no wheel available, and given local wheel caches plus the availability of more and more wheels on PyPI, this is not as big an issue as it once was. But the worst hit will be on simple, pure-Python packages with no build dependencies. At the moment, we just run setup.py bdist_wheel. Changing that to unpacking a setuptools wheel, setting PYTHONPATH and running setup.py bdist_wheel isn't a huge hit. Changing it to create a complete venv might be. I honestly don't have a feel for the difference.

Some other thoughts:

  1. To install build dependencies into a venv, we'd need pip installed in that venv. That's yet more time to build the venv, and we'd in theory need to uninstall pip before starting the build (in practice, I can't see pip counting as "polluting" the venv, though).
  2. To install using option 4, I assume we'd do pip install -t, but that doesn't install scripts, so we have an issue in that the requirement "All command-line scripts provided by the build-required packages must be present in the build environment's PATH" would need extra work to satisfy.

Given the second of these, I'm starting to think we might need to go down the venv/virtualenv route as Brett suggests. I'd also be inclined to allow for a pip wheel --non-isolated option to bypass the building of an isolated environment (as suggested in the PEP, although it talks in terms of a system-site-packages venv) to allow Python 2.7 users to build without needing to install virtualenv.

And while premature optimisation and all that, I think we should consider reusing the venv. I'd hate pip install foo bar baz spam ham another etc to potentially create 7 venvs. And if we ever get an "upgrade all" command (or users simulate their own via scripts) things could get nasty very fast.
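
A hypothetical sketch of that reuse idea: within one pip invocation, packages whose build requirements are identical could share a single environment, keyed by the requirement set (make_env stands in for whichever environment-creation strategy is chosen):

    # Hypothetical: cache build environments for the duration of one pip run.
    _build_env_cache = {}

    def get_build_env(build_requirements, make_env):
        key = frozenset(build_requirements)
        if key not in _build_env_cache:
            _build_env_cache[key] = make_env(build_requirements)
        return _build_env_cache[key]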

RonnyPfannschmidt (Contributor) commented:

I don't think virtualenv on its own is even in the direction of a solution.

Buildout and easy_install solve this problem using working sets and multi-version installs, and NixOS has a very similar solution where creating an environment amounts to creating a tree of symlinks.

I believe that in order to support the PEPs adequately, we need to disengage from the virtualenv model, which has been expensive ever since its inception.

njsmith (Member) commented Nov 29, 2016

Regarding virtualenv speed, on my laptop (so, SSD), I get 0.2 seconds wallclock time for virtualenv --no-setuptools --no-pip --no-wheel, and 2.1 seconds wallclock time for plain virtualenv (so it takes ~1.8 seconds to install setuptools + pip + wheel). I'm surprised at how quick this is, actually. Also, a substantial amount of the time appears to be the overhead cost of installing anything at all (I guess launching pip or something?) -- installing any one package takes between 1.1 and 1.4 seconds additional time, versus 1.8 seconds additional time for all three of them.

Regarding how to actually get the packages into the venv/virtualenv/poor-man's-packages-directory, we also have the option of teaching pip how to install into a given bindir+site-packages-dir. For wheels at least this is pretty trivial in principle. In practice of course pip's intrinsic... pip-ness might make it harder.
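
In the spirit of "pretty trivial in principle", a toy sketch of putting a pure-Python wheel's contents into a chosen site-packages directory (it ignores console scripts, the .data directory, and RECORD handling, which is where the real work lives):

    import zipfile

    def unpack_wheel(wheel_path, site_packages_dir):
        # A wheel is a zip archive laid out for direct extraction into
        # site-packages (true for the simple purelib case only).
        with zipfile.ZipFile(wheel_path) as wheel:
            wheel.extractall(site_packages_dir)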

But in general let's keep in mind that this is going to start out as an opt-in thing (only applies to projects with a pyproject.toml), so it's okay if the first version is imperfect and needs to be iterated on. My suggestion is that @takluyver should go ahead with whatever cheap-and-dirty hack seems easiest, and then we can re-evaluate :-)

RonnyPfannschmidt (Contributor) commented:

@njsmith With Nix, for example, you can get a full Python environment with preinstalled dependencies in about 0.1-0.2 seconds, including all packages (assuming the packages are in the package cache); even pulling in a numpy-heavy stack and creating a full env takes only a fraction of a second.

Once one has used a system that is in fact orders of magnitude faster, virtualenv & co. simply pale in comparison.

pfmoore (Member) commented Nov 29, 2016

@njsmith Are you using Linux/OSX? On Windows, virtualenv --no-setuptools --no-pip --no-wheel took 8 seconds wallclock time from a cold start, 2 seconds on subsequent runs. That's on a laptop with SSD. py -m venv --without-pip takes about half that (4 seconds cold, 1 second warm).

I don't want to start a debate on the virtues or otherwise of particular operating systems, but I don't think we can ignore the cost of creating a (full) venv for every package installed from source.

njsmith (Member) commented Nov 29, 2016

@RonnyPfannschmidt: that's cool, but I'm not quite seeing how it's relevant to this discussion...?

@pfmoore: Yeah, my measurements were on Linux.

takluyver (Member) commented:

I'm not too concerned about performance, so long as it's on the order of seconds, because as you already pointed out, many packages provide wheels, and even for those that don't, pip automatically caches wheels. And this will only affect packages using PEP 518, whose authors are probably savvy enough to build wheels.

However, we don't currently have an environment solution pip can conveniently use on Py2, so I'm going to go ahead with the temporary prefix solution for now.

pfmoore (Member) commented Nov 29, 2016

OK, cool. As long as we're not implementing isolation for projects without a pyproject.toml, it's not something we need to get too excited about right now.

dstufft (Member) commented Nov 29, 2016

Yeah, I think it's important that (A) this is opt-in on the project author's side (via including a pyproject.toml) and (B) we isolate by default when opting in.

This will push us closer to a future where all projects correctly define their build dependencies :)

dstufft (Member) commented Nov 29, 2016

Oh, and I think the other important thing is that when a project author opts in via pyproject.toml, we never execute the setup.py install path; we only install from wheels (unless this ends up being really horrible to implement). My reasoning is that (A) we can always add setup.py install back in a future version if we need to, (B) it will push us closer to a world where all projects can be built as wheels and cached in pip's wheel cache, and (C) it will reduce the number of different variations a single install can have (when going via the new mechanisms).

thanatos (Contributor) commented Apr 7, 2017

A note on the (otherwise good-looking) PEP:

Two is that YAML itself is not safe by default. The specification allows for the arbitrary execution of code which is best avoided when dealing with configuration data.

This is incorrect. YAML itself encodes a data structure, not code that can be executed. While it is true that some parsers, including the major one for Python, PyYAML, implement poorly designed APIs that permit arbitrary code execution — and for that reason, I agree with your conclusions — flaws in particular parser implementations are not flaws in the language itself. Any data-structure language that allows hooks to extend the set of supported types would suffer the same flaw if those hooks were implemented poorly. TLS is not insecure because OpenSSL has vulnerabilities, and HTTP is not insecure because a server could simply execute an entity body.
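
To illustrate the distinction being drawn here, with PyYAML:

    import yaml

    # safe_load builds only plain data (dicts, lists, strings, numbers, ...).
    data = yaml.safe_load('build: {requires: [setuptools, wheel]}')

    # The flaw is in the permissive API, not in YAML itself: plain yaml.load
    # will happily construct arbitrary Python objects from tags such as
    # !!python/object/apply, which is how code execution sneaks in.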
