Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Byte-compile files after installation #8541

Merged
merged 5 commits into from
Jul 5, 2020

Conversation

chrahunt
Copy link
Member

@chrahunt chrahunt commented Jul 4, 2020

One critical place we depend on extracted wheels is during generation of pyc files.

All high-level interfaces in the standard library expect file paths in their interface. A straightforward way to make that work with direct-from-wheel installs is to byte-compile files after they have been written to disk. In order to avoid re-compiling other files unnecessarily, we switch from compileall.compile_dir to compileall.compile_file and target the individual files we know we installed for the current wheel.

As a side-effect, this should also address #7808, since generating the pyc files in-place means the temporary wheel extraction directory should no longer be present in the generated pyc files.

Progresses #6030 and #7808.

@chrahunt chrahunt added the skip news Does not need a NEWS file entry (eg: trivial changes) label Jul 4, 2020
@chrahunt chrahunt force-pushed the refactor/compile-file-next-steps branch from e1e941d to 4180d19 Compare July 5, 2020 00:38
Copy link
Member

@pfmoore pfmoore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two suggestions, both of which can be deferred to follow-up PRs if you prefer.

Also, at some point someone should really move all of those inlined functions out of install_unpacked_wheel - it's a monster. I imagine that doing so would involve a chunk of work to decouple them, so that's for the future though.

continue
yield full_installed_path

def pyc_output_path(path):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be better moved to utils, as it's basically a compatibility wrapper around importlib.util.cache_from_source. Once we drop Python 2 support, we should remove this in favour of calling importlib directly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If/when we switch to py_compile.compile I expect this to go away (or only be used inside it) since we'd need a compatibility wrapper for that as well.

# unicode input, but the declared type doesn't indicate
# it.
path_arg = cast('str', path)
success = compileall.compile_file(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using py_compile.compile would be better, as it returns the path of the compiled file, so we don't have to rely on recalculating it ourselves. The force and quiet arguments don't exist for that function, though, so we would need to change the logic somehow. I'm fine with doing that in a follow-up PR (and it's not a priority).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. We'd still need to manually get the path on Python 2, unfortunately, but it should be about the same amount of code. I had originally gone that route, but figured the jump from compileall.compile_dir to compileall.compile_file was big enough, and would at least help get us closer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 more months. Just 3 more.

else:
return importlib.util.cache_from_source(path)

# Compile all of the pyc files for the installed files
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been moved to after some of the other code - is there a reason for this, or was it just for code organisation? I don't think it makes any difference, so I'm not looking for you to change it, I'm just curious as to whether there is a benefit that I missed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Byte-compilation has to occur after we have copied the files to their destination. In the current code that is after the calls to clobber with the root scheme files and the files in *.data.

@pfmoore
Copy link
Member

pfmoore commented Jul 5, 2020

will probably try to break it into smaller steps

Please don't. This is the right size for a PR in my opinion.

add tests for deterministic builds

Please make those a separate PR. I'm not convinced that we should make deterministic builds a guaranteed property of pip (I'm not even sure what precisely "deterministic" means) so I'd like to see that debated as an independent item.

@chrahunt
Copy link
Member Author

chrahunt commented Jul 5, 2020

Thank you for reviewing this!

will probably try to break it into smaller steps

Please don't. This is the right size for a PR in my opinion.

Here I was just referring to the commits themselves. I normally try not to move and change code in a single commit, but in this case it was difficult not to. If it was easy enough to review, then that's usually good enough for me. :)

add tests for deterministic builds

Please make those a separate PR. I'm not convinced that we should make deterministic builds a guaranteed property of pip (I'm not even sure what precisely "deterministic" means) so I'd like to see that debated as an independent item.

You're right to call this out. I'm specifically referring to the minor regression related to embedding the temp wheel extraction path into pyc files (#7808). I think this PR technically fixes it, but I don't want to consider that "done" until we have tests for it. That's lower priority for me than direct-from-wheel installs, so I don't mind leaving tests for a different PR at all.

@chrahunt chrahunt changed the title Byte-compile after installation Byte-compile files after installation Jul 5, 2020
We want to move towards having more control over the generation of pyc
files, which will allow us to provide deterministic installs and
generate pyc files without relying on an already-extracted wheel.

To that end, here we are stripping away one layer of abstraction,
`compileall.compile_dir`. `compileall.compile_dir` essentially recurses
through the provided directories and passes the files and args verbatim
to `compileall.compile_file`, so removing that layer means that we
directly call `compileall.compile_file`.

We make the assumption that we can successfully walk over the
source file tree, since we just wrote it, and omit the per-directory
traversal error handling done by `compileall.compile_dir`.
`compileall.compile_file` returns a success parameter, but can return
"successful" without actually generating a pyc file if the input file
was filtered out and compilation was not attempted.

In our file processing we mirror that logic, to ensure that a truthy
success returned by `compileall.compile_file` actually indicates a file
was written.
In order to add generated pyc files to the RECORD file for our package,
we need to know their path! To raise confidence that we're doing this
correctly, we assert the existence of the expected 'pyc' files while
still using the old installation logic.

Some valid reasons why pyc files may not be generated:

1. Syntax error in the installed Python files
2. There is already a pyc file in-place that isn't writable by the
   current user

We don't fail installation in those cases today, and we wouldn't want to
change our behavior here, so we only assert that the pyc file was
created if `compileall.compile_file` indicates success.
In our next commit we will use the scheme path to locate files to
byte-compile. If the scheme path is a `Path`, then that causes
`compileall.compile_file` (via `py_compile.compile`) to fail with:

```
.tox/py38/lib/python3.8/site-packages/pip/_internal/operations/install/wheel.py:615: in install_unpacked_wheel
    success = compileall.compile_file(
../../../.pyenv/versions/3.8.0/lib/python3.8/compileall.py:157: in compile_file
    ok = py_compile.compile(fullname, cfile, dfile, True,
../../../.pyenv/versions/3.8.0/lib/python3.8/py_compile.py:162: in compile
    bytecode = importlib._bootstrap_external._code_to_timestamp_pyc(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

code = <code object <module> at 0x7fa7e274f500, file "/tmp/user/1000/pytest-of-chris/pytest-37/test_std_install_with_direct_u0/dest/lib/sample/__init__.py", line 1>, mtime = 1593910285.2200587, source_size = 134

>   ???
E   ValueError: unmarshallable object
```

Debugging in gdb shows that the error is set due to the `Path` object
being present in the code object, which `marshal.dumps` can't handle
(frame 1):

```
 0  w_complex_object (v=<optimized out>, flag=<optimized out>, p=0x7fffffff7160) at Python/marshal.c:564
 1  w_object (v=<Path at remote 0x7fffee51f120>, p=0x7fffffff7160) at Python/marshal.c:370
 2  w_complex_object (v=<code at remote 0x7fffee591710>, flag=<optimized out>, p=0x7fffffff7160) at Python/marshal.c:544
 3  w_object (v=<code at remote 0x7fffee591710>, p=0x7fffffff7160) at Python/marshal.c:370
 4  w_complex_object (v=('1.2.0', <code at remote 0x7fffee591710>, 'main', None), flag=<optimized out>, p=0x7fffffff7160) at Python/marshal.c:475
 5  w_object (v=('1.2.0', <code at remote 0x7fffee591710>, 'main', None), p=0x7fffffff7160) at Python/marshal.c:370
 6  w_complex_object (v=<code at remote 0x7fffee591ea0>, flag=<optimized out>, p=0x7fffffff7160) at Python/marshal.c:539
 7  w_object (p=0x7fffffff7160, v=<code at remote 0x7fffee591ea0>) at Python/marshal.c:370
 8  PyMarshal_WriteObjectToString (version=<optimized out>, x=<code at remote 0x7fffee591ea0>) at Python/marshal.c:1598
 9  marshal_dumps_impl (module=<optimized out>, version=<optimized out>, value=<code at remote 0x7fffee591ea0>) at Python/marshal.c:1739
 10 marshal_dumps (module=<optimized out>, args=<optimized out>, nargs=<optimized out>) at Python/clinic/marshal.c.h:124
```

In the interest of easy git bisects, we commit this fix before the code
that would expose the bug.
There are a few changes here:

1. The byte-compilation now occurs after we copy the root-scheme files
   and files from any wheel data dirs
1. Instead of iterating over the files in the unpacked wheel directory,
   we iterate over the installed files as they exist in the installation
   path
2. In addition to asserting that pyc files were created, we also add
   them to the list of installed files, so they will be included in RECORD

By compiling after installation, we no longer depend on a separate
temporary directory - this brings us closer to installing directly from
wheel files.

By compiling with source files as they exist in the installation output
directory, we no longer generate pyc files with an embedded randomized
temp directory - this means that wheel installs can be deterministic.
@chrahunt chrahunt force-pushed the refactor/compile-file-next-steps branch from 4180d19 to 452e683 Compare July 5, 2020 13:39
@chrahunt chrahunt added the type: refactor Refactoring code label Jul 5, 2020
@chrahunt chrahunt marked this pull request as ready for review July 5, 2020 15:03
@chrahunt chrahunt merged commit 1f8a4dd into pypa:master Jul 5, 2020
@chrahunt chrahunt deleted the refactor/compile-file-next-steps branch July 5, 2020 16:56
@chrahunt
Copy link
Member Author

chrahunt commented Jul 5, 2020

Thanks again for the review. :)

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 12, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
skip news Does not need a NEWS file entry (eg: trivial changes) type: refactor Refactoring code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants