Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Traversable objects in the native loaders and finders #102097

Open
FFY00 opened this issue Feb 21, 2023 · 13 comments
Open

Support Traversable objects in the native loaders and finders #102097

FFY00 opened this issue Feb 21, 2023 · 13 comments
Labels
topic-importlib type-feature A feature request or enhancement

Comments

@FFY00
Copy link
Member

FFY00 commented Feb 21, 2023

Feature or enhancement

Summarizing, it would consist of users being able to add Traversable objects to sys.path and having the default loaders pick them up.

Pitch

This proposal aims to simplify the implementation of import redirection and similar use-cases by introducing support for abstracted paths in our default loaders.

Loaders are a bit too low

Currently, one of the most common use-cases of custom loaders is to implement some kind of import redirection, not to add support for new module types. By import redirection, I mean that the loader maps import names to normal modules, but in different locations.
A good example of this would be loaders for packages installed in editable mode.

The loader abstraction is a bit lower level than desired for this use-case, so implementing a loader for these applications is pretty hard. One needs to either re-implement all the resolution and import logic, or subclass the default loaders and customize them.

How Traversables can help

Traversable is a protocol introduced in 3.9 to provide a path abstraction to use in importlib.resources. If we change the default loaders to use this abstraction, we remove the need for customs loaders to implement support for data sources.
Placing an abstraction at this level should allow us to better support environments that lack a native file-system, support custom ... more consistently, and even possibly remove some complexity from the import system, like zipimport, in favor of a lightweight ZipPath alternative + the default loaders.

Implementation

This would likely consist of adding a new _Path type that implements the Traversable protocol using os.path, to use with string paths, and then change the loaders to use the abstraction, instead of using os.path directly.
Some loaders use features that are not provided by Traversable, like stat, but we can ensure backwards compatibility by setting __fspath__ in our _Path implementation, and still performing those operations if a file system path is available.

I am not super worried about the performance impacts of this, as I think this abstraction should be fairly light.

Previous discussion

This is somewhat related to #89710, and is a step in the right direction for that.

@FFY00 FFY00 added type-feature A feature request or enhancement topic-importlib labels Feb 21, 2023
@FFY00
Copy link
Member Author

FFY00 commented Feb 21, 2023

cc @python/importlib-team

@FFY00
Copy link
Member Author

FFY00 commented Feb 21, 2023

cc @pfmoore too because I am pretty sure this would be handy for https:/pfmoore/editables

@pfmoore
Copy link
Member

pfmoore commented Feb 21, 2023

It might have saved me having to write a custom loader, but as that's done now, it's not that big a deal for me. I wouldn't change editables at this point, at least not until we get to the point where dropping support for Python 3.11 is possible.

Also, I'm not entirely sure how this would work for the editables case. What would I put on sys.path that could map a module name to a file? Given that the import part of the redirector in editables is only a few lines implementing find_spec, I'm not sure writing a Traversable instance would be much easier.

Would the work @barneygale is doing to make pathlib extensible supersede this? Adding pathlib.Path objects might be more general.

Actually, I just checked the docs and sys.path is currently restricted to only allowing strings. Way back in PEP 302 this was a deliberate choice for compatibility reasons. While it's quite possible things have changed enough since then that the restriction no longer makes sense, it's worth checking that having non-strings on sys.path won't break anything.

@barneygale
Copy link
Contributor

This would likely consist of adding a new _Path type that implements the Traversable protocol using os.path, to use with string paths, and then change the loaders to use the abstraction, instead of using os.path directly.

Why not use pathlib.Path? It already implements Traversable if I'm not mistaken.

Would the work @barneygale is doing to make pathlib extensible supersede this? Adding pathlib.Path objects might be more general.

If Traversable provides the interface you need then it seems like a good fit to me. I'd only reach for pathlib.AbstractPath (when it comes out [at least a few months away]) if you need methods that Traversable doesn't provide, like path.parent or path.with_suffix().

@FFY00
Copy link
Member Author

FFY00 commented Feb 22, 2023

It might have saved me having to write a custom loader, but as that's done now, it's not that big a deal for me. I wouldn't change editables at this point, at least not until we get to the point where dropping support for Python 3.11 is possible.

editables is currently implemented via a Finder, not a Loader, and is also currently somewhat limited in scope. It only supports re-mapping top-level modules, but that shouldn't be too hard to fix with the current approach anyway, does not support namespace packages, does not support editable data in data files, etc. It also uses a fairly simple and limited approach to import redirection, tying the imported modules to the normal loader, which is why it can get away with not implementing a loader in the first place, making any code that does some kind of import introspection prone to breakage.

Most of the stuff here is not really a problem for most people, but the limited scope makes it unusable for some more advanced projects, like https:/mesonbuild/meson-python. The issue is that for the people where editable's approach falls short, it is super difficult to meet their needs with the current state of the import system.

Also, I'm not entirely sure how this would work for the editables case. What would I put on sys.path that could map a module name to a file? Given that the import part of the redirector in editables is only a few lines implementing find_spec, I'm not sure writing a Traversable instance would be much easier.

Writing a Traversable would indeed be slightly more work, but not that much IMO. If you want to keep your same scope, I'd still probably stay with your approach. If you want to expand your scope, however, that's where this proposal would help.

Adding pathlib.Path objects might be more general.

pathlib.Path has a huge API, and if we want an abstraction, which is the whole point of this proposal, we need to define an expected interface. Traversables seem like the obvious option, as they are already used by the reader protocol (see TraversableResources).

Actually, I just checked the docs and sys.path is currently restricted to only allowing strings. Way back in PEP 302 this was a deliberate choice for compatibility reasons. While it's quite possible things have changed enough since then that the restriction no longer makes sense, it's worth checking that having non-strings on sys.path won't break anything.

Yes, definitely. From my initial testing, nothing obvious breaks. User code is a worry, though, so we might want a different attribute.

The sys.path documentation does say

all other data types are ignored during import

Which hints to different kind of objects possibly being present on the list, so we may be able to get away with it, but definitely something to consider and discuss further.

Why not use pathlib.Path? It already implements Traversable if I'm not mistaken.

I am pretty sure that currently importing pathlib will break the import bootstrapping, which would further complicate the bootstrapping process if we want to use it.

@brettcannon
Copy link
Member

I'm -1 on this as I don't see a big enough benefit for the code complexity (or I'm -0 if I drop being an import expert and let other folks worry about compatibility and eventual bug reports 😉).

@FFY00
Copy link
Member Author

FFY00 commented Mar 1, 2023

Maybe my understanding of the complexity required is wrong, but it doesn't seem that complicated to me.

From my understanding, in _bootstrap_external for eg. we could just consolidate the _path_* and _write_atomic helpers into one class, which would actually be nice IMO, and make FileLoader, and ExtensionFileLoader instantiate the path helper if they receive a raw path. We might need some more misc polishes, but I don't think it'd be much, or much different than this.

Do you have anything specific in mind regarding the complexity?
The plan was to keep basically all code as-is, and just introduce an abstraction. There are, of course, several details up in the air regarding the implementation, but at least I believe it's not going to be something that big. I can put my money where my mouth is and go ahead with an initial implementation, if you are open to having a look, and you think you might be okay with it if it is as simple as I am making it be.

I can commit to dealing with the fallout if there's any issue, for whatever that's worth. We probably want other people reviewing changes, so I know it doesn't help that much.

@jaraco
Copy link
Member

jaraco commented Mar 1, 2023

This idea is interesting. I'm +0 on the idea, maybe +1 since there's a known unmet use-case without it. Since FFY00 is willing to support it, I'd be willing to help review an implementation.

@brettcannon
Copy link
Member

Do you have anything specific in mind regarding the complexity?

That in my experience, anything involving import semantics is complex. 😉 I'm also recovering from COVID-19, so in my slightly more tired state this isn't getting me excited enough to get past -0.

@brettcannon
Copy link
Member

I was finally able to realize why I'm uneasy about this proposal: it goes against the language definition. If you read https://docs.python.org/3/reference/import.html#path-entry-finders you will notice it says:

sys.path contains a list of strings ... Only strings should be present on sys.path; all other data types are ignored.

Shanging things so sys.path can take anything but a string will thus require changing the language definition, not just importlib.

On top of that, there's backwards-compatibility concerns. While the language definition may say "other types are [to be] ignored", we all know at least some, if not most, people have been too lazy to check the types of things on sys.path. So this change will very likely break lots of path entry finders who simply didn't plan for such a change.

As such, you will need to come up with a plan on how to handle all of this which starts to lurch into PEP territory.

@FFY00
Copy link
Member Author

FFY00 commented Mar 6, 2023

Yes, I understand. My plan was to try to check how much breakage adding these objects into sys.path would be, and if the answer differs from very little, then add a new sys attribute, which is undesirable. I am hoping using sys.path will be viable, but the core proposal does require it, even though it'd make things much simpler.

@brettcannon
Copy link
Member

My plan was to try to check how much breakage adding these objects into sys.path would be

How were you planning to do that? One of the great challenges of managing this sort of backwards-compatibility is all the "dark code" there is out there which we don't have access to. And my guess is there's a much bigger chance a company made a custom importer than an open source project.

@FFY00 FFY00 changed the title Support Traversable objects in sys.path Support Traversable objects in the native loaders and finders Mar 7, 2023
@FFY00
Copy link
Member Author

FFY00 commented Mar 7, 2023

You're right. The important part of the proposal (the bit that actually solves an issue) does not need this breakage anyway, maybe I am getting ahead of myself a bit.

Simply having our loaders being able to take Traversable objects, regardless of the semantics of sys.path, will allow people to abstract the file system without having to reimplement any of the import logic. We can then see what makes sense from there.
I have renamed the issue to reflect the decrease in scope.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-importlib type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

5 participants