[Question] Reasons to mangle and hash .so names after copy #409

Open
AlexanderSerov opened this issue Jan 26, 2023 · 4 comments
Comments


AlexanderSerov commented Jan 26, 2023

Hello. We needed to ship shared libraries inside our wheel, and while doing so we ran into a problem with symlinks in the wheel. That is how we found this thread, and in particular this message. As a result, we rely extensively on the auditwheel API to implement our own custom logic for copying libraries.

However, we ran into a problem with name mangling. Our production library relies on CUDA libraries. Before we started using auditwheel, these CUDA libraries were stored on the file system, outside of the Python package, and we used custom environment variables to add them to the linker search path. The benefit of this approach is that we can reuse the same CUDA libraries, through the same custom environment, to run other CUDA-dependent packages such as PyTorch. As soon as we started copying and mangling the libraries, we lost this ability.

The message mentioned above briefly remarks: "IMO if you want to ship shared libraries in a wheel, then you should take the search problem seriously, and not rely on the linker’s naming scheme for system libraries." I would like to ask for this abstract warning to be expanded with concrete examples, existing issues, or articles on the subject. As far as I know, on Linux for example, the linker searches for libraries in the following order: 1) look in the RPATH; 2) if a library is missing from the RPATH, look in the system directories. As long as we control the RPATH, it seems that no randomness is introduced into the library search.
The implication of a design with mangled library names is that each package uses its own copy of the shared libraries, which is bad for RAM usage. Without mangling, we could share them, again via custom environment variables. So my question is: is the recommendation to mangle .so names strong enough that we should follow it?
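The two-step order described above is a simplification. According to the ld.so(8) manual page, glibc's loader searches DT_RPATH (only when the object has no DT_RUNPATH), then LD_LIBRARY_PATH, then DT_RUNPATH, then /etc/ld.so.cache, then the default directories. More importantly for this question, before searching at all it reuses an already-loaded library whose name matches. The toy model below shows why controlling your own RPATH is not enough. The class ToyLoader and all paths are hypothetical. The model ignores ld.so.cache, secure-execution mode, and the transitive RPATH rules, and it keys loaded libraries by file name on the assumption that the file name equals the SONAME.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyLoader:
    """Toy model of glibc's lookup for one DT_NEEDED name (not glibc code)."""
    files: set[str]  # paths that "exist" on disk in this toy world
    ld_library_path: list[str] = field(default_factory=list)
    default_dirs: list[str] = field(default_factory=lambda: ["/lib", "/usr/lib"])
    # Already-mapped libraries, keyed by file name (assumed equal to the SONAME).
    loaded: dict[str, str] = field(default_factory=dict)

    def resolve(self, needed: str, rpath: list[str], runpath: list[str]) -> Optional[str]:
        # 1. A library with this name already in the process is reused as-is,
        #    whatever the requesting object's RPATH/RUNPATH says.
        if needed in self.loaded:
            return self.loaded[needed]
        # 2. DT_RPATH (only if there is no DT_RUNPATH), then LD_LIBRARY_PATH,
        #    then DT_RUNPATH, then the defaults (ld.so.cache omitted here).
        dirs = (rpath if not runpath else []) + self.ld_library_path + runpath + self.default_dirs
        for d in dirs:
            candidate = f"{d}/{needed}"
            if candidate in self.files:
                self.loaded[needed] = candidate
                return candidate
        return None


# Two wheels each vendor an unmangled copy under the same hypothetical name.
loader = ToyLoader(files={"/site/pkg_a/.libs/libfoo.so.1",
                          "/site/pkg_b/.libs/libfoo.so.1"})
first = loader.resolve("libfoo.so.1", rpath=["/site/pkg_a/.libs"], runpath=[])
second = loader.resolve("libfoo.so.1", rpath=[], runpath=["/site/pkg_b/.libs"])
print(first)   # /site/pkg_a/.libs/libfoo.so.1
print(second)  # /site/pkg_a/.libs/libfoo.so.1  (pkg_b's own copy is never used)
```

The LD_LIBRARY_PATH step is where the custom-environment approach described in the question fits in. Because it comes after DT_RPATH but before DT_RUNPATH, it can override a RUNPATH but not an RPATH.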

Member

njsmith commented Apr 27, 2023

There are two challenges:

  • There's no guarantee that all your packages will be installed into a single site-packages/ directory. There are lots of ways to set up a Python environment, and all you're really guaranteed if you depend on foo is that import foo will find the foo package (maybe via $PYTHONPATH, maybe via some exotic sys.meta_path trick, who knows). So there's no way for package A to reliably point an RPATH at package B, because package A doesn't know where its files will be located relative to package B.

  • If two different packages decide to ship the same library, and neither of them mangle their names, then you can end up with the linker accidentally picking the wrong package's library, and then everything is likely to explode. This is why auditwheel uses a content-hash for the name mangling, instead of a random string: if two wheels are shipping the exact same library, then it's fine to only load it once and save a bit of memory. But if they're shipping different libraries, even if the names are the same, we just can't guarantee that they're ABI compatible, so the only safe thing to do is to mangle them differently and make sure each package gets its own copy of the library.
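The first point can be made concrete with a small sketch. Under two hypothetical install layouts for packages pkg_a and pkg_b (all paths are invented for illustration), the $ORIGIN-relative RPATH that pkg_a would need differs, so a value fixed in pkg_a's ELF headers at build time cannot be right for both layouts. The helpers origin_rpath and installed_package_dir are likewise hypothetical names. The second helper asks the import system at run time where a package actually lives, using importlib.util.find_spec.

```python
import importlib.util
import posixpath
from pathlib import Path


def origin_rpath(module_dir: str, lib_dir: str) -> str:
    """$ORIGIN-relative RPATH entry that lets a module in module_dir find lib_dir."""
    return "$ORIGIN/" + posixpath.relpath(lib_dir, module_dir)


def installed_package_dir(name: str) -> Path:
    """Ask the import system, at run time, where a package actually lives."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        raise ModuleNotFoundError(f"{name!r} is not an importable package")
    return Path(next(iter(spec.submodule_search_locations)))


# Hypothetical layout 1: both packages in one site-packages/ directory.
same_site = origin_rpath("/venv/lib/python3.11/site-packages/pkg_a",
                         "/venv/lib/python3.11/site-packages/pkg_b/lib")
# Hypothetical layout 2: pkg_b found through $PYTHONPATH in another tree.
split = origin_rpath("/venv/lib/python3.11/site-packages/pkg_a",
                     "/opt/shared/python/pkg_b/lib")

print(same_site)  # $ORIGIN/../pkg_b/lib
print(split)      # $ORIGIN/../../../../../opt/shared/python/pkg_b/lib
print(installed_package_dir("json"))  # wherever this interpreter's json package is
```

Resolving locations at import time, rather than baking them into ELF headers, is one way to take the search problem seriously. For example, a library path inside the returned directory could be loaded with ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL) before importing the extension module that needs it. That preloading pattern is only a suggestion here, not something auditwheel does.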

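The content-hash scheme in the second point can be sketched as follows. The sketch is modelled loosely on the names auditwheel produces (libfoo-&lt;hash&gt;.so.1). The use of SHA-256 truncated to 8 hex characters, the helper name hashed_name, and the three hypothetical wheel directories a, b and c are assumptions of this sketch, not a copy of auditwheel's code.

```python
import hashlib
import tempfile
from pathlib import Path


def hashed_name(lib_path: Path, digest_len: int = 8) -> str:
    """Insert a short content hash after the base name: libfoo.so.1 -> libfoo-<hash>.so.1."""
    h = hashlib.sha256()
    with lib_path.open("rb") as f:
        # Read in chunks so large libraries are not loaded into memory at once.
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    short = h.hexdigest()[:digest_len]
    base, dot, ext = lib_path.name.partition(".")
    return f"{base}-{short}{dot}{ext}"


with tempfile.TemporaryDirectory() as tmpdir:
    root = Path(tmpdir)
    # Hypothetical wheels: a and b ship identical bytes, c ships a different build.
    contents = {"a": b"build 1", "b": b"build 1", "c": b"build 2"}
    for wheel, data in contents.items():
        (root / wheel).mkdir()
        (root / wheel / "libfoo.so.1").write_bytes(data)
    names = {wheel: hashed_name(root / wheel / "libfoo.so.1") for wheel in contents}

print(names["a"] == names["b"])  # True: identical bytes, one shared name
print(names["a"] == names["c"])  # False: same file name, different bytes
```

Renaming the file is only part of the job. The copied library's SONAME and the DT_NEEDED entries of everything that links to it must also be rewritten to the new name (auditwheel does this with patchelf). Otherwise the loader will still look for the old name.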

vimiix commented Jan 3, 2024

@njsmith Is it possible to add a parameter that lets the user decide whether to rename .so files with a content hash?

Member

mayeut commented Feb 3, 2024

Please see #368, which might help.
