Bindings incompatibility between bipedal_locomotion_framework and manifpy #386

Open
diegoferigo opened this issue Aug 3, 2021 · 8 comments

Comments

@diegoferigo
Member

diegoferigo commented Aug 3, 2021

Premise: this problem is really subtle, and it took me quite a while to figure out what was going on. It is highly dependent on the setup: it occurred on my laptop, but it could happen in any similar configuration.

My daily setup is a Docker image generated from this Dockerfile. It uses conda to install a lot of dependencies, and then builds a complete robotology-superbuild and the Ignition stack from sources. While building the image, the default compiler is the one provided by conda-forge.

Then, in order to make the QtCreator autocompletion happy, in the runtime container I set Ubuntu's clang as default compiler. My development pattern is having a blf repo mounted in a persistent volume from the host (this means I don't use the blf in the src folder of the superbuild) but then I install it inside superbuild's build/install, overriding the previous files.

This means that, when compiling blf outside the superbuild, the runtime compiler (clang) is used, as opposed to the one used during the image build process (conda-forge's gcc).

In this setup, executing Python code that uses both blf and manif, the following error occurs:

In [1]:         import bipedal_locomotion_framework.bindings as blf
   ...:         import numpy as np
   ...:         import manifpy
   ...:         quat = [0.0, 0, 0, 1]
   ...:         c = blf.contacts.ContactBase()
   ...:         c.pose = manifpy.SE3(position=np.array([-0.0072, 0.0786, 0.0]), quaternion=quat)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-da0cdb9a3959> in <module>
      4 quat = [0.0, 0, 0, 1]
      5 c = blf.contacts.ContactBase()
----> 6 c.pose = manifpy.SE3(position=np.array([-0.0072, 0.0786, 0.0]), quaternion=quat)

TypeError: (): incompatible function arguments. The following argument types are supported:
    1. (self: bipedal_locomotion_framework.bindings.contacts.ContactBase, arg0: manif::SE3<double>) -> None

Invoked with: <bipedal_locomotion_framework.bindings.contacts.ContactBase object at 0x7f30e01a7f70>, <manifpy._bindings.SE3 object at 0x7f30d82e8bb0>

At first sight, I thought of an incompatibility between the installed manif, maybe an updated version, and what blf expects. However, that investigation did not lead me to any solution.

After some more digging (and related cursing) I read this SO question, and I realized that the problem was similar to what I was experiencing. Matching the compilers, in my case compiling with conda-forge's gcc, solved the problem.
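To check which compiler produced a given extension module, a rough sketch like the following could help. It assumes the binaries are ELF files whose `.comment` section has not been stripped, and that the compiler left its usual identification string there (`GCC: (...) x.y.z` or `clang version x.y.z`). The function name `compiler_tags` is made up for this example. Note that a clang-built library may also report GCC tags coming from the system startup files it links against.

```python
import re
import sys
from pathlib import Path

# Identification strings that gcc and clang usually embed in the ELF
# .comment section. Assumption: the section was not stripped.
_COMPILER_TAGS = re.compile(rb"(GCC: \([^)]*\) [\d.]+|clang version [\d.]+)")


def compiler_tags(shared_object):
    """Return the sorted, de-duplicated compiler tags found in a binary file."""
    data = Path(shared_object).read_bytes()
    return sorted({tag.decode("ascii", "replace") for tag in _COMPILER_TAGS.findall(data)})


if __name__ == "__main__":
    # Pass the paths of the compiled binding modules (the .so files) as arguments.
    for path in sys.argv[1:]:
        print(path, compiler_tags(path))
```

Running it on the `.so` files of both bindings and comparing the output should show whether they were built by different compilers.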

That said, it is clear that bindings which do not share the same compiler are quite fragile. We should keep this in mind in case it occurs in the future in other setups. I don't really have a workaround to propose. For sure, exposing just the types we need, as we were doing before, would solve it, even though it would be suboptimal for those who already use manifpy in their code, since they would have to convert all objects to the equivalent blf.SE types.

When @GiulioRomualdi first tried this integration with upstream's manifpy and it magically worked, I was quite surprised at first, but then I thought that sooner or later some limitation would show up. I sat for a while by the riverside waiting for the first floating body, and unfortunately it was my own 😅 If you don't have any suggestions, I'd say to keep things as they are because they work in most cases. Let's use this issue as a reference to collect problems coming from particular setups like the one I described at the beginning.

To conclude, I want to say that an environment fully based on conda-forge packages is not affected by this thanks to the usage of the same compiler. Furthermore, also manylinux* packages in PyPI should not be affected, under the assumption that the same variant is available for both packages.

cc @dic-iit/blf-developers


Edit: other resources:

@traversaro
Collaborator

Interesting! However, I still do not understand what the issue is here, as both clang and gcc (essentially any version after gcc 5) should generate objects with a compatible ABI.

@diegoferigo
Member Author

It's a mystery to me as well, and I suspect that going down the rabbit hole in this case would mean digging quite a lot into pybind11 and Python's capsule mechanism. Not sure I (or anyone else here) have enough interest to start this journey XD
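One cheap diagnostic before any deep dive, sketched under an assumption that is not confirmed in this thread: pybind11 keeps its shared type registry in an internals object published in `builtins` under a key starting with `__pybind11_internals`, and that key encodes details of the build such as the compiler family. Where and how this is stored is an implementation detail that may differ between pybind11 releases. The helper name `pybind11_internals_keys` is made up for this example.

```python
import builtins


def pybind11_internals_keys(namespace=None):
    """List keys that look like pybind11 internals registries.

    Assumption: pybind11 publishes its internals in `builtins` under a key
    starting with "__pybind11_internals". Pass a dict to inspect it instead.
    """
    ns = vars(builtins) if namespace is None else namespace
    return sorted(
        key for key in ns
        if isinstance(key, str) and key.startswith("__pybind11_internals")
    )


if __name__ == "__main__":
    # Import both binding modules before running this check, e.g.:
    #   import bipedal_locomotion_framework.bindings
    #   import manifpy
    for key in pybind11_internals_keys():
        print(key)
```

If more than one key is printed after importing both modules, the two extensions would not be sharing a type registry, which would be consistent with the `TypeError` above even though the C++ ABI itself is compatible.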

@traversaro
Collaborator

> It's a mystery to me as well, and I suspect that going down the rabbit hole in this case would mean digging quite a lot into pybind11 and Python's capsule mechanism. Not sure I (or anyone else here) have enough interest to start this journey XD

Ack, I would not rule out that the issue is caused not directly by the compiler version but by something else in the compilation environment (compilation flags?).

@traversaro
Collaborator

Just a curiosity, what was the Python version in apt and the one instead used in conda?

@diegoferigo
Member Author

diegoferigo commented Aug 3, 2021

> > It's a mystery to me as well, and I suspect that going down the rabbit hole in this case would mean digging quite a lot into pybind11 and Python's capsule mechanism. Not sure I (or anyone else here) have enough interest to start this journey XD
>
> Ack, I would not rule out that the issue is caused not directly by the compiler version but by something else in the compilation environment (compilation flags?).

I suspected the same, but for the sake of clarity I did not include all the attempts I made. Since the conda environment is active both during the docker build process and at container runtime, the environment should be the same, except for CC and CXX.

I also tried to build the external blf in a system environment with conda not activated, just adding the conda environment's root to CMAKE_PREFIX_PATH to allow CMake to find the dependencies, but YARP-related libraries (YARP is a dependency of blf) were complaining with a lot of errors like the following:

/conda/bin/../lib/gcc/x86_64-conda-linux-gnu/9.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: /usr/local/src/robotology-superbuild/build/install/lib/libYARP_os.so.3.4.6: undefined reference to `ACE_INET_Addr::get_host_addr() const'
/conda/bin/../lib/gcc/x86_64-conda-linux-gnu/9.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: /usr/local/src/robotology-superbuild/build/install/lib/libYARP_sig.so.3.4.6: undefined reference to `png_set_expand_gray_1_2_4_to_8@PNG16_0'
/conda/bin/../lib/gcc/x86_64-conda-linux-gnu/9.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: /usr/local/src/robotology-superbuild/build/install/lib/libYARP_sig.so.3.4.6: undefined reference to `png_create_write_struct@PNG16_0'

Note: I have no idea where conda's ld is picked up from, maybe from imported targets? The system environment is clean.

Then, I wanted to disable the YARP-dependent targets of blf and try the bindings, but YARP is a required dependency for the Python component. I didn't proceed with the experiments after this.

> Just a curiosity, what was the Python version in apt and the one instead used in conda?

~ 
❯ /usr/bin/python3 --version
Python 3.8.10

~ 
❯ /conda/bin/python --version
Python 3.8.10

@artivis

artivis commented Aug 30, 2021

Maybe the following SO answer is worth checking:

> For pybind11 to treat two classes as the same, not only their name should be equal, but also the file. [lib]'s bindings were compiled with path_to_sources/install/include included, but my bindings were compiled with /usr/local/include included, so Python did not recognize them.

@traversaro
Collaborator

> Maybe the following SO answer is worth checking:
>
> > For pybind11 to treat two classes as the same, not only their name should be equal, but also the file. [lib]'s bindings were compiled with path_to_sources/install/include included, but my bindings were compiled with /usr/local/include included, so Python did not recognize them.

Interesting, this sounds quite problematic for conda where packages are not typically compiled in the same prefix, and the path of the files changes from build to build if you are building with conda-build.
