DO NOT MERGE bump libomp in wheels to libomp-13.0.0_0 for macos [cd build] #22105

Closed

Member

@ogrisel ogrisel commented Dec 31, 2021

This should fix the nightly builds, since libomp-11.0.1_0 is no longer available in the MacPorts repository.

Instead of using the oldest available libomp version, I tried the latest one, but I am not sure it will work. Let's see what the CI tells us.

@ogrisel
Member Author

ogrisel commented Dec 31, 2021

This seems to work but there was an unrelated error in the win_amd64 / Python 3.9 build:

error: Command "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\bin\HostX86\x64\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:C:\cibw\python\python.3.9.9\tools\libs /LIBPATH:C:\cibw\python\python.3.9.9\tools\PCbuild\amd64 /LIBPATH:build\temp.win-amd64-3.9 /LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\ATLMFC\lib\x64 /LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\lib\x64 /LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\lib\um\x64 /LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.22000.0\ucrt\x64 /LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.22000.0\um\x64 /LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\ATLMFC\lib\x64 /LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\lib\x64 /LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\lib\um\x64 /LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.22000.0\ucrt\x64 /LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.22000.0\um\x64 /LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\ATLMFC\lib\x64 /LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\lib\x64 /LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\lib\um\x64 /LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.22000.0\ucrt\x64 /LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.22000.0\um\x64 /EXPORT:PyInit__gradient_boosting build\temp.win-amd64-3.9\Release\sklearn\ensemble\_gradient_boosting.obj /OUT:build\lib.win-amd64-3.9\sklearn\ensemble\_gradient_boosting.cp39-win_amd64.pyd /IMPLIB:build\temp.win-amd64-3.9\Release\sklearn\ensemble\_gradient_boosting.cp39-win_amd64.lib /openmp" failed with exit status 1158
    Building wheel for scikit-learn (pyproject.toml): finished with status 'error'

@ogrisel
Member Author

ogrisel commented Dec 31, 2021

This confirms that the Windows link.exe failure is random. Maybe not all Windows executors are configured the same way on GitHub Actions?

@ogrisel
Member Author

ogrisel commented Dec 31, 2021

All the tests pass with libomp-13.0.0_0. However, I vaguely recall that we faced an unresolved bug with recent versions of the LLVM runtime libraries, although I am not sure it was related to libomp. I cannot find the issue in the tracker. @jeremiedbb do you see what I mean by any chance?

@jeremiedbb
Member

We downgraded from libomp 12 to libomp 11 in #21227.
According to the comments in that PR, it would be better to first check locally that libomp 13 works with the configurations listed here: #21227 (comment)

@ogrisel
Member Author

ogrisel commented Dec 31, 2021

I built a wheel locally with cibuildwheel and libomp-13, and I can reproduce the segfault by running the following /tmp/repro.py script in a conda env where scipy comes from conda-forge, with openblas linked to llvm-openmp 12:

from sklearn.linear_model import Lasso

Running lldb python /tmp/repro.py followed by r then yields:

Process 3210 resuming
Process 3210 stopped
* thread #2, stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
    frame #0: 0x00000001015e915c libomp.dylib`void __kmp_suspend_64<false, true>(int, kmp_flag_64<false, true>*) + 44
libomp.dylib`__kmp_suspend_64<false, true>:
->  0x1015e915c <+44>: ldr    x20, [x8, w0, sxtw #3]
    0x1015e9160 <+48>: mov    x0, x20
    0x1015e9164 <+52>: bl     0x1015e8964               ; __kmp_suspend_initialize_thread
    0x1015e9168 <+56>: add    x19, x20, #0x4c0          ; =0x4c0 
Target 0: (python) stopped.

These are not exactly the same instructions as in #21182 (comment), but it is also an EXC_BAD_ACCESS in __kmp_suspend_64.
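The setup described above (a wheel that ships its own libomp, imported next to a conda-forge openblas linked to llvm-openmp) may put two OpenMP runtimes in one process. A minimal sketch for checking that, assuming the third-party threadpoolctl package is installed; the helper names `openmp_runtimes` and `has_duplicate_openmp` are hypothetical and only illustrate the check:

```python
def openmp_runtimes(info):
    """Keep only the OpenMP entries of a threadpool_info()-style list of dicts."""
    return [module for module in info if module.get("user_api") == "openmp"]


def has_duplicate_openmp(info):
    """Return True when more than one distinct OpenMP shared library is listed."""
    paths = {module.get("filepath") for module in openmp_runtimes(info)}
    return len(paths) > 1


if __name__ == "__main__":
    try:
        # threadpoolctl is an optional third-party dependency here.
        from threadpoolctl import threadpool_info
    except ImportError:
        print("threadpoolctl is not installed; nothing to inspect")
    else:
        # Only libraries already loaded in this process are reported, so
        # import the packages of interest before reaching this point.
        info = threadpool_info()
        for module in openmp_runtimes(info):
            print(module.get("filepath"), module.get("version"))
        print("several OpenMP runtimes loaded:", has_duplicate_openmp(info))
```

This only helps in an environment where the imports themselves do not crash, for instance to compare a working libomp 11 setup against the failing one.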

@ogrisel
Member Author

ogrisel commented Dec 31, 2021

I also tried to run with the environment variables LIBOMP_USE_HIDDEN_HELPER_TASK=0 LIBOMP_NUM_HIDDEN_HELPER_THREADS=0 as suggested in https://bugs.llvm.org/show_bug.cgi?id=50579#c1 but that does not change anything (I still get the segfault with the same lldb backtrace).
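For anyone who wants to repeat this check without lldb, here is a rough sketch that runs a snippet in a fresh interpreter with those two variables set and reports whether the child was killed by a signal. The helper name `run_snippet` is hypothetical, and the signal detection relies on POSIX behaviour (a negative return code from subprocess):

```python
import os
import signal
import subprocess
import sys

# Variables and values quoted from the comment above.
HIDDEN_HELPER_ENV = {
    "LIBOMP_USE_HIDDEN_HELPER_TASK": "0",
    "LIBOMP_NUM_HIDDEN_HELPER_THREADS": "0",
}


def run_snippet(code, extra_env=None):
    """Run `code` in a new Python process; return (returncode, signal name or None)."""
    env = dict(os.environ)
    env.update(extra_env or {})
    proc = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
    )
    signal_name = None
    if proc.returncode < 0:
        # On POSIX a negative return code means the child died from a signal.
        try:
            signal_name = signal.Signals(-proc.returncode).name
        except ValueError:
            signal_name = "signal {}".format(-proc.returncode)
    return proc.returncode, signal_name
```

Calling `run_snippet("from sklearn.linear_model import Lasso", HIDDEN_HELPER_ENV)` in the affected environment would then be expected to report a signal rather than a return code of 0, if the crash behaves as in the lldb session.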

@ogrisel
Member Author

ogrisel commented Dec 31, 2021

I guess we will have to:

  • build our own libomp 11 binary and publish it as a GitHub release artifact to build our wheels, as a stopgap to be able to release macOS wheels for scikit-learn 1.1.
  • build libomp from source with KMP_DEBUG=1 to get more info, or even use lldb with the source folder, to debug the root cause and report a minimal reproducer or a fix upstream. I am not sure it is related to https://bugs.llvm.org/show_bug.cgi?id=50579, since setting the environment variables does not work around the segfault.

@ogrisel ogrisel changed the title MAINT bump libomp in wheels to libomp-13.0.0_0 for macos [cd build] DO NOT MERGE bump libomp in wheels to libomp-13.0.0_0 for macos [cd build] Dec 31, 2021
@thomasjpfan
Member

build our own libomp 11 binary and put it somewhere as a github release artifact to build our wheels as a stopgap solution to be able to release macos wheels for scikit-learn 1.1.

I think this is the best way forward for the time being. I'll set it up.

@thomasjpfan
Member

thomasjpfan commented Dec 31, 2021

I created a simple libomp-osx-builds repo that builds libomp locally and packages it the same way MacPorts does. I do not know whether there is a way to build the 10.13 binary without running a system on 10.13 itself.

An alternative is to use https://mac.r-project.org/openmp/ and use 11.0.1 for arm64 and 10.0.0 for everything else.
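As an illustration of that per-architecture choice, a build script could select the version along these lines. This is only a sketch: the names `LIBOMP_BY_ARCH`, `DEFAULT_LIBOMP` and `pick_libomp_version` are hypothetical, and the version strings are simply the ones mentioned above:

```python
import platform

# Versions taken from the comment above: 11.0.1 for arm64, 10.0.0 otherwise.
LIBOMP_BY_ARCH = {"arm64": "11.0.1"}
DEFAULT_LIBOMP = "10.0.0"


def pick_libomp_version(machine=None):
    """Return the libomp version to use for `machine` (defaults to the host)."""
    machine = machine or platform.machine()
    return LIBOMP_BY_ARCH.get(machine, DEFAULT_LIBOMP)
```

Passing the target architecture explicitly matters when cross-compiling, since `platform.machine()` describes the build host and not the wheel being built.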

@thomasjpfan
Member

I opened an alternative solution at: #22109 which uses conda-forge's libomp 11.0.1

@ogrisel ogrisel closed this Jan 2, 2022
@ogrisel ogrisel deleted the fix-macos-arm64-libomp-13.0.0_0 branch January 3, 2022 13:43
3 participants