Missing CUDA libraries when using bitsandbytes with Dockerfile #1923
Comments
@tobrien6 one of the solutions in the bitsandbytes issue (linked in the error message) suggests adding a symlink for the missing library. If that doesn't help, try posting your workflow in bitsandbytes-foundation/bitsandbytes#85
I was able to fix this by switching the NVIDIA Docker base image (pulled in at the top of the diffusers Dockerfile) from the CUDA 11.7 version to the CUDA 11.6 version. Now I'm getting a different error, "ValueError: Attempting to unscale FP16 gradients.", which I see discussed elsewhere.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Describe the bug
I am trying to run diffusers in Docker. I am using the diffusers-pytorch-cuda Dockerfile provided in the repo. The only change I made is to add diffusers and bitsandbytes to the pip install list in the Dockerfile.
When I enable --use_8bit_adam for Dreambooth training, I get this error:
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
================================================================================
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
ERROR: /opt/venv/bin/python3: undefined symbol: cudaRuntimeGetVersion
CUDA SETUP: libcudart.so path is None
CUDA SETUP: Is seems that your cuda installation is not in your path. See bitsandbytes-foundation/bitsandbytes#85 for more information.
CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!!
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 00
CUDA SETUP: Loading binary /opt/venv/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
Traceback (most recent call last):
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 828, in <module>
main(args)
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 642, in main
lr_scheduler = get_scheduler(
TypeError: get_scheduler() got an unexpected keyword argument 'num_cycles'
Traceback (most recent call last):
File "/opt/venv/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/opt/venv/lib/python3.8/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/opt/venv/lib/python3.8/site-packages/accelerate/commands/launch.py", line 1104, in launch_command
simple_launcher(args)
File "/opt/venv/lib/python3.8/site-packages/accelerate/commands/launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
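Note that the traceback above ends in a TypeError from get_scheduler, which is a separate failure from the CUDA setup warnings. It may indicate that the training script and the installed diffusers package come from different versions, though the log alone does not confirm that. As a rough diagnostic sketch, the helper below checks which keyword arguments a callable accepts. The names `split_supported_kwargs` and `old_get_scheduler` are hypothetical and exist only for illustration; `old_get_scheduler` is a stand-in, not the real diffusers function.

```python
import inspect


def split_supported_kwargs(func, kwargs):
    """Split kwargs into (accepted, rejected) based on func's signature."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter means the callable accepts any keyword.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs), {}
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    allowed = {name for name, p in params.items() if p.kind in keyword_kinds}
    accepted = {k: v for k, v in kwargs.items() if k in allowed}
    rejected = {k: v for k, v in kwargs.items() if k not in allowed}
    return accepted, rejected


# Hypothetical stand-in for a scheduler factory that predates num_cycles.
def old_get_scheduler(name, optimizer, num_warmup_steps=None, num_training_steps=None):
    return (name, num_warmup_steps, num_training_steps)


if __name__ == "__main__":
    wanted = {"name": "constant", "optimizer": None, "num_cycles": 1}
    accepted, rejected = split_supported_kwargs(old_get_scheduler, wanted)
    print("accepted:", sorted(accepted))
    print("rejected:", sorted(rejected))
```

Running the same check against the installed get_scheduler would show whether num_cycles is accepted. If it is rejected, aligning the script and library versions is likely a cleaner fix than dropping the argument.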
I tried adding this line to the Dockerfile to get it to point to the CUDA library, but it did not work:
ENV LD_LIBRARY_PATH="/usr/local/cuda-11.7/targets/x86_64-linux/lib/:$LD_LIBRARY_PATH"
EDIT: I realized that the specific file libcudart.so does not exist anywhere in the filesystem of the Docker container. The path I was adding only contains libcudart.so.11.0, which is also present in the path that was already being searched. Not sure what I should do here.
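To make the observation above easy to reproduce, here is a minimal sketch that lists every libcudart.so* file in a set of directories and, if the unversioned name is missing, creates a symlink to a versioned file (the workaround suggested in the comments). The function names are hypothetical, and the default search directory is taken from the log above. Whether bitsandbytes then loads the library correctly is not guaranteed; in this thread the problem was resolved by switching to the CUDA 11.6 base image instead.

```python
import os
from pathlib import Path


def find_cudart(search_dirs):
    """Return libcudart.so* files found directly inside the given directories."""
    found = []
    for directory in search_dirs:
        path = Path(directory)
        if path.is_dir():
            found.extend(sorted(path.glob("libcudart.so*")))
    return found


def ensure_unversioned_link(lib_dir):
    """Create libcudart.so as a symlink to a versioned file if it is missing.

    Returns the path of the unversioned name, or None if no versioned
    library exists in lib_dir.
    """
    lib_dir = Path(lib_dir)
    plain = lib_dir / "libcudart.so"
    if plain.exists() or plain.is_symlink():
        return plain
    versioned = [p for p in find_cudart([lib_dir]) if p.name != "libcudart.so"]
    if not versioned:
        return None
    # Relative link to the last candidate in sorted order.
    plain.symlink_to(versioned[-1].name)
    return plain


if __name__ == "__main__":
    dirs = [d for d in os.environ.get("LD_LIBRARY_PATH", "").split(":") if d]
    dirs.append("/usr/local/cuda/lib64")  # directory named in the log above
    for lib in find_cudart(dirs):
        print(lib)
```

Running the script only lists files; `ensure_unversioned_link` has to be called explicitly and needs write permission on the library directory, so in a Dockerfile it would run as part of the image build.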
Reproduction
Described above
Logs
No response
System Info
Using Debian 10 with a 16GB T4, 4 cores, and 15GB system RAM on Google Compute Engine