
Multi-GPU QLoRA? #844

Closed

cuichenx opened this issue Apr 23, 2024 · 7 comments

@cuichenx

Hi, first of all, thanks for the great tutorials on LoRA and QLoRA! I was able to follow them very easily.
I was wondering whether multi-GPU QLoRA is supported. I couldn't find a config file for it in the repo, and when I tried using the multi-GPU LoRA recipe and adding `model.quantize_base=True`, I got this error:

```
ValueError: The module has CPU parameters or buffers when `sync_module_states=True`, which requires them to be on GPU. Please specify the `device_id` argument or move the module to GPU before passing it to FSDP.
```

Is multi-GPU QLoRA currently supported, or is it on the roadmap? Thanks a lot!
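For context, the error itself comes from FSDP rather than anything QLoRA-specific. Here is a minimal sketch of the constraint it enforces, assuming a `torchrun --nproc_per_node=2` launch and using a toy module standing in for the model (illustrative only, not the torchtune recipe):

```python
# Minimal sketch of the FSDP constraint behind the ValueError above.
# Assumes a launch like: torchrun --nproc_per_node=2 this_script.py
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = nn.Linear(16, 16)  # stands in for the (Q)LoRA model, still on CPU

# With sync_module_states=True, FSDP broadcasts rank 0's parameters to the
# other ranks on GPU. CPU parameters therefore need either a device_id or an
# explicit move to CUDA first; omitting both raises the quoted ValueError.
sharded = FSDP(
    model,
    device_id=torch.cuda.current_device(),
    sync_module_states=True,
)
```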

@joecummings
Contributor

Hey @cuichenx - glad you found the tutorials useful!

Currently, multi-GPU FSDP + QLoRA is not supported in torchtune, but this is something we are actively working on. Turns out it's a non-trivial combination. See this blog post from the folks over at answer.ai for some more information.
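Roughly, the difficulty is that a QLoRA base weight is no longer a plain floating-point Parameter that FSDP can flat-shard and all-gather; it becomes packed low-bit codes plus per-block scales. Here is a toy absmax 4-bit sketch of that representation (illustrative only, not torchtune's actual NF4 implementation):

```python
# Toy illustration of why FSDP + QLoRA is awkward to combine: the base
# weight turns into (codes, scales) rather than one floating-point tensor.
import torch

def quantize_4bit_absmax(w: torch.Tensor, block_size: int = 64):
    """Quantize to signed 4-bit codes with one floating-point scale per block."""
    blocks = w.reshape(-1, block_size)
    scales = (blocks.abs().amax(dim=1, keepdim=True) / 7.0).clamp_min(1e-8)
    codes = torch.clamp((blocks / scales).round(), -7, 7).to(torch.int8)
    return codes, scales

def dequantize(codes: torch.Tensor, scales: torch.Tensor, shape) -> torch.Tensor:
    return (codes.float() * scales).reshape(shape)

w = torch.randn(512, 64)
codes, scales = quantize_4bit_absmax(w)
# FSDP's FlatParameter machinery shards and all-gathers uniform floating-point
# parameters; teaching it to handle (codes, scales) pairs is the non-trivial
# part discussed in the answer.ai post.
print((w - dequantize(codes, scales, w.shape)).abs().max())
```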

cc: @rohan-varma

@cuichenx
Author

Thanks for the fast response! Looking forward to it :)

@kartikayk
Contributor

@cuichenx I'd be curious to learn more about your use case. Are you looking at QLoRA instead of LoRA because of memory constraints, or something else? My impression has been that LoRA gives a higher-quality model, though at slightly higher memory usage. Have you tried LoRA, and has it not worked on your setup? Thanks for taking a look at torchtune! :)

@cuichenx
Author

Hi @kartikayk, I'm currently doing some exploratory studies on QLoRA vs. LoRA, so I was looking for an apples-to-apples comparison; LoRA on a larger model like 34B or 70B would need multiple GPUs. For now I can do my studies on the smaller models.
Thanks for making this awesome framework!

@kartikayk
Contributor

@cuichenx sounds awesome! We'll make sure to comment on here as soon as we have this up and running!

@rohan-varma
Member

Thanks for trying out QLoRA @cuichenx and glad to hear that the tutorial is helpful!

Re: LoRA vs. QLoRA, as per the tutorial and the enablement PR (#478), in my experience we're actually able to get pretty good convergence with QLoRA and match LoRA on some eval tasks, with about 50% memory savings. As mentioned, though, we don't yet have multi-GPU support and are working on it.
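For anyone running the comparison on a single device in the meantime, the LoRA and QLoRA model builders are drop-in swaps. A hedged sketch using the Llama2 builders from the tutorial (treat exact argument names as assumptions for your torchtune version):

```python
# Sketch of swapping LoRA for QLoRA in torchtune; builder names are from
# the Llama2 tutorial, exact signatures may differ across versions.
from torchtune.models.llama2 import lora_llama2_7b, qlora_llama2_7b

# LoRA: adapters on the attention projections, base weights left unquantized.
lora_model = lora_llama2_7b(lora_attn_modules=["q_proj", "v_proj"])

# QLoRA: same adapters, but the frozen base weights are NF4-quantized,
# which is where the ~50% memory savings mentioned above comes from.
qlora_model = qlora_llama2_7b(lora_attn_modules=["q_proj", "v_proj"])
```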

@RdoubleA
Contributor

This was recently added in #909 and is currently available as an experimental feature in our latest stable version. Closing as completed for now; please reopen if you run into any issues using it.
