-
Recently, when I tried to use LoRA adapters with a GPTQ model, vLLM raised an error saying that it does not support this feature yet. The "yet" in that message gives me hope that this feature request is already on the roadmap. Is it possible to get an estimate of when this feature will be implemented?
Answered by jeejeelee, Sep 26, 2024
Replies: 2 comments 1 reply
-
Is there any progress on this issue?
-
See: https://github.com/vllm-project/vllm/blob/main/examples/lora_with_quantization_inference.py and https://github.com/vllm-project/vllm/blob/main/tests/lora/test_quant_model.py
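For readers following the linked example, a minimal sketch of what LoRA on a quantized model looks like in vLLM is shown below. The model checkpoint name and adapter path are placeholders, not taken from the thread; any GPTQ checkpoint and a compatible LoRA adapter directory would be substituted in practice. Running this requires a GPU and downloaded weights.

```python
# Sketch: serving a GPTQ-quantized model with a LoRA adapter in vLLM.
# Placeholder model name and adapter path -- substitute your own.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="TheBloke/TinyLlama-1.1B-Chat-v1.0-GPTQ",  # placeholder GPTQ checkpoint
    quantization="gptq",
    enable_lora=True,  # must be set so LoRA requests are accepted
)

sampling = SamplingParams(temperature=0.0, max_tokens=64)

outputs = llm.generate(
    ["Explain LoRA in one sentence."],
    sampling,
    # LoRARequest(adapter name, unique integer id, local path to adapter weights)
    lora_request=LoRARequest("my_adapter", 1, "/path/to/lora_adapter"),
)
print(outputs[0].outputs[0].text)
```

The `enable_lora=True` flag at engine construction is what the error in the original question was guarding; with recent vLLM versions it can be combined with `quantization="gptq"` as the linked example demonstrates.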
Answer selected by SMAntony