-
Recently, when I tried to use LoRA adapters with a GPTQ model, vLLM raised an error saying that it does not support this feature yet. The "yet" in that message gives me hope that this feature request is already on the roadmap. Is it possible to get an estimate of when this feature will be implemented?
Answered by jeejeelee, Sep 26, 2024
Replies: 2 comments 1 reply
-
Is there any progress on this issue?
-
See: https://github.com/vllm-project/vllm/blob/main/examples/lora_with_quantization_inference.py and https://github.com/vllm-project/vllm/blob/main/tests/lora/test_quant_model.py
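For readers following the linked example, a minimal sketch of what LoRA on a quantized model looks like in vLLM is shown below. The model checkpoint name and adapter path are placeholders, not taken from the thread; any GPTQ checkpoint and a compatible LoRA adapter directory would be substituted in practice. Running this requires a GPU and downloaded weights.

```python
# Sketch: serving a GPTQ-quantized model with a LoRA adapter in vLLM.
# Placeholder model name and adapter path -- substitute your own.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="TheBloke/TinyLlama-1.1B-Chat-v1.0-GPTQ",  # placeholder GPTQ checkpoint
    quantization="gptq",
    enable_lora=True,  # must be set so LoRA requests are accepted
)

sampling = SamplingParams(temperature=0.0, max_tokens=64)

outputs = llm.generate(
    ["Explain LoRA in one sentence."],
    sampling,
    # LoRARequest(adapter name, unique integer id, local path to adapter weights)
    lora_request=LoRARequest("my_adapter", 1, "/path/to/lora_adapter"),
)
print(outputs[0].outputs[0].text)
```

The `enable_lora=True` flag at engine construction is what the error in the original question was guarding; with recent vLLM versions it can be combined with `quantization="gptq"` as the linked example demonstrates.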
Answer selected by SMAntony