Add support for 8da4w quantization #884
Conversation
Summary: Add a new quantization option so users can quantize their models with int8 per-token dynamic activation + int4 per-axis grouped weight quantization (8da4w).

Test Plan:
tune run quantize --config quantization quantizer._component_=torchtune.utils.quantization.Int8DynActInt4WeightQuantizer quantizer.groupsize=256

Reviewers: jerryzh168, kartikayk, ebsmothers

Subscribers: jerryzh168, kartikayk, ebsmothers, supriyar
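For reference, a minimal sketch of how the new quantizer could be invoked directly in Python, assuming the torchao-backed interface re-exported from torchtune.utils.quantization (the `groupsize` argument and `quantize()` method follow the torchao `Quantizer` convention); the toy `nn.Linear` model is just a stand-in:

```python
import torch
from torchtune.utils.quantization import Int8DynActInt4WeightQuantizer

# Stand-in for a real torchtune model checkpoint loaded in full precision.
model = torch.nn.Sequential(torch.nn.Linear(512, 512))

# int8 per-token dynamic activations + int4 grouped weights (8da4w).
quantizer = Int8DynActInt4WeightQuantizer(groupsize=256)
quantized_model = quantizer.quantize(model)

torch.save(quantized_model.state_dict(), "model-8da4w.pt")
```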
Dr. CI: ✅ No failures as of commit 56fd119 with merge base 9a9a396. Artifacts and rendered test results: hud.pytorch.org/pr/pytorch/torchtune/884
Quantizer,
)

__all__ = [
    "Int4WeightOnlyQuantizer",
    "Int4WeightOnlyGPTQQuantizer",
    "Int8WeightOnlyQuantizer",
    "Int8DynActInt4WeightQuantizer",
What are the benefits of this int8 dynamic act. quantizer over existing quantizers in torchtune? Do we plan to offer tutorials / additional documentation clarifying to users how to think about which quantizers to use?
This is the only form of quantization supported by QAT right now, so this PR adds the same form of quantization for PTQ so the two can be compared. As for documentation on which quantizer to use, I think we can add that in the future; right now I'm not sure we have any, even in torchao or gpt-fast, comparing 8da4w to weight-only quantization. cc @jerryzh168 @HDCharles
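For context on how the config override in the test plan selects the quantizer being compared, here is a rough sketch assuming torchtune's `config.instantiate` helper and an OmegaConf node shaped like the quantizer section of the quantization config; the exact recipe internals may differ:

```python
from omegaconf import OmegaConf
from torchtune import config

# The quantizer node as it would appear after the CLI overrides in the test plan.
quantizer_cfg = OmegaConf.create(
    {
        "_component_": "torchtune.utils.quantization.Int8DynActInt4WeightQuantizer",
        "groupsize": 256,
    }
)

# Swapping _component_ (e.g. to a weight-only quantizer) is how the different
# quantizers can be compared without changing recipe code.
quantizer = config.instantiate(quantizer_cfg)
```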
Sounds good! Please feel free to land.
@rohan-varma @ebsmothers can you help me land? Looks like I don't have write access
Done, thanks for adding this!
Summary: pytorch#884 introduced a quantizer that is not available before PyTorch 2.3, causing import errors for users on earlier versions. In torchao, we gate the import by PyTorch version. We can do the same here.
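A minimal sketch of what such version gating could look like; the import path and helper below are illustrative assumptions, not the exact torchtune change:

```python
import torch

# Gate on PyTorch >= 2.3, where the 8da4w quantizer became available in torchao.
_TORCH_VERSION = tuple(int(v) for v in torch.__version__.split(".")[:2])

if _TORCH_VERSION >= (2, 3):
    from torchao.quantization.quant_api import Int8DynActInt4WeightQuantizer
else:
    # Keep the name defined so importing this module still works on older
    # PyTorch; only actually using the quantizer there will fail.
    Int8DynActInt4WeightQuantizer = None
```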
Summary: Add a new quantization option so users can quantize their models with int8 per-token dynamic activation + int4 per-axis grouped weight quantization (8da4w).
Test Plan:
Quantize output:
Eval output:
Reviewers: jerryzh168, kartikayk, ebsmothers
Subscribers: jerryzh168, kartikayk, ebsmothers, supriyar