
Add support for 8da4w quantization #884

Merged 1 commit into pytorch:main on Apr 30, 2024

Conversation

andrewor14 (Contributor)

Summary: Add a new quantizer that lets users quantize their models with int8 per-token dynamic activation + int4 per-axis grouped weight quantization (8da4w).
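For intuition, here is a minimal sketch of the two halves of the 8da4w scheme: one int8 scale per activation token computed at runtime, and one int4 scale per group of weight elements along the input dimension. This is an illustration only; symmetric quantization and the helper names are assumptions, not the torchao kernels:

import torch

def quantize_activations_per_token_int8(x: torch.Tensor):
    # "Dynamic": scales are computed from the activation itself at runtime,
    # one scale per token (per row along the last dimension).
    scales = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-6) / 127.0
    q = torch.clamp(torch.round(x / scales), -128, 127).to(torch.int8)
    return q, scales

def quantize_weights_grouped_int4(w: torch.Tensor, groupsize: int = 256):
    # One scale per group of `groupsize` consecutive weights along the input
    # dimension (in_features must be divisible by groupsize).
    out_features, in_features = w.shape
    wg = w.reshape(out_features, in_features // groupsize, groupsize)
    scales = wg.abs().amax(dim=-1, keepdim=True).clamp(min=1e-6) / 7.0
    q = torch.clamp(torch.round(wg / scales), -8, 7).to(torch.int8)  # int4 range, stored in int8
    return q.reshape(out_features, in_features), scales.squeeze(-1)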

Test Plan:

tune run quantize --config quantization \
    quantizer._component_=torchtune.utils.quantization.Int8DynActInt4WeightQuantizer \
    quantizer.groupsize=256

tune run eleuther_eval --config eleuther_eval \
    checkpointer._component_=torchtune.utils.FullModelTorchTuneCheckpointer \
    checkpointer.checkpoint_files=[hf_model_0001_2-8da4w.pt] \
    quantizer._component_=torchtune.utils.quantization.Int8DynActInt4WeightQuantizer \
    quantizer.groupsize=256
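Roughly, the quantize recipe with this config does the following. This is a hedged sketch: the class is exported from torchtune.utils.quantization per the config above, but the exact constructor and quantize() signatures shown here are assumptions based on the torchao-style quantizer API:

import torch
from torchtune.utils.quantization import Int8DynActInt4WeightQuantizer

model = ...  # a torchtune model loaded from the checkpoint, in eval mode
quantizer = Int8DynActInt4WeightQuantizer(groupsize=256)
model = quantizer.quantize(model)  # swap linear layers for 8da4w quantized variants
torch.save(model.state_dict(), "hf_model_0001_2-8da4w.pt")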

Quantize output:

2024-04-26:13:09:20,852 INFO     [quantize.py:98] Time for quantization: 7.49 sec
2024-04-26:13:09:20,852 INFO     [quantize.py:99] Memory used: 15.30 GB
2024-04-26:13:09:27,625 INFO     [quantize.py:112] Model checkpoint of size 7.08 GB saved to /home/andrewor/logs/tune/saved-4-25/full_1713990537_complete/quantize_output/hf_model_0001_2-8da4w.pt

Eval output:

100%|██████████| 5882/5882 [17:05<00:00,  5.74it/s]
2024-04-26:13:50:47,214 INFO     [eleuther_eval.py:196] Eval completed in 1040.88 seconds.
2024-04-26:13:50:47,214 INFO     [eleuther_eval.py:198] truthfulqa_mc2: {'acc,none': 0.4578906989909618, 'acc_stderr,none': 0.01542454257740726, 'alias': 'truthfulqa_mc2'}

Reviewers: jerryzh168, kartikayk, ebsmothers

Subscribers: jerryzh168, kartikayk, ebsmothers, supriyar


pytorch-bot bot commented Apr 26, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/884

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 56fd119 with merge base 9a9a396:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label on Apr 26, 2024 (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed).
Diff excerpt from the updated exports in torchtune.utils.quantization:

    Quantizer,
)

__all__ = [
    "Int4WeightOnlyQuantizer",
    "Int4WeightOnlyGPTQQuantizer",
    "Int8WeightOnlyQuantizer",
    "Int8DynActInt4WeightQuantizer",
]
Member

What are the benefits of this int8 dynamic act. quantizer over existing quantizers in torchtune? Do we plan to offer tutorials / additional documentation clarifying to users how to think about which quantizers to use?

andrewor14 (Contributor, Author)

This is the only form of quantization supported by QAT right now, so this PR is adding the same form of quantization for PTQ for comparison. As for the documentation of which quantizer to use, I think we can add it in the future. Right now I'm not sure if we have any even in torchao or gpt-fast comparing 8da4w to weight only quantization. cc @jerryzh168 @HDCharles
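For comparison, running the same recipe with one of the existing weight-only quantizers exported from the same module would look like the following; whether Int4WeightOnlyQuantizer accepts the same groupsize argument here is an assumption:

tune run quantize --config quantization \
    quantizer._component_=torchtune.utils.quantization.Int4WeightOnlyQuantizer \
    quantizer.groupsize=256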

@rohan-varma self-requested a review on April 30, 2024 at 21:26
@rohan-varma (Member) left a comment

Sounds good! Please feel free to land.

@andrewor14 (Contributor, Author)

@rohan-varma @ebsmothers can you help me land? Looks like I don't have write access

@ebsmothers (Contributor)

Done, thanks for adding this!

@ebsmothers merged commit fb59735 into pytorch:main on Apr 30, 2024. 27 checks passed.
andrewor14 added a commit to andrewor14/torchtune that referenced this pull request May 2, 2024
Summary: pytorch#884 introduced a quantizer that was not available before
PyTorch 2.3, causing import errors for users using an earlier
version. In torchao, we gate the import by PyTorch version.
We can do the same here.
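For illustration, a version gate of the kind described above might look like this. It is a sketch only: the helper, the handling of the 2.3 cutoff, and the torchao import path are assumptions, not the actual torchtune fix:

import torch

def _torch_version_at_least(min_version: str) -> bool:
    # Compare (major, minor) parsed from the version strings; ignores
    # suffixes like "+cu121" or "a0".
    def parse(v: str):
        parts = v.split("+")[0].split(".")[:2]
        return tuple(int("".join(c for c in p if c.isdigit()) or "0") for p in parts)
    return parse(torch.__version__) >= parse(min_version)

__all__ = []
if _torch_version_at_least("2.3"):
    # The quantizer is only available on newer PyTorch stacks, so the import
    # (path assumed here) is guarded to avoid ImportError on older versions.
    from torchao.quantization.quant_api import Int8DynActInt4WeightQuantizer
    __all__.append("Int8DynActInt4WeightQuantizer")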