Add support for 8da4w quantization #884
Conversation
Summary: Add a new quantization option so users can quantize their models with int8 per-token dynamic activation + int4 per-axis grouped weight quantization (8da4w).

Test Plan:
tune run quantize --config quantization quantizer._component_=torchtune.utils.quantization.Int8DynActInt4WeightQuantizer quantizer.groupsize=256

Reviewers: jerryzh168, kartikayk, ebsmothers

Subscribers: jerryzh168, kartikayk, ebsmothers, supriyar
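For reference, a minimal sketch of how the new quantizer could be invoked directly in Python, assuming the torchao-backed interface re-exported from torchtune.utils.quantization (the `groupsize` argument and `quantize()` method follow the torchao `Quantizer` convention); the toy `nn.Linear` model is just a stand-in:

```python
import torch
from torchtune.utils.quantization import Int8DynActInt4WeightQuantizer

# Stand-in for a real torchtune model checkpoint loaded in full precision.
model = torch.nn.Sequential(torch.nn.Linear(512, 512))

# int8 per-token dynamic activations + int4 grouped weights (8da4w).
quantizer = Int8DynActInt4WeightQuantizer(groupsize=256)
quantized_model = quantizer.quantize(model)

torch.save(quantized_model.state_dict(), "model-8da4w.pt")
```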
Dr. CI: ✅ No failures as of commit 56fd119 with merge base 9a9a396. Artifacts and rendered test results: hud.pytorch.org/pr/pytorch/torchtune/884
Quantizer,
)

__all__ = [
    "Int4WeightOnlyQuantizer",
    "Int4WeightOnlyGPTQQuantizer",
    "Int8WeightOnlyQuantizer",
    "Int8DynActInt4WeightQuantizer",
What are the benefits of this int8 dynamic act. quantizer over existing quantizers in torchtune? Do we plan to offer tutorials / additional documentation clarifying to users how to think about which quantizers to use?
This is the only form of quantization supported by QAT right now, so this PR adds the same form of quantization for PTQ so the two can be compared. As for documentation on which quantizer to use, I think we can add that in the future; right now I'm not sure we have any, even in torchao or gpt-fast, comparing 8da4w to weight-only quantization. cc @jerryzh168 @HDCharles
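For context on how the config override in the test plan selects the quantizer being compared, here is a rough sketch assuming torchtune's `config.instantiate` helper and an OmegaConf node shaped like the quantizer section of the quantization config; the exact recipe internals may differ:

```python
from omegaconf import OmegaConf
from torchtune import config

# The quantizer node as it would appear after the CLI overrides in the test plan.
quantizer_cfg = OmegaConf.create(
    {
        "_component_": "torchtune.utils.quantization.Int8DynActInt4WeightQuantizer",
        "groupsize": 256,
    }
)

# Swapping _component_ (e.g. to a weight-only quantizer) is how the different
# quantizers can be compared without changing recipe code.
quantizer = config.instantiate(quantizer_cfg)
```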
Sounds good! Please feel free to land.
@rohan-varma @ebsmothers can you help me land? Looks like I don't have write access
Done, thanks for adding this!
Summary: pytorch#884 introduced a quantizer that is not available before PyTorch 2.3, causing import errors for users on earlier versions. In torchao, we gate the import by PyTorch version. We can do the same here.
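A minimal sketch of what such version gating could look like; the import path and helper below are illustrative assumptions, not the exact torchtune change:

```python
import torch

# Gate on PyTorch >= 2.3, where the 8da4w quantizer became available in torchao.
_TORCH_VERSION = tuple(int(v) for v in torch.__version__.split(".")[:2])

if _TORCH_VERSION >= (2, 3):
    from torchao.quantization.quant_api import Int8DynActInt4WeightQuantizer
else:
    # Keep the name defined so importing this module still works on older
    # PyTorch; only actually using the quantizer there will fail.
    Int8DynActInt4WeightQuantizer = None
```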
Summary: Add a new quantization option so users can quantize their models with int8 per-token dynamic activation + int4 per-axis grouped weight quantization (8da4w).
Test Plan:
Quantize output:
Eval output:
Reviewers: jerryzh168, kartikayk, ebsmothers
Subscribers: jerryzh168, kartikayk, ebsmothers, supriyar