Fully quantize Fairseq transformer #1993
Conversation
This pull request was exported from Phabricator. Differential Revision: D20967830
Summary: Pull Request resolved: facebookresearch#1993
F.linear -> nn.Linear so that the FBGEMM backend can quantize the linear projection. We observed a 3x+ speedup.
Backward-compatibility code is added to upgrade_state_dict_named. It works locally; loading of OSS checkpoints is still being tested.
Differential Revision: D20967830
fbshipit-source-id: b00abab4e40facc52ccf1af6b3f830c036071bce
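For context, here is a minimal sketch of the kind of dynamic quantization this change enables. The module name and dimensions are illustrative, not fairseq's actual code; the key point is that `torch.quantization.quantize_dynamic` only swaps out `nn.Linear` modules, which is why the functional `F.linear` call had to become an `nn.Linear`:

```python
import torch
import torch.nn as nn

# Illustrative module (hypothetical, not fairseq's code): an output projection
# expressed as an nn.Linear module rather than a functional F.linear call.
class OutputProjection(nn.Module):
    def __init__(self, embed_dim: int, vocab_size: int):
        super().__init__()
        self.output_projection = nn.Linear(embed_dim, vocab_size, bias=False)

    def forward(self, x):
        return self.output_projection(x)

torch.backends.quantized.engine = "fbgemm"  # select the FBGEMM backend
model = OutputProjection(512, 32000)

# Dynamic quantization replaces nn.Linear modules with int8 equivalents;
# a bare F.linear call inside forward() would not be touched.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
print(quantized)
```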
Nicely done! This should resolve #1943, too!
Looks like there might be some backwards-compatibility issues. I tried loading the wmt14.en-fr.joined-dict.transformer pretrained model. My test script:
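(The script itself did not survive extraction; as a hedged reconstruction, loading this checkpoint typically looks something like the following via the standard fairseq torch.hub entry point — the exact invocation in the original report may have differed:)

```python
import torch

# Hedged reconstruction of the loading test; the original script was not
# preserved. This uses the fairseq torch.hub entry point for the
# wmt14.en-fr transformer checkpoint.
en2fr = torch.hub.load(
    "pytorch/fairseq",
    "transformer.wmt14.en-fr",
    tokenizer="moses",
    bpe="subword_nmt",
)
en2fr.eval()
print(en2fr.translate("Hello world!"))
```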
And the error:
It looks like there's at least an issue here -- specifically I think the property is `weight`, not `weights`.

Edit: a diff against the original:

```diff
diff --git a/fairseq/models/transformer.py b/fairseq/models/transformer.py
index 98c4ab5..6b74129 100644
--- a/fairseq/models/transformer.py
+++ b/fairseq/models/transformer.py
@@ -683,11 +683,11 @@ class TransformerDecoder(FairseqIncrementalDecoder):
         if self.share_input_output_embed:
             self.output_projection = nn.Linear(
-                self.embed_tokens.weight.shape[1], self.embed_tokens.weight.shape[0]
+                self.embed_tokens.weight.shape[1], self.embed_tokens.weight.shape[0], bias=False
             )
         else:
             self.output_projection = nn.Linear(
-                self.output_embed_dim, len(dictionary)
+                self.output_embed_dim, len(dictionary), bias=False
             )

     def build_decoder_layer(self, args, no_encoder_attn=False):
@@ -891,7 +891,7 @@ class TransformerDecoder(FairseqIncrementalDecoder):
                 "{}.embed_positions._float_tensor".format(name)
             ] = torch.FloatTensor(1)

-        embed_tokens_weights_key = f"{name}.embed_tokens.weights"
+        embed_tokens_weights_key = f"{name}.embed_tokens.weight"
         embed_out_key = f"{name}.embed_out"

         if embed_tokens_weights_key in state_dict:
             state_dict[f"{name}.output_projection.weight"] = state_dict[
```
Thanks @erip! I will address it very soon.
Summary: Pull Request resolved: facebookresearch#1993
F.linear -> nn.Linear so that the FBGEMM backend can quantize the linear projection. We observed a 3x+ speedup.
Added backward-compatibility code.
Reviewed By: jhcross
Differential Revision: D20967830
fbshipit-source-id: 5a4b4c41f9c46fc06a05c50f57c249e8fcd7b1c8
This pull request was exported from Phabricator. Differential Revision: D20967830
This pull request has been merged in 6379573. |
(#1190) Summary: The main changes are in fairseq_incremental_decoder.py. I made the base `reorder_incremental_state` implementation a no-op, and instead we expect callers (e.g., SequenceGenerator) to call `reorder_incremental_state_scripting`.
Pull Request resolved: fairinternal/fairseq-py#1190
Test Plan: I ran unit tests on both PyTorch 1.5 and nightly (1.6). I also tested some of the pretrained translation models, but it would be good to test with some prod runs.
Reviewed By: jhcross
Differential Revision: D22095614
Pulled By: myleott
fbshipit-source-id: 484b8d47b4feda4efe52233a3d46a207d0816766
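As a rough sketch of the pattern that commit describes (names abbreviated, and this is not fairseq's actual class), the base method becomes a no-op and callers invoke a TorchScript-exported variant explicitly:

```python
from typing import Dict

import torch
from torch import Tensor

# Rough sketch of the described pattern, not fairseq's actual implementation.
class IncrementalDecoderSketch(torch.nn.Module):
    def reorder_incremental_state(
        self, incremental_state: Dict[str, Tensor], new_order: Tensor
    ):
        pass  # base implementation is now a no-op

    @torch.jit.export
    def reorder_incremental_state_scripting(
        self, incremental_state: Dict[str, Tensor], new_order: Tensor
    ):
        # Callers (e.g., SequenceGenerator) are expected to call this variant,
        # which reorders cached state along the batch dimension.
        for key in incremental_state.keys():
            incremental_state[key] = incremental_state[key].index_select(0, new_order)
```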
Summary: Pull Request resolved: facebookresearch#1993
F.linear -> nn.Linear so that the FBGEMM backend can quantize the linear projection. We observed a 3x+ speedup.
Added backward-compatibility code.
Reviewed By: jhcross
Differential Revision: D20967830
fbshipit-source-id: 11d2c98dd5c1965691d6df433e8428499c9c4dc0
… (facebookresearch#2032) Summary: This reverts commit 6379573. It doesn't tie weights and breaks old checkpoints.
Pull Request resolved: facebookresearch#2032
Reviewed By: cndn, ngoyal2707
Differential Revision: D21141945
Pulled By: myleott
fbshipit-source-id: b2f2ce8092a1bf8bcd6a7e422a69306e342b8cdd
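The tying failure behind the revert can be illustrated with a small hypothetical example: constructing a fresh `nn.Linear` allocates a new weight tensor, whereas true weight sharing requires assigning the embedding's Parameter itself:

```python
import torch.nn as nn

# Hypothetical illustration of the pitfall; dimensions are arbitrary.
embed_tokens = nn.Embedding(32000, 512)

# Broken "tying": a fresh nn.Linear gets its own independently trained weight.
untied = nn.Linear(512, 32000, bias=False)
assert untied.weight is not embed_tokens.weight

# Actual tying: share the same Parameter object with the embedding.
tied = nn.Linear(512, 32000, bias=False)
tied.weight = embed_tokens.weight
assert tied.weight is embed_tokens.weight
```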