Fix HubertRobustTest PT/TF equivalence test on GPU #16943

Merged: ydshieh merged 1 commit into huggingface:main from fix_hubert_test_on_gpu on Apr 27, 2022

Conversation

ydshieh (Collaborator) commented on Apr 26, 2022

What does this PR do?

Fix HubertRobustTest PT/TF equivalence test on GPU.

Note that HubertRobustModelTest has

    def setUp(self):
        self.model_tester = HubertModelTester(
            self, conv_stride=(3, 3, 3), feat_extract_norm="layer", do_stable_layer_norm=True
        )

but get_config() did not pass do_stable_layer_norm=self.do_stable_layer_norm, so the config silently fell back to its default value (False).
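
For reference, a minimal sketch of the fix, as a method of HubertModelTester (the real get_config() passes many more tester attributes than shown here):

    from transformers import HubertConfig

    def get_config(self):
        return HubertConfig(
            conv_stride=self.conv_stride,
            feat_extract_norm=self.feat_extract_norm,
            # the line that was missing before this PR:
            do_stable_layer_norm=self.do_stable_layer_norm,
        )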

To investigate further

  • Why no issue on CPU even without this PR
  • Why using conv_stride=(4, 4, 4) (the default value) has no issue on GPU, even without this PR

(Does this suggest that PT and TF Hubert behave differently on GPU with do_stable_layer_norm=False when conv_stride=(3, 3, 3), etc.?)

@patrickvonplaten, you might have some ideas about these points?

HuggingFaceDocBuilderDev commented on Apr 26, 2022

The documentation is not available anymore as the PR was closed or merged.

ydshieh requested a review from anton-l on April 26, 2022 at 10:42
ydshieh (Collaborator, Author) commented on Apr 26, 2022

I think it would be a good idea to add a check that prevents this situation from occurring in the future. I will do it in another PR.
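
A sketch of what such a check could look like (check_config_matches_tester is hypothetical, not the actual follow-up implementation): it verifies that every attribute the tester defines is actually forwarded to the config.

    def check_config_matches_tester(tester, config, attribute_names):
        """Assert that each tester attribute was forwarded to the config."""
        for name in attribute_names:
            tester_value = getattr(tester, name)
            config_value = getattr(config, name)
            assert config_value == tester_value, (
                f"{name}: tester has {tester_value!r}, config has {config_value!r}"
            )

    # hypothetical usage inside a model test:
    # check_config_matches_tester(
    #     self.model_tester,
    #     self.model_tester.get_config(),
    #     ["conv_stride", "feat_extract_norm", "do_stable_layer_norm"],
    # )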

anton-l (Member) left a comment

Not sure why different combinations of conv_stride and do_stable_layer_norm introduce instabilities.

  • conv_stride affects the sequence length seen by the transformer (larger strides = stronger subsampling of the input = a shorter sequence)
  • do_stable_layer_norm affects the position of the LayerNorm in each transformer layer, either before or after self-attention (see the sketch below)
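
For illustration, a minimal sketch (not the actual Hubert modeling code) of the two LayerNorm placements that do_stable_layer_norm switches between:

    import torch
    from torch import nn

    class EncoderLayerSketch(nn.Module):
        def __init__(self, dim, num_heads, do_stable_layer_norm):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.norm = nn.LayerNorm(dim)
            self.do_stable_layer_norm = do_stable_layer_norm

        def forward(self, x):
            if self.do_stable_layer_norm:
                # pre-LN ("stable"): normalize before self-attention
                h = self.norm(x)
                attn_out, _ = self.attn(h, h, h)
                return x + attn_out
            # post-LN: normalize after the residual connection
            attn_out, _ = self.attn(x, x, x)
            return self.norm(x + attn_out)

    # e.g. EncoderLayerSketch(32, 4, do_stable_layer_norm=True)(torch.randn(2, 10, 32))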

So this definitely needs further investigation.

ydshieh merged commit 49d5bcb into huggingface:main on Apr 27, 2022
ydshieh deleted the fix_hubert_test_on_gpu branch on April 27, 2022 at 08:50
chamidullinr pushed a commit to chamidullinr/transformers that referenced this pull request Apr 28, 2022
elusenji pushed a commit to elusenji/transformers that referenced this pull request Jun 12, 2022