
use scale=1.0 in floats_tensor called in speech model testers #17007

Merged · 1 commit · Apr 29, 2022

Conversation

@ydshieh (Collaborator) commented Apr 29, 2022

What does this PR do?

Fixes the failure of Speech2TextModelTest.test_pt_tf_model_equivalence, which is caused by

input_features = floats_tensor(
    [self.batch_size, self.seq_length, self.input_feat_per_channel], self.vocab_size
)

where input_features gets a large magnitude (~1e2, from self.vocab_size=99).

(This probably happens because the line was copied from input_ids = ids_tensor([self.batch_size, self.seq_length], self.vocab_size) in the NLP model testers.)

I changed it to scale=1.0, but I need @patrickvonplaten's expertise to confirm there was no particular reason to use self.vocab_size.

Details

Current speech model testers have

def prepare_config_and_inputs(self):
    input_values = floats_tensor([self.batch_size, self.seq_length], self.vocab_size)

The self.vocab_size argument is passed as the scale, so the generated dummy input_values has a magnitude of up to self.vocab_size. For Speech2TextModelTester, vocab_size=99.
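To make the effect of the positional argument concrete, here is a minimal sketch of a floats_tensor-style helper, assuming uniform values multiplied by scale (the real helper in the transformers test utilities returns a torch tensor and may differ in detail; a flat list is used here for illustration):

```python
import random

def floats_tensor(shape, scale=1.0, rng=None):
    """Sketch of the test helper: uniform random floats in [0, scale).
    The real helper returns a torch tensor; a flat list is used here."""
    rng = rng or random.Random(0)
    total = 1
    for dim in shape:
        total *= dim
    return [rng.random() * scale for _ in range(total)]

# Passing vocab_size as the second positional argument makes it the scale,
# so the dummy "speech features" end up with magnitude up to 99:
values = floats_tensor([2, 3], 99)
assert len(values) == 6
assert all(0.0 <= v < 99.0 for v in values)
```

With scale=1.0 instead, the same call would produce values of order 1, which is what the fix in this PR restores.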

Furthermore, Speech2TextEncoder has

self.embed_scale = math.sqrt(embed_dim) if config.scale_embedding else 1.0

and with the tester's hidden_size=16, we get embed_scale = sqrt(16) = 4.

The input_features go through the conv layer(s) and are then scaled:

inputs_embeds = self.conv(input_features)
inputs_embeds = self.embed_scale * inputs_embeds

On CPU, however, the PT and TF conv layers differ by a magnitude of ~1e-7 for input values of order 1. With the two scalings above, this error grows to ~4e-5, and the PT/TF equivalence test fails.
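The amplification can be checked with back-of-the-envelope arithmetic, using the numbers quoted above (the ~1e-7 baseline is an observed order of magnitude on CPU, not an exact constant):

```python
base_diff = 1e-7         # per-element PT/TF conv discrepancy for inputs of order 1
input_scale = 99         # self.vocab_size, mistakenly used as the floats_tensor scale
embed_scale = 16 ** 0.5  # sqrt(hidden_size=16) = 4.0, from scale_embedding

amplified = base_diff * input_scale * embed_scale
assert abs(amplified - 3.96e-5) < 1e-12  # ~4e-5, enough to fail the equivalence test
```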


@@ -116,7 +116,7 @@ def __init__(
         self.adapter_output_seq_length = (self.output_seq_length - 1) // adapter_stride + 1

     def prepare_config_and_inputs(self):
-        input_values = floats_tensor([self.batch_size, self.seq_length], self.vocab_size)
+        input_values = floats_tensor([self.batch_size, self.seq_length], scale=1.0)
@patrickvonplaten (Contributor) commented Apr 29, 2022:

wow good catch!

@patrickvonplaten (Contributor) left a comment:

You're 100% right - this was indeed a bad copy paste!

@patrickvonplaten (Contributor):

Thanks for fixing all the tests!

@sgugger (Collaborator) left a comment:

Nice fix! Thanks a lot!

@ydshieh ydshieh merged commit e952e04 into huggingface:main Apr 29, 2022
@ydshieh ydshieh deleted the fix_speech_to_text_ci_failure branch April 29, 2022 12:41
stevhliu pushed a commit to stevhliu/transformers that referenced this pull request May 3, 2022
elusenji pushed a commit to elusenji/transformers that referenced this pull request Jun 12, 2022