Error when passing encoder_outputs as tuple to EncoderDecoder models #15536
Comments
Hey @jsnfly, regarding the first point - agree, it'd be good to check if the input is a tuple and, if it is, wrap it into a `ModelOutput`. Regarding the 2nd point - that's very interesting (cc @sanchit-gandhi). It also makes a lot of sense: since ASR is itself monotonic, knowing the order of the words to transcribe together with the encoder speech frames seems like a sensible design choice. Thanks a lot for sharing this here!
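A minimal sketch of what that check could look like, assuming the tuple's first element is the last hidden state (illustrative only, not the actual Transformers source):

```python
import torch
from transformers.modeling_outputs import BaseModelOutput

# Illustrative: a plain tuple of Tensors, as a user might pass it.
encoder_outputs = (torch.randn(1, 10, 768),)

# Sketch of the proposed check: wrap the tuple into a ModelOutput
# subclass so downstream attribute access keeps working.
if isinstance(encoder_outputs, tuple):
    encoder_outputs = BaseModelOutput(*encoder_outputs)

print(encoder_outputs.last_hidden_state.shape)  # torch.Size([1, 10, 768])
```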
The embedding hack is a really neat find - nice one @jsnfly! It's something we're going to look into in our ASR experiments! It seems like it could help with alignment in a much cleaner and more compact way than the encoder-decoder cross-attention mechanism.
I have opened one - feel free to take a look.
Thanks for your feedback :) I will also try to experiment with this a bit more and let you know if I get some more results.
@jsnfly, thank you for this PR... Is it possible to do this fix for a T5 model as well? It is also a sequence-to-sequence model, and sometimes we may want to pass a tuple to the decoder. If you guys don't see any issue with that, I can do it. For context, I am playing with Fusion-in-Decoder, which is a variant of the T5 model. The encoder produces a tuple, the hidden states of all encoder blocks concatenated as one vector, but the code fails because it is not expecting a tuple. I am going to apply this fix to the T5 model locally and see how it behaves. @patrickvonplaten, let me know what you think.
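To make the setup described above concrete, a minimal sketch of the Fusion-in-Decoder-style concatenation as I understand the comment (shapes and names are illustrative, not the actual FiD code):

```python
import torch

# Encode several passages separately, then concatenate their hidden
# states along the sequence axis so the decoder cross-attends over all
# of them at once. Shapes are illustrative.
n_passages, seq_len, hidden = 4, 64, 512
per_passage = [torch.randn(1, seq_len, hidden) for _ in range(n_passages)]

# (1, n_passages * seq_len, hidden)
encoder_hidden_states = torch.cat(per_passage, dim=1)

# Passed onward as a plain tuple, which is exactly the case this issue
# is about.
encoder_outputs = (encoder_hidden_states,)
```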
@espoirMur |
Thanks for your response, it helps a lot~
Thanks! Helps me a lot!
Environment info

- `transformers` version: 4.17.0.dev0

Who can help

@patrickvonplaten
Information

In EncoderDecoder models one can pass `encoder_outputs` as a tuple of Tensors. However, if you do that, this line will fail, since the tuple isn't modified in the `forward` method. So if it is a tuple, `encoder_outputs` could maybe be wrapped in a `ModelOutput` class or something similar, or the tuple could be handled explicitly.
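For concreteness, a hedged reproduction sketch (the checkpoint names are illustrative, and the exact failing line inside `forward()` may differ):

```python
import torch
from transformers import BertTokenizer, EncoderDecoderModel

model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

inputs = tokenizer("Some input text", return_tensors="pt")
encoder_outputs = model.encoder(**inputs).to_tuple()  # now a plain tuple

# Passing the tuple instead of a ModelOutput triggers the failure this
# issue describes (attribute access on the tuple fails inside forward()).
model(
    encoder_outputs=encoder_outputs,
    decoder_input_ids=torch.tensor([[tokenizer.cls_token_id]]),
)
```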
On a slight tangent

I made a `SpeechEncoderDecoderModel` for the robust speech challenge: https://huggingface.co/jsnfly/wav2vec2-large-xlsr-53-german-gpt2. I found that adding the position embeddings of the decoder model to the outputs of the encoder model improved performance significantly (the model basically didn't work without it). This needs small modifications to the `__init__` and `forward` methods of the `SpeechEncoderDecoderModel`.

At the moment this seems to me too much of a "hack" to add to the `SpeechEncoderDecoderModel` class generally (for example via a flag), because it may differ for different `decoder` models and probably also needs more verification. @patrickvonplaten showed some interest that this could be included in Transformers nonetheless. What do you think?
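To illustrate the idea, a minimal sketch of the embedding trick, assuming the encoder hidden size matches the decoder's embedding size; all names and shapes are hypothetical, and the actual modification may differ:

```python
import torch

# Hypothetical shapes: a (batch, frames, hidden) encoder output and a
# GPT-2-style learned position embedding table from the decoder.
encoder_hidden_states = torch.randn(1, 50, 768)
decoder_pos_emb = torch.nn.Embedding(1024, 768)

# Add the decoder's position embeddings to the encoder outputs before
# cross-attention, so each speech frame carries an explicit position.
positions = torch.arange(encoder_hidden_states.size(1))
encoder_hidden_states = encoder_hidden_states + decoder_pos_emb(positions)
```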