Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Got an Error while retraining model using ./train.sh #41

Open
christoph-meyer-horsch opened this issue Feb 10, 2021 · 1 comment
Open

Comments

@christoph-meyer-horsch
Copy link

Hi,
I am trying to retrain the given model with a new dataset for my thesis. Preprocessing worked fine but now I get the following error when trying to run train.sh:

neg_target = target.new_tensor(target).masked_fill_(target_label, self.padding_idx)
RuntimeError: The expanded size of the tensor (384) must match the existing size (832) at non-singleton dimension 0. Target sizes: [384]. Tensor sizes: [832]

I didn't change anything besides the pretrained model path in train.sh.

I previously fixed this error

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [128, 1, 1536]], which is output 0 of AddBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

by changing q *= self.scaling to q = q * self.scaling in line 109 of the multihead_attention.py of fairseq.

Thank you.

@Hexa4C
Copy link

Hexa4C commented Apr 8, 2021

Hi,
I am trying to retrain the given model with a new dataset for my thesis. Preprocessing worked fine but now I get the following error when trying to run train.sh:

neg_target = target.new_tensor(target).masked_fill_(target_label, self.padding_idx)
RuntimeError: The expanded size of the tensor (384) must match the existing size (832) at non-singleton dimension 0. Target sizes: [384]. Tensor sizes: [832]

I didn't change anything besides the pretrained model path in train.sh.

I previously fixed this error

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [128, 1, 1536]], which is output 0 of AddBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

by changing q *= self.scaling to q = q * self.scaling in line 109 of the multihead_attention.py of fairseq.

Thank you.

Thank you. This issue really saved me lots of time .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants