Got an Error while retraining model using ./train.sh #41

christoph-meyer-horsch · 2021-02-10T12:39:09Z

Hi,
I am trying to retrain the given model on a new dataset for my thesis. Preprocessing worked fine, but now I get the following error when running train.sh:

neg_target = target.new_tensor(target).masked_fill_(target_label, self.padding_idx)
RuntimeError: The expanded size of the tensor (384) must match the existing size (832) at non-singleton dimension 0. Target sizes: [384]. Tensor sizes: [832]
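The size check behind this message can be sketched in plain Python. In the PyTorch version that produced this wording, in-place masked_fill_ expands the mask to the shape of the tensor being filled. That means target_label (832 elements here) would have to expand to the shape of target (384 elements), and broadcasting forbids this because neither size is 1. The helper below is a toy reimplementation of PyTorch's expand rule. The name expand_error is hypothetical and is not a PyTorch function. It reproduces the wording of the error above, which suggests (without proving it) that target_label was built from a tensor holding a different number of tokens than target.

```python
def expand_error(src, dst):
    """Toy model of PyTorch's expand rule (not PyTorch itself).

    Returns None if a tensor of shape `src` could be expanded to shape
    `dst`, otherwise an error string modelled on PyTorch's message.
    """
    if len(src) > len(dst):
        return f"expand: too many dims ({len(src)} > {len(dst)})"
    # Shapes are aligned from the trailing dimension, as in broadcasting.
    offset = len(dst) - len(src)
    for i, s in enumerate(src):
        d = dst[offset + i]
        # A dimension may only differ from the target if it has size 1.
        if s != d and s != 1:
            return (
                f"The expanded size of the tensor ({d}) must match the "
                f"existing size ({s}) at non-singleton dimension {offset + i}. "
                f"Target sizes: {list(dst)}. Tensor sizes: {list(src)}"
            )
    return None


# Mask (target_label) has 832 elements, filled tensor (target) has 384.
print(expand_error((832,), (384,)))
# A size-1 mask broadcasts fine.
print(expand_error((1,), (384,)))  # None
```

In practice, printing target.shape and target_label.shape right before the masked_fill_ call is the quickest way to see where the two sizes diverge.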

I didn't change anything besides the pretrained model path in train.sh.

I had previously fixed this error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [128, 1, 1536]], which is output 0 of AddBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

by changing q *= self.scaling to q = q * self.scaling on line 109 of fairseq's multihead_attention.py.
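Why that one-line change helps can be shown with a toy model of autograd's version counter. PyTorch records a tensor's version when an operation saves it for the backward pass, and raises this error at backward time if the version has changed since. q *= self.scaling modifies the saved tensor in place and bumps its version, whereas q = q * self.scaling allocates a new tensor and leaves the saved one untouched. ToyTensor, SavedForBackward and run below are hypothetical illustration names, not PyTorch APIs, and which operation actually saved q inside fairseq is not established here.

```python
class ToyTensor:
    """Toy tensor with an autograd-style version counter (not PyTorch)."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def __mul__(self, other):
        # Out-of-place (q = q * s): returns a new object, self is untouched.
        return ToyTensor(self.value * other)

    def __imul__(self, other):
        # In-place (q *= s): mutates self and bumps its version.
        self.value *= other
        self.version += 1
        return self


class SavedForBackward:
    """Stands in for an op that saved a tensor for its backward pass."""

    def __init__(self, tensor):
        self.tensor = tensor
        self.saved_version = tensor.version  # recorded at save time

    def backward(self):
        # Re-check the version, as autograd does before using saved tensors.
        if self.tensor.version != self.saved_version:
            raise RuntimeError(
                "one of the variables needed for gradient computation has been "
                f"modified by an inplace operation: tensor is at version "
                f"{self.tensor.version}; expected version {self.saved_version} instead"
            )
        return "ok"


def run(in_place, scaling=0.125):
    q = ToyTensor(2.0)
    saved = SavedForBackward(q)  # some earlier op saved q
    if in_place:
        q *= scaling   # same object, version bumped -> backward fails
    else:
        q = q * scaling  # new object, saved tensor unchanged -> backward works
    return saved.backward()


print(run(in_place=False))  # ok
```

Calling run(in_place=True) raises the RuntimeError instead, mirroring the failure reported above.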

Thank you.

Hexa4C · 2021-04-08T09:48:50Z

Thank you, this issue really saved me a lot of time.

asartipi13 mentioned this issue on Dec 4, 2022 in "bash train.sh got error" #45 (open).