BackPACK with simple attention and additional layers #326
I want to use BackPACK to compute per-sample gradients and was trying to understand the challenges of using a custom model built from PyTorch nn layers. For example, something like this architecture: https://github.com/codingchild2424/MonaCoBERT/blob/master/src/models/monacobert.py
Some of the basic layers used for computing attention:
```python
self.query = nn.Linear(hidden_size, self.all_head_size, bias=False)  # 512 -> 256
self.key = nn.Linear(hidden_size, self.all_head_size, bias=False)    # 512 -> 256
self.value = nn.Linear(hidden_size, self.all_head_size, bias=False)
```
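For context, this is the kind of usage I have in mind for the linear projections above (a minimal sketch with toy shapes standing in for the real model, using BackPACK's `extend` and `BatchGrad`):

```python
import torch
from torch import nn
from backpack import backpack, extend
from backpack.extensions import BatchGrad

# Toy stand-in for one of the projection layers above.
hidden_size, all_head_size, batch_size = 512, 256, 8
model = extend(nn.Linear(hidden_size, all_head_size, bias=False))
loss_fn = extend(nn.MSELoss())

x = torch.randn(batch_size, hidden_size)
y = torch.randn(batch_size, all_head_size)

loss = loss_fn(model(x), y)
with backpack(BatchGrad()):
    loss.backward()

# Supported parameters get a .grad_batch of shape [batch, *param.shape].
print(model.weight.grad_batch.shape)  # torch.Size([8, 256, 512])
```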
The model also has a trainable nn.Parameter:
```python
self.gammas = nn.Parameter(torch.zeros(self.num_attention_heads, 1, 1))
```
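From what I understand, BackPACK attaches `grad_batch` only to parameters of supported nn modules, so a free-standing `nn.Parameter` like `gammas` probably would not get per-sample gradients automatically. Would a manual fallback like the following be a reasonable way to handle it (just a sketch; `model`, `loss_fn`, `inputs`, `targets` are placeholders for the real objects)?

```python
import torch

# Slow reference loop: per-sample gradient of a single custom parameter.
def per_sample_grad_of(param, model, loss_fn, inputs, targets):
    grads = []
    for i in range(inputs.shape[0]):
        loss_i = loss_fn(model(inputs[i : i + 1]), targets[i : i + 1])
        (g,) = torch.autograd.grad(loss_i, param)
        grads.append(g)
    return torch.stack(grads)  # shape: [batch, *param.shape]

# e.g. per_sample_grad_of(model.gammas, model, loss_fn, inputs, targets)
```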
And some convolutional layers.
What challenges might I face when using a model like this with BackPACK, and what are potential solutions? Is LayerNorm supported yet?
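If it helps narrow things down, my plan is to run one backward pass under `BatchGrad` and check which parameters actually received a `grad_batch` attribute (sketch, reusing `model` from the snippet above):

```python
# Diagnostic: list which parameters got per-sample gradients and which did not.
for name, p in model.named_parameters():
    print(f"{name}: grad_batch={'yes' if hasattr(p, 'grad_batch') else 'no'}")
```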