
Retrain transformer-based models in allennlp_models.pretrained #4457

Closed
5 tasks done
epwalsh opened this issue Jul 9, 2020 · 7 comments

epwalsh (Member) commented on Jul 9, 2020

Updates from the new transformers/tokenizers release have broken some of these models.

epwalsh added the Models label (Issues related to the allennlp-models repo) on Jul 9, 2020
epwalsh added this to the 1.1 milestone on Jul 9, 2020
dirkgr (Member) commented on Jul 14, 2020

I started doing BERT SRL too because I already had the environment for that spun up.

dirkgr (Member) commented on Jul 17, 2020

@AkshitaB, when you touch the RoBERTa SST model, try it without the cls_is_last_token setting. It's more correct to not have it.
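
For reference, a minimal sketch of what leaving the flag out looks like in code (this assumes AllenNLP's `cls_pooler` seq2vec encoder, and the embedding dimension is illustrative, not taken from the real SST config):

```python
# Hedged sketch: the pooler for a RoBERTa classifier without cls_is_last_token.
# ClsPooler is AllenNLP's "cls_pooler" seq2vec encoder; 768 is an illustrative
# hidden size, not the value from the actual SST config.
from allennlp.modules.seq2vec_encoders import ClsPooler

# RoBERTa, like BERT, puts its <s> (CLS-style) token at the start of the
# sequence, so the default cls_is_last_token=False pools the correct position.
pooler = ClsPooler(embedding_dim=768)
```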

matt-gardner (Contributor) commented

I see BERT SRL checked here, but there's an issue saying that performance actually doesn't match, with a solution that seems to fix it: #4392 (comment). Have you checked against the originally reported performance? If this is an issue, it seems like a simple config file fix would get performance back up.

I was just looking at the SRL and BERT SRL models, though, and I think we can probably just combine them at this point. I don't think we gain much from the srl_bert.py code, and it's hard-coded against an earlier version of transformers in a way that makes it hard to update for roberta, which would surely be better. But that should be a separate issue.

dirkgr (Member) commented on Aug 3, 2020

The srl_bert.py code is simpler though. It's a much more transformer-native implementation. If we can get that one to perform as well as srl.py, I might argue for keeping srl_bert.py and removing the other one.
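
To make "transformer-native" concrete, here is a rough sketch of that style (illustrative names only, not the actual srl_bert.py code): the model constructs and calls a transformers `BertModel` directly, which keeps the code short but ties it to BERT.

```python
# Hedged sketch of the "transformer-native" style: simple, but BERT-specific.
# This is illustrative, not the actual srl_bert.py implementation.
from typing import Dict

import torch
from transformers import BertModel


class BertOnlyTagger(torch.nn.Module):
    def __init__(self, model_name: str, num_labels: int) -> None:
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        self.tag_projection = torch.nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(
        self, input_ids: torch.Tensor, attention_mask: torch.Tensor
    ) -> Dict[str, torch.Tensor]:
        # The first element of BertModel's output is the per-token hidden states.
        sequence_output = self.bert(input_ids=input_ids, attention_mask=attention_mask)[0]
        return {"tag_logits": self.tag_projection(sequence_output)}
```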

matt-gardner (Contributor) commented

"transformer-native" means "I have to use transformers, and I can't even try using anything else." It kind of goes against the whole point of the abstractions that we have.

dirkgr (Member) commented on Aug 3, 2020

While that's true, it's hard to argue in print for a solution that's more complicated and different from the standard if it doesn't also improve results.

In principle the srl_bert.py approach should work with any token embedder, but it might not do as well as a custom architecture.
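
For contrast, a hedged sketch of the embedder-agnostic style (illustrative names, not the real srl.py model): the model depends only on the `TextFieldEmbedder` abstraction, so BERT, RoBERTa, or a non-transformer embedder can be swapped in from the config without touching the model code.

```python
# Hedged sketch: a tagger written against AllenNLP's TextFieldEmbedder
# abstraction, so the choice of embedder lives entirely in the config.
# Illustrative only, not the actual srl.py model.
from typing import Dict

import torch
from allennlp.data import TextFieldTensors, Vocabulary
from allennlp.models import Model
from allennlp.modules import TextFieldEmbedder


class EmbedderAgnosticTagger(Model):
    def __init__(self, vocab: Vocabulary, embedder: TextFieldEmbedder) -> None:
        super().__init__(vocab)
        self.embedder = embedder
        self.tag_projection = torch.nn.Linear(
            embedder.get_output_dim(), vocab.get_vocab_size("labels")
        )

    def forward(self, tokens: TextFieldTensors) -> Dict[str, torch.Tensor]:
        # (batch, seq_len, dim) regardless of which embedder the config chose.
        embeddings = self.embedder(tokens)
        return {"tag_logits": self.tag_projection(embeddings)}
```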

dirkgr (Member) commented on Aug 24, 2020

We're tracking the remaining issue in #4521.

dirkgr closed this as completed on Aug 24, 2020