The paper mentions that an ensemble model achieved state-of-the-art results, but there is no mention of how the separate models were trained #33

Open
Chhokra opened this issue Mar 26, 2020 · 2 comments

Comments

@Chhokra

Chhokra commented Mar 26, 2020

The README only mentions training a single model, if I'm not wrong. How would I go about training the 4 models mentioned in the results table of your paper?

@kevin-parnow

I was also looking into this and came to the conclusion that they likely just used 4 different random initializations, as was done in (Chollampatt and Ng, 2018), a paper they reference.

You can look at Table 1 and its footnote to see their gains from ensembling, and to confirm that they simply use different random initializations.
https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/viewFile/17308/16137

Note that I am not affiliated with either paper's authors, so this is just speculation.

@zhawe01
Member

zhawe01 commented Dec 8, 2020

Thanks @kevbp5. We did use 4 different random initializations for the models without DA.
For the models with DA, we also used pre-trained checkpoints from different pre-training stages.
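
For readers who want to reproduce the setup for the models without DA, here is a minimal sketch of the idea, not this repository's actual training code. It shows that seeding a private random number generator gives each ensemble member a distinct but reproducible initialization. The seed values, `init_weights`, and `average_distributions` are hypothetical names chosen for illustration, and averaging the per-model probabilities is an assumption: this thread does not say how the four models' outputs are combined.

```python
import random


def init_weights(seed, n_weights=4):
    """Draw initial weights from a private RNG so each seed gives a
    distinct, reproducible starting point."""
    rng = random.Random(seed)
    return [rng.uniform(-0.1, 0.1) for _ in range(n_weights)]


def average_distributions(distributions):
    """Average per-model probability distributions elementwise.
    Assumption: the thread does not state how outputs are combined."""
    n_models = len(distributions)
    return [sum(column) / n_models for column in zip(*distributions)]


# Hypothetical seeds; the values actually used are not given in the thread.
SEEDS = [1, 2, 3, 4]
initial_weights = {seed: init_weights(seed) for seed in SEEDS}

# Made-up next-token distributions standing in for four trained models.
model_probs = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.6, 0.3, 0.1],
]
ensemble_probs = average_distributions(model_probs)
```

In practice this would mean running the same training recipe four times, changing only the random seed, and keeping all four checkpoints for decoding.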
