hmTEAMS


Historical Multilingual and Monolingual TEAMS Models. The following languages are covered:

  • English (British Library Corpus - Books)
  • German (Europeana Newspaper)
  • French (Europeana Newspaper)
  • Finnish (Europeana Newspaper, Digilib)
  • Swedish (Europeana Newspaper, Digilib)
  • Dutch (Delpher Corpus)
  • Norwegian (NCC Corpus)

Architecture

We pretrain a "Training ELECTRA Augmented with Multi-word Selection" (TEAMS) model:

(Figure: hmTEAMS architecture overview.)
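For orientation, the TEAMS approach couples an ELECTRA-style generator/discriminator pair with an additional multi-word selection head. Schematically (the notation and loss weights below are our own sketch, not taken from this repository), the pretraining objective combines three losses:

\[
\min_{\theta_G,\,\theta_D}\ \sum_{x\in\mathcal{D}} \Big[ \mathcal{L}_{\mathrm{MLM}}(x;\theta_G) \;+\; \lambda_{\mathrm{RTD}}\,\mathcal{L}_{\mathrm{RTD}}(x;\theta_D) \;+\; \lambda_{\mathrm{MWS}}\,\mathcal{L}_{\mathrm{MWS}}(x;\theta_D) \Big]
\]

where \(\mathcal{L}_{\mathrm{MLM}}\) is the generator's masked language modelling loss, \(\mathcal{L}_{\mathrm{RTD}}\) is the discriminator's replaced token detection loss, \(\mathcal{L}_{\mathrm{MWS}}\) is the multi-word selection loss, and the \(\lambda\) are loss-weighting hyper-parameters.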

Pretraining

We pretrain the hmTEAMS model on a v3-32 TPU Pod. All details can be found here.

Results

We perform experiments on various historic NER datasets, such as HIPE-2022 or ICDAR Europeana. All results, including hyper-parameters, can be found here.

Release

Our pretrained hmTEAMS model can be obtained from the Hugging Face Model Hub.
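As a minimal sketch, the pretrained encoder can presumably be loaded with the Hugging Face transformers library once you know the checkpoint name; the model ID below is a placeholder, not the actual published name — look it up in the hmTEAMS organization on the Model Hub.

```python
# Sketch: loading a pretrained hmTEAMS encoder from the Hugging Face Hub.
# "hmteams/<checkpoint>" is a placeholder model ID — take the real one from
# the hmTEAMS organization on the Model Hub.

def hub_url(model_id: str) -> str:
    """Model page URL on the Hugging Face Hub for a given model ID."""
    return f"https://huggingface.co/{model_id}"

if __name__ == "__main__":
    from transformers import AutoModel, AutoTokenizer  # pip install transformers

    model_id = "hmteams/<checkpoint>"  # placeholder
    print(f"Model card: {hub_url(model_id)}")

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)

    inputs = tokenizer("Ein historischer Zeitungstext.", return_tensors="pt")
    outputs = model(**inputs)
    print(outputs.last_hidden_state.shape)  # (batch, seq_len, hidden_size)
```

The heavy work (downloading weights) is kept under the `__main__` guard so the module can be imported without network access.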

Fine-tuned Models

We release the following models, fine-tuned on various historic NER datasets (HIPE-2020, HIPE-2022, ICDAR):

  • English: AjMC (HIPE-2022), TopRes19th (HIPE-2022)
  • German: AjMC (HIPE-2022), NewsEye, HIPE-2020
  • French: AjMC (HIPE-2022), ICDAR-Europeana, LeTemps (HIPE-2022), NewsEye, HIPE-2020
  • Finnish: NewsEye (HIPE-2022)
  • Swedish: NewsEye (HIPE-2022)
  • Dutch: ICDAR-Europeana
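A hedged sketch of running historic NER with one of the fine-tuned models above, assuming the checkpoint is published as a standard transformers token-classification model (the model ID is a placeholder; checkpoints exported for the Flair library would need Flair's `SequenceTagger` instead):

```python
# Sketch: historic NER via the transformers token-classification pipeline.
# "hmteams/<ner-checkpoint>" is a placeholder model ID; whether a given
# checkpoint loads this way depends on how it was exported.

def format_entities(entities) -> list[str]:
    """Render aggregated pipeline output as 'TEXT (LABEL)' strings."""
    return [f"{e['word']} ({e['entity_group']})" for e in entities]

if __name__ == "__main__":
    from transformers import pipeline  # pip install transformers

    ner = pipeline(
        "token-classification",
        model="hmteams/<ner-checkpoint>",  # placeholder model ID
        aggregation_strategy="simple",     # merge sub-tokens into full spans
    )
    print(format_entities(ner("Paris, le 3 mai 1898.")))
```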

Changelog

  • 25.09.2024: All hmTEAMS models are now released under the permissive Apache 2.0 license.
  • 08.09.2023: Evaluation on the German and French HIPE-2020 datasets added here.
  • 01.09.2023: Evaluation on the German and French NewsEye datasets added here.
  • 28.08.2023: Evaluation on the TopRes19th dataset added here.
  • 27.08.2023: Evaluation on the LeTemps dataset added here.
  • 06.08.2023: Evaluations on various historic NER datasets are complete. Results can be found here.
  • 01.08.2023: The hmTEAMS organization can be found on the Model Hub. More information on how to access the trained hmTEAMS models is coming soon.
  • 25.05.2023: Initial version of this repo.

Acknowledgements

We thank Luisa März, Katharina Schmid and Erion Çano for their fruitful discussions about Historical Language Models.

Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC). Many thanks for providing access to the TPUs ❤️
