[Community Event] Doc Tests Sprint #16292

Open
patrickvonplaten opened this issue Mar 21, 2022 · 104 comments

@patrickvonplaten
Contributor

patrickvonplaten commented Mar 21, 2022

This issue is part of our Doc Test Sprint. If you're interested in helping out come join us on Discord and talk with other contributors!

Docstring examples are often the first point of contact when trying out a new library! So far we haven't done a very good job of ensuring that all docstring examples work correctly in 🤗 Transformers, but we're now committed to making sure that all documentation examples work correctly by testing each one via Python's doctest (https://docs.python.org/3/library/doctest.html) on a daily basis.

In short, we should do the following for all models, for both PyTorch and TensorFlow:

    • Check that the current doc examples run without failure
    • Add an expected output to the doc example and test it via Python's doctest (see Guide to contributing below)
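
To make the second point concrete, here is a minimal sketch of a docstring example with an expected output, checked with Python's doctest. The `count_tokens` function is a hypothetical stand-in (it is not part of Transformers), and the way the test is collected and run here is only an illustration, not necessarily how the sprint's daily test job does it.

```python
import doctest


def count_tokens(text: str) -> int:
    """Count whitespace-separated tokens.

    Hypothetical helper used only for illustration; not part of Transformers.

    Example:

    >>> count_tokens("Hello, my dog is cute")
    5
    """
    return len(text.split())


# Collect the examples from the docstring and run them. Each `>>>` line is
# executed and its result is compared with the expected output written below it.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(count_tokens, "count_tokens", globs={"count_tokens": count_tokens}):
    runner.run(test)

# `results.failed` is the number of examples whose output did not match.
results = runner.summarize(verbose=False)
print(results)
```

If the expected output in the docstring were wrong (say `4` instead of `5`), `results.failed` would be non-zero, which is the kind of mismatch the doc tests are meant to catch.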

Adding a documentation test for a model is a great way to better understand how the model works, a simple (possibly first) contribution to Transformers and most importantly a very important contribution to the Transformers community 🔥

If you're interested in adding a documentation test, please read through the Guide to contributing below.

This issue is a call for contributors to make sure the docstring examples of existing model architectures work correctly. If you wish to contribute, reply in this thread with the architectures you'd like to take :)

Guide to contributing:

  1. Ensure you've read our contributing guidelines 📜

  2. Claim your architecture(s) in this thread (confirm no one is working on it) 🎯

  3. Implement the changes as in "add doctests for bart like seq2seq models" (#15987) (see the diff on the model architectures for a few examples) 💪

    In addition, there are a few things we can also improve, for example :

    • Fix some style issues: for example, change ``decoder_input_ids`` to `decoder_input_ids`.
    • Using a small model checkpoint instead of a large one: for example, change "facebook/bart-large" to "facebook/bart-base" (and adjust the expected outputs if any)
  4. Open the PR and tag me @patrickvonplaten @ydshieh or @patil-suraj (don't forget to run make fixup before your final commit) 🎊

    • Note that some code is copied across our codebase. If you see a line like # Copied from transformers.models.bert..., this means that the code is copied from that source, and our scripts will automatically keep that in sync. If you see that, you should not edit the copied method! Instead, edit the original method it's copied from, and run make fixup to synchronize that across all the copies. Be sure you installed the development dependencies with pip install -e ".[dev]", as described in the contributor guidelines above, to ensure that the code quality tools in make fixup can run.
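
As a rough illustration of the note on copied code, the sketch below scans a source string for `# Copied from` comments so you can see which definitions should be edited at their origin instead. It is an assumption-laden toy: the `find_copied_from` helper and the `sample` snippet are hypothetical, and this is not the script that `make fixup` actually runs.

```python
import re

# Matches comment lines such as "# Copied from transformers.models.bert...."
COPIED_FROM = re.compile(r"^\s*#\s*Copied from\s+(\S+)")


def find_copied_from(source: str) -> list[tuple[int, str]]:
    """Return (line_number, original_path) for every '# Copied from' marker.

    Hypothetical helper for illustration only.
    """
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = COPIED_FROM.match(line)
        if match:
            hits.append((lineno, match.group(1)))
    return hits


# Hypothetical snippet standing in for a modeling file.
sample = '''\
class MyModelSelfOutput:
    pass

# Copied from transformers.models.bert.modeling_bert.BertSelfOutput
class OtherSelfOutput:
    pass
'''

# Each hit names the original definition that should be edited instead.
markers = find_copied_from(sample)
print(markers)
```

Any definition that follows such a marker should be left alone; change the original it points to and let `make fixup` propagate the change.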

PyTorch Model Examples added to tests:

Tensorflow Model Examples added to tests:

  • ALBERT (@vumichien)
  • BART
  • BEiT
  • BERT (@vumichien)
  • Bert
  • BigBird (@vumichien)
  • BigBirdPegasus
  • Blenderbot
  • BlenderbotSmall
  • CamemBERT
  • Canine
  • CLIP (@Aanisha)
  • ConvBERT (@simonzli)
  • ConvNext
  • CTRL
  • Data2VecAudio
  • Data2VecText
  • DeBERTa
  • DeBERTa-v2
  • DeiT
  • DETR
  • DistilBERT (@jmwoloso)
  • DPR
  • ELECTRA (@bhadreshpsavani)
  • Encoder
  • FairSeq
  • FlauBERT
  • FNet
  • Funnel
  • GPT2 (@cakiki)
  • GPT-J (@cakiki)
  • Hubert
  • I-BERT
  • ImageGPT
  • LayoutLM
  • LayoutLMv2
  • LED
  • Longformer (@KMFODA)
  • LUKE
  • LXMERT
  • M2M100
  • Marian
  • MaskFormer (@reichenbch)
  • mBART
  • MegatronBert
  • MobileBERT (@vumichien)
  • MPNet
  • mT5
  • Nystromformer
  • OpenAI
  • OpenAI
  • Pegasus
  • Perceiver
  • PLBart
  • PoolFormer
  • ProphetNet
  • QDQBert
  • RAG
  • Realm
  • Reformer
  • ResNet
  • RemBERT
  • RetriBERT
  • RoBERTa (@patrickvonplaten)
  • RoFormer
  • SegFormer
  • SEW
  • SEW-D
  • SpeechEncoderDecoder
  • Speech2Text
  • Speech2Text2
  • Splinter
  • SqueezeBERT
  • Swin (@johko)
  • T5 (@MarkusSagen)
  • TAPAS
  • Transformer-XL (@simonzli)
  • TrOCR (@arnaudstiegler)
  • UniSpeech
  • UniSpeechSat
  • Van
  • ViLT
  • VisionEncoderDecoder
  • VisionTextDualEncoder
  • VisualBert
  • ViT (@johko)
  • ViTMAE
  • Wav2Vec2
  • WavLM
  • XGLM
  • XLM
  • XLM-RoBERTa (@AbinayaM02)
  • XLM-RoBERTa-XL
  • XLMProphetNet
  • XLNet
  • YOSO
@patrickvonplaten patrickvonplaten changed the title Doc tests sprint [Community Event] Doc Tests Sprint Mar 21, 2022
@patrickvonplaten patrickvonplaten pinned this issue Mar 21, 2022
@reichenbch
Contributor

@patrickvonplaten I would like to start with Maskformer for Tensorflow/Pytorch. Catch up with how the event goes.

@patrickvonplaten
Contributor Author

Awesome! Let me know if you have any questions :-)

@KMFODA
Contributor

KMFODA commented Mar 21, 2022

Hello! I'd like to take on Longformer for Tensorflow/Pytorch please.

@MarkusSagen
Contributor

@patrickvonplaten I would like to start with T5 for pytorch and tensorflow

@patrickvonplaten
Contributor Author

Sounds great!

@patrickvonplaten
Contributor Author

LayoutLM is also taken as mentioned by a contributor on Discord!

@cakiki
Contributor

cakiki commented Mar 22, 2022

@patrickvonplaten I would take GPT and GPT-J (TensorFlow editions) if those are still available.

I'm guessing GPT is GPT2?

@vumichien
Contributor

I will take Bert, Albert, and Bigbird for both Tensorflow/Pytorch

@johko
Contributor

johko commented Mar 22, 2022

I'll take Swin and ViT for Tensorflow

@jmwoloso
Contributor

I'd like DistilBERT for both TF and PT please

@ydshieh
Collaborator

ydshieh commented Mar 22, 2022

> @patrickvonplaten I would take GPT and GPT-J (TensorFlow editions) if those are still available.
>
> I'm guessing GPT is GPT2?

@cakiki You can go for GPT2 (I updated the name in the test)

@ArEnSc
Contributor

ArEnSc commented Mar 23, 2022

Can I try GPT2 and GPTJ for Pytorch? if @ydshieh you are not doing so?

@Aanisha

Aanisha commented Mar 23, 2022

I would like to try CLIP for Tensorflow and PyTorch.

@NielsRogge
Contributor

I'll take CANINE and TAPAS.

@ydshieh
Collaborator

ydshieh commented Mar 23, 2022

> Can I try GPT2 and GPTJ for Pytorch? if @ydshieh you are not doing so?

@ArEnSc
No, you can work on these 2 models :-) Thank you!

@vumichien
Contributor

@ydshieh Since MobileBertForSequenceClassification is a copy of BertForSequenceClassification, I think I will check the doc tests of MobileBert as well, to get past the error from make fixup

@abdouaziz

I'll take FlauBERT and CamemBERT.

@ydshieh
Collaborator

ydshieh commented Mar 23, 2022

@abdouaziz Awesome! Do you plan to work on both PyTorch and TensorFlow versions, or only one of them?

@Tegzes
Contributor

Tegzes commented Mar 23, 2022

I would like to work on LUKE model for both TF and PT

@NielsRogge
Contributor

@Tegzes you're lucky because there's no LUKE in TF ;) the list above actually just duplicates all models, but many models aren't available yet in TF.

@Tegzes
Contributor

Tegzes commented Mar 23, 2022

In this case, I will also take DeBERTa and DeBERTa-v2 for PyTorch

@abdouaziz

abdouaziz commented Mar 23, 2022

@ydshieh

I plan to work only with PyTorch

@patrickvonplaten
Contributor Author

> @Tegzes you're lucky because there's no LUKE in TF ;) the list above actually just duplicates all models, but many models aren't available yet in TF.

True - sorry I've been lazy at creating this list!

@arnaudstiegler
Contributor

Happy to work on TrOCR (pytorch and TF)

@patrickvonplaten
Contributor Author

I take RoBERTa in PT and TF

@AbinayaM02

I would like to pick up XLM-RoBERTa in PT and TF.

@bhadreshpsavani
Contributor

bhadreshpsavani commented Mar 23, 2022

I can work on ELECTRA for PT and TF

@stevenmanton
Contributor

I'll work on perceiver.

@RP2025

RP2025 commented Oct 4, 2022

Hello, I would love to contribute to encoder for PT and TF
Thank you @sgugger

@SauravMaheshkar
Contributor

I'd like to try Reformer for PyTorch and Tensorflow ☕

@soma2000-lang
Contributor

soma2000-lang commented Oct 10, 2022

I would like to try Data2VecText @patrickvonplaten

@ydshieh
Collaborator

ydshieh commented Oct 11, 2022

@SauravMaheshkar It looks like PyTorch Reformer is already done, see here

Or do you mean docs/source/en/model_doc/reformer.mdx?

@traveler-pinkie

@patrickvonplaten I would like to work on Marian for TensorFlow please. Thank You

@RamitPahwa
Contributor

I would like to work on OpenAI for Pytorch and Tensorflow @ydshieh

@traveler-pinkie traveler-pinkie mentioned this issue Oct 15, 2022
@soma2000-lang
Contributor

@ydshieh I am working on clip model

@Tegzes Tegzes mentioned this issue Nov 18, 2022
@vedikajain2004

@patrickvonplaten I'd like to try working on the CamemBERT model for TensorFlow

@BarnikRB

BarnikRB commented Oct 2, 2023

@patrickvonplaten I'd like to work on ImageGPT for pytorch

@ydshieh
Collaborator

ydshieh commented Oct 3, 2023

The list in this thread is one year old and outdated, as are some of the guidelines. I will have to make some updates.

@imsoumya18

@patrickvonplaten I want to work on FNet

@CodeGovindz

Could you please assign this issue to me?

@asarthaks

@ydshieh Can I work on the CLIP model for PT, if no one is working on it?

@ydshieh
Collaborator

ydshieh commented Dec 15, 2023

Hi @asarthaks Thank you for your interest in this.

The list is outdated, and CLIP likely no longer requires the changes.

If you find anything in it needs an update, go ahead :-)

@Epik-Whale463

I want to work on RAG, can you assign it to me?

@0xSaurabhx

I would like to work on the TensorFlow models for CamemBERT and Canine.

@b423016

b423016 commented Oct 2, 2024

I would love to work on ImageGPT. Please assign it to me.

@AnyigorTobias

Hello @ydshieh, I would like to work on distilbert.
Can you assign that to me?

@ydshieh
Collaborator

ydshieh commented Oct 4, 2024

Hi all. @Epik-Whale463 @0xSaurabhx @b423016 @AnyigorTobias

This sprint is 2 years old and the instructions may no longer be valid. I will try to check the status.

@saipavanmeruga

Hello @ydshieh, I am looking for a good first issue to contribute. This is a gentle ping to see if the sprint is still active.
