[huggingface] allow creating BPE huggingface tokenizers #2550

Merged

Conversation

larochef
Contributor

Description

Brief description of what this PR is about

This PR allows creating Hugging Face tokenizers that use the BPE model. It also adds an entry point that can be reused when other models need to be added.

Related to #2533

I will need some help, since I must have missed something: I still get link errors even after building the JNI library.

  • If this change is a backward incompatible change, why must this change be made?
    This is new functionality; the existing APIs will continue to work as-is.

  • Interesting edge cases to note here

@larochef larochef requested review from zachgk, frankfliu and a team as code owners April 18, 2023 14:26
@larochef larochef force-pushed the bpe-huggingface-tokenizers branch 2 times, most recently from 31de280 to c783242 Compare April 19, 2023 08:47
@frankfliu
Contributor

@larochef

Thanks so much for your contribution. I made some changes to address issues in the PR, but I don't have permission to push to your branch, so I pushed them to my fork: https:/frankfliu/djl/tree/bpe-huggingface-tokenizers

Can you grant permission to push to your PR?

Or could you please cherry-pick my changes into your branch?

@larochef
Contributor Author

I will pick up the changes tomorrow and give you access to the PR at the same time. If it's OK with you, I'll merge these two commits to keep a clean history.

@larochef
Contributor Author

@frankfliu the code has been integrated into the PR.

@codecov-commenter

codecov-commenter commented Apr 24, 2023

Codecov Report

Patch coverage: 72.32% and project coverage change: +1.64 🎉

Comparison is base (bb5073f) 72.08% compared to head (87294b7) 73.73%.

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #2550      +/-   ##
============================================
+ Coverage     72.08%   73.73%   +1.64%     
- Complexity     5126     6944    +1818     
============================================
  Files           473      684     +211     
  Lines         21970    30313    +8343     
  Branches       2351     3146     +795     
============================================
+ Hits          15838    22350    +6512     
- Misses         4925     6430    +1505     
- Partials       1207     1533     +326     
| Impacted Files | Coverage Δ |
|---|---|
| api/src/main/java/ai/djl/modality/cv/Image.java | 69.23% <ø> (-4.11%) ⬇️ |
| ...rc/main/java/ai/djl/modality/cv/MultiBoxPrior.java | 76.00% <ø> (ø) |
| .../main/java/ai/djl/modality/cv/output/Landmark.java | 100.00% <ø> (ø) |
| ...i/djl/modality/cv/translator/BigGANTranslator.java | 21.42% <0.00%> (-5.24%) ⬇️ |
| .../modality/cv/translator/ImageFeatureExtractor.java | 0.00% <0.00%> (ø) |
| .../ai/djl/modality/cv/translator/YoloTranslator.java | 27.77% <0.00%> (+18.95%) ⬆️ |
| ...ain/java/ai/djl/modality/cv/util/NDImageUtils.java | 67.10% <0.00%> (+7.89%) ⬆️ |
| api/src/main/java/ai/djl/modality/nlp/Decoder.java | 63.63% <ø> (ø) |
| api/src/main/java/ai/djl/modality/nlp/Encoder.java | 66.66% <ø> (ø) |
| .../main/java/ai/djl/modality/nlp/EncoderDecoder.java | 64.00% <ø> (ø) |
... and 228 more

... and 352 files with indirect coverage changes

@frankfliu frankfliu changed the title from "[Draft] [huggingface] allow creating BPE huggingface tokenizers" to "[huggingface] allow creating BPE huggingface tokenizers" Apr 26, 2023
@frankfliu frankfliu merged commit 86ddc66 into deepjavalibrary:master Apr 27, 2023
3 participants