
Add init fill-config-transformer CLI command #16

Merged
@svlandeg merged 44 commits into explosion:main from feature/config-fill-cli-command on Aug 30, 2023

Conversation

@shadeMe (Collaborator) commented on Aug 7, 2023

Description

This command reads the Hugging Face model name and revision from the `initialize.components.transformer.encoder_loader` config section, fetches the model's config from the HF Model Hub, and fills in the corresponding entry point parameters.
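
For illustration, here is a minimal sketch of the kind of config section the command consumes. The loader entry point name and its `name`/`revision` parameters are assumptions made for this example; adjust them to match the loader your config actually uses.

```python
# A hedged sketch, not the command's implementation: parse a partial config
# and pull out the HF model name/revision that the command would read.
from thinc.api import Config

partial = Config().from_str("""
[initialize.components.transformer.encoder_loader]
@model_loaders = "spacy-curated-transformers.HFTransformerEncoderLoader.v1"
name = "xlm-roberta-base"
revision = "main"
""")

loader = partial["initialize"]["components"]["transformer"]["encoder_loader"]
print(loader["name"], loader["revision"])
# The command then fetches the matching config from the HF Model Hub and
# fills in the model's entry point parameters (e.g. `vocab_size`).
```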

Types of change

New feature

Checklist

  • I confirm that I have the right to submit this contribution under the project's MIT license.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

This command reads the Hugging Face model name and revision from the `initialize.components.transformer.encoder_loader` config section, fetches its config and fills in the entry point parameters for the same.
@shadeMe added the enhancement (New feature or request) label on Aug 7, 2023
@danieldk (Contributor) left a comment

LGTM, feel free to merge if @adrianeboyd is also ok with it.

@adrianeboyd (Contributor) commented

I could run the CLI command for the prefilled config snippets, but there's still no way for users to get to an initial config:

`nlp.add_pipe("curated_transformer")`

or

`spacy init config -p curated_transformer`

both fail with:

```
Config validation error
curated_transformer.model -> vocab_size   field required
{'@architectures': 'spacy-curated-transformers.XlmrTransformer.v1', 'piece_encoder': {'@architectures': 'spacy-curated-transformers.XlmrSentencepieceEncoder.v1'}, 'with_spans': {'@architectures': 'spacy-curated-transformers.WithStridedSpans.v1'}}
```
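
As a stopgap sketch (assuming the default `curated_transformer` factory accepts config overrides like any other pipe; the value below is illustrative and should come from the HF config of the checkpoint you intend to use):

```python
# Workaround sketch: supply the missing `vocab_size` by hand so config
# validation passes. 250002 is the XLM-RoBERTa base vocab size; verify it
# against the Hugging Face config of the model you plan to load.
import spacy

nlp = spacy.blank("en")
nlp.add_pipe(
    "curated_transformer",
    config={"model": {"vocab_size": 250002}},
)
```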

@shadeMe (Collaborator, Author) commented on Aug 10, 2023

> I could run the CLI command for the prefilled config snippets, but there's still no way for users to get to an initial config:

Fixed in #20.

@adrianeboyd (Contributor) commented

Can you tell the user based on the downloaded model which kind of architecture they probably need? (Like, I was even confused by the naming of something that looks like bert but wasn't.)

@shadeMe (Collaborator, Author) commented on Aug 14, 2023

I've expanded the error message to display either the expected HF model type for the given curated transformers architecture (in case the type of the HF model is not supported) or the correct curated transformers architecture (if the HF model type is supported).

@svlandeg (Member) left a comment

(still reviewing - first few comments)

@svlandeg (Member) commented

> I don't like the way that fill doesn't mean quite the same thing as in fill-config, but the alternatives are probably worse.

I was actually thinking the same earlier. What about update instead of fill? 🤔

@shadeMe (Collaborator, Author) commented on Aug 28, 2023

> I was actually thinking the same earlier. What about update instead of fill? 🤔

I'd still keep fill, as the vast majority of the fields being added are actually missing from the config, i.e. there are no keys for them yet (as opposed to updating the values of existing keys).
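
To make the distinction concrete, a small illustrative sketch (the parameter names and values are assumptions, not the exact output of the command):

```python
# Before filling: the model block only names the architecture; the entry
# point parameters are simply absent.
before = {
    "@architectures": "spacy-curated-transformers.XlmrTransformer.v1",
}

# After filling: new keys are added from the fetched HF config, while
# existing keys are left untouched; that is why "fill" fits better than
# "update".
after = {
    "@architectures": "spacy-curated-transformers.XlmrTransformer.v1",
    "vocab_size": 250002,  # illustrative value
    # ...other entry point parameters filled in the same way
}
```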

@svlandeg (Member) left a comment

Looks great!

@svlandeg merged commit 159a4e9 into explosion:main on Aug 30, 2023
7 checks passed
@shadeMe deleted the feature/config-fill-cli-command branch on August 30, 2023, 17:34
adrianeboyd added a commit that referenced this pull request on Nov 28, 2023:
* Add `init fill-config-transformer` CLI command

This command reads the Hugging Face model name and revision from the `initialize.components.transformer.encoder_loader` config section, fetches its config and fills in the entry point parameters for the same.

* Feature-gate tests

* Lazy import `huggingface_hub`

* Rethrow exception when `CliRunner` fails

* Fix type

* Revert slow marker

* Print error when HF tokenizer loading fails

* Tick `transformers` version

* Install `sentencepiece` in CI

* Install `sentencepiece` as a `transformers` extra dependency

* Temporarily rethrow exception to debug CI

* Revert "Temporarily rethrow exception to debug CI"

This reverts commit 59b78d9.

* Fix website link in docstring

* Set default output path to `stdout`

* Update command arg helpstring

* Replace `IntEnum` with `Enum`

* Apply suggestions from code review

Co-authored-by: Adriane Boyd <[email protected]>

* Fix typo

* Set `model_max_length` to `sys.maxsize`

* Automatically fill in piece encoder loader

* Assert expected outputs in unit tests

* Add args to pass model name/revision via CLI

This overrides the name/revision in the config (if present).

* Apply suggestions from code review

Co-authored-by: Adriane Boyd <[email protected]>

* Use `main` as the default CLI revision arg

* Mention the model name/revision CLI args in error message

* Use `int32.max` as the sentinel value for `model_max_length`

* Add back `model_max_length` to unit tests

* Add clarification to the mismatching architectures error message

* Rename command to `fill-curated-transformer`

* Update tests

* Clarify the mismatching model type/architecture error message further

* `isort`

* Shorten docstring for display in CLI

* Sneaky readMe fix

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <[email protected]>

* Shorten docstring further

* Remove duplicate key

* Pretty-print errors when fetching model config from HF Hub

* Restructure error handling when validating model type/arch

* Add example model names for supported architectures

* Sort fetched parameter list

---------

Co-authored-by: Adriane Boyd <[email protected]>
Co-authored-by: Sofie Van Landeghem <[email protected]>
adrianeboyd added a commit that referenced this pull request on Nov 28, 2023.
Labels: enhancement (New feature or request)
4 participants