Add support for converting Curated Transformer configs to Hugging Face compatible configs #333
Description
This PR follows up on #332, adding support for bidirectional conversions of model configs between Curated Transformers and Hugging Face. This is facilitated by the following classes:
- `HFConfigKey` - Descriptor for a HF model config key. Defines how it's mapped to the CT config.
- `HFConfigKeyDefault` - Wrapper around a default value for a HF config key that allows for the generic handling of optional keys.
- `HFSpecificConfig` - A set of hardcoded keys that are required to be part of every HF model config. Overridden on a per-model basis and merged into the final config dictionary.
- `CommonHFKeys` and `CommonCuratedToHFConverters` - Shared config key descriptors and conversion functions.
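
How these pieces divide the work can be illustrated with a minimal, self-contained sketch. The class names mirror the ones listed above, but everything else is hypothetical and may not match the actual implementation: the fields (`name`, `ct_key`, `to_ct`, `to_hf`, `default`), the helpers `hf_to_ct`/`ct_to_hf`, the `Activation` enum and the flat CT key names are all assumptions made for illustration. A plain dict stands in for `HFSpecificConfig`.

```python
# Illustrative stand-ins only; this is NOT the curated-transformers API.
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Dict, Mapping, Optional, Sequence


def _identity(value: Any) -> Any:
    return value


@dataclass(frozen=True)
class HFConfigKeyDefault:
    """Hypothetical: wraps the value used when an optional HF key is absent."""
    value: Any


@dataclass(frozen=True)
class HFConfigKey:
    """Hypothetical: maps one HF config key to one (flat) CT config key."""
    name: str  # key as it appears in the HF config.json
    ct_key: str  # hypothetical CT config key
    to_ct: Callable[[Any], Any] = _identity  # HF value -> CT value
    to_hf: Callable[[Any], Any] = _identity  # CT value -> HF value
    default: Optional[HFConfigKeyDefault] = None  # None means the key is required


def hf_to_ct(hf_config: Mapping[str, Any], keys: Sequence[HFConfigKey]) -> Dict[str, Any]:
    """Build a CT config dict from an HF config dict using the key descriptors."""
    ct_config: Dict[str, Any] = {}
    for key in keys:
        if key.name in hf_config:
            ct_config[key.ct_key] = key.to_ct(hf_config[key.name])
        elif key.default is not None:
            # Optional keys are handled generically through their wrapped default.
            ct_config[key.ct_key] = key.to_ct(key.default.value)
        else:
            raise ValueError(f"Missing required HF config key: {key.name}")
    return ct_config


def ct_to_hf(
    ct_config: Mapping[str, Any],
    keys: Sequence[HFConfigKey],
    hf_specific: Mapping[str, Any],
) -> Dict[str, Any]:
    """Build an HF config dict from a CT config dict (the reverse direction)."""
    hf_config = {key.name: key.to_hf(ct_config[key.ct_key]) for key in keys}
    # Merge the hardcoded, per-model keys (the role HFSpecificConfig plays above).
    hf_config.update(hf_specific)
    return hf_config


class Activation(Enum):
    """Hypothetical CT-side representation of HF's `hidden_act` string."""
    GELU = "gelu"
    RELU = "relu"


# Hypothetical BERT-like key table; the CT key names are made up.
BERT_KEYS = (
    HFConfigKey("hidden_size", "hidden_width"),
    HFConfigKey("num_attention_heads", "n_attention_heads"),
    HFConfigKey("num_hidden_layers", "n_hidden_layers"),
    HFConfigKey("hidden_act", "activation", to_ct=Activation, to_hf=lambda act: act.value),
    HFConfigKey("layer_norm_eps", "layer_norm_eps", default=HFConfigKeyDefault(1e-12)),
)
BERT_HF_SPECIFIC = {"model_type": "bert", "architectures": ["BertModel"]}

hf = {"hidden_size": 768, "num_attention_heads": 12, "num_hidden_layers": 12, "hidden_act": "gelu"}
ct = hf_to_ct(hf, BERT_KEYS)
hf_roundtrip = ct_to_hf(ct, BERT_KEYS, BERT_HF_SPECIFIC)
```

Round-tripping `hf` through `hf_to_ct` and `ct_to_hf` reproduces the original keys, plus `layer_norm_eps` (filled in from its wrapped default) and the hardcoded `model_type`/`architectures` entries. Keeping both conversion directions on a single descriptor means each model only has to declare its key table once.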

The `FromHFHub` mixin has been expanded to provide methods for the conversion of the configs.

All currently supported models except Falcon support bidirectional config conversions implicitly. Converting the Falcon model config is more complicated because we support loading from two different model implementations: `RefinedWebModel` and `Falcon`. That, together with the complicated `new_decoder_architecture` situation, makes it difficult to support the full range of conversions. Since the latter implementation is now in the mainline `transformers` branch, we'll do the following:

Checklist