Add support for converting Curated Transformer configs to Hugging Face compatible configs #333

Merged

Conversation


@shadeMe shadeMe commented Sep 27, 2023

Description

This PR follows up on #332, adding support for bidirectional conversion of model configs between Curated Transformers and Hugging Face. This is facilitated by the following classes (a sketch of how they might fit together follows the list):

  • HFConfigKey - Descriptor for an HF model config key. Defines how it's mapped to the CT config.
  • HFConfigKeyDefault - Wrapper around a default value for an HF config key that allows for the generic handling of optional keys.
  • HFSpecificConfig - A set of hardcoded keys that are required to be part of every HF model config. Overridden on a per-model basis and merged into the final config dictionary.
  • CommonHFKeys and CommonCuratedToHFConverters - Shared config key descriptors and conversion functions.
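As a rough illustration of the descriptor pattern, here is a minimal sketch of how HFConfigKey and HFConfigKeyDefault might look. The field names, the read method, and the HIDDEN_WIDTH example key are assumptions for illustration, not the PR's actual implementation:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class HFConfigKeyDefault:
    # Wrapping the default lets the conversion code distinguish
    # "key has no default" from "key defaults to None".
    value: Any


@dataclass
class HFConfigKey:
    # Name of the key in the Hugging Face config.
    hf_name: str
    # Name of the corresponding Curated Transformers config attribute.
    ct_name: str
    # Optional default, used when the HF config omits the key.
    default: Optional[HFConfigKeyDefault] = None
    # Conversion applied when exporting a CT value back to HF
    # (identity by default).
    ct_to_hf: Callable[[Any], Any] = lambda v: v

    def read(self, hf_config: Dict[str, Any]) -> Any:
        # Required keys raise; optional keys fall back to their default.
        if self.hf_name in hf_config:
            return hf_config[self.hf_name]
        if self.default is not None:
            return self.default.value
        raise ValueError(f"missing required HF config key: {self.hf_name!r}")


# Example of a shared key descriptor, as CommonHFKeys might hold
# (hypothetical mapping):
HIDDEN_WIDTH = HFConfigKey("hidden_size", "hidden_width")
```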

The FromHFHub mixin has been extended to provide methods for converting the configs.
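Building on the sketch above, the mixin-side conversion could conceptually look like the following. The method and attribute names (config_from_hf, config_to_hf, hf_specific_config) are hypothetical; the PR's actual API may differ:

```python
from typing import Any, Dict, List


class FromHFHubSketch:
    # Per-model key descriptors (HFConfigKey instances, see sketch above).
    hf_config_keys: List[HFConfigKey] = []
    # Hardcoded keys (HFSpecificConfig) merged into every exported config,
    # e.g. "model_type" and "architectures".
    hf_specific_config: Dict[str, Any] = {}

    @classmethod
    def config_from_hf(cls, hf_config: Dict[str, Any]) -> Dict[str, Any]:
        # HF -> CT: read each key, applying defaults for optional ones.
        return {k.ct_name: k.read(hf_config) for k in cls.hf_config_keys}

    @classmethod
    def config_to_hf(cls, ct_config: Dict[str, Any]) -> Dict[str, Any]:
        # CT -> HF: map names back, then merge in the per-model
        # hardcoded keys as described above.
        out = {k.hf_name: k.ct_to_hf(ct_config[k.ct_name])
               for k in cls.hf_config_keys}
        out.update(cls.hf_specific_config)
        return out
```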

All currently supported models except Falcon support bidirectional config conversions implicitly. Converting the Falcon model config is more complicated, as we support loading from two different model implementations: RefinedWebModel and Falcon. That, combined with the complicated new_decoder_architecture situation, makes it difficult to support the full range of conversions. Since the latter implementation is now in the mainline transformers branch, we'll do the following (illustrated in the sketch after the list):

  • Conversion from both the RefinedWebModel (RWM) and Falcon HF architectures is fully supported.
  • The CT Falcon config will only be converted to the mainline Falcon HF config/architecture.
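Conceptually, the asymmetry is a many-to-one mapping on import and a fixed target on export. The sketch below is illustrative only; the architecture strings match those used by HF Falcon checkpoints, but the names of the constants are assumptions, not the PR's actual code:

```python
# HF -> CT: both the legacy RefinedWebModel ("RW") classes and the
# mainline Falcon classes load into the same CT Falcon config.
HF_FALCON_ARCHITECTURES = {"RWForCausalLM", "FalconForCausalLM"}

# CT -> HF: exports always target the mainline architecture.
CT_FALCON_EXPORT_ARCHITECTURE = "FalconForCausalLM"
```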

Checklist

  • I confirm that I have the right to submit this contribution under the project's MIT license.

@shadeMe shadeMe added the type/feature (Type: Feature) and feat/serde (Feature: Serialization/Deserialization) labels Sep 27, 2023
@shadeMe shadeMe force-pushed the feature/reversible-hf-config-conversion branch from 386321f to a9cc82b on September 29, 2023 11:40
@shadeMe shadeMe added this to the v2.0.0 milestone Oct 19, 2023

@rmitsch rmitsch left a comment


I'm taking a superficial pass through the code; I don't want to keep you too long (and I won't be able to appreciate the details at this point anyway, due to my lack of experience with curated-transformers).

I'll finish the review tomorrow.


@rmitsch rmitsch left a comment


As mentioned, I can't really give valuable feedback here, but this seems alright to me, and it seems to be mostly a refactoring PR anyway. LGTM.

Review thread on curated_transformers/models/mpt/causal_lm.py resolved.
@shadeMe shadeMe merged commit fa492b2 into explosion:main Nov 7, 2023
9 checks passed
@shadeMe shadeMe deleted the feature/reversible-hf-config-conversion branch November 7, 2023 17:06