Adds Long-LoRA support for GPTNeoX models.
Tested on a Colab A100 40GB x1 instance, with the scripts `fine-tune.py` and `supervised-fine-tune.py`, using a sample GPTNeoX model, `EleutherAI/pythia-1.4b-deduped`.
As there was no specific guide on how to contribute, I've tried to make as few modifications as possible to the original structure.
Added GPTNeoX support by adding a module `gptneox_attn_replace`, just like the original `llama_attn_replace`.
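For context, below is a minimal sketch of the attention-replacement pattern this mirrors. The patch target and forward signature are assumptions based on the upstream GPTNeoX module, and the forward body is a placeholder rather than this PR's implementation.

```python
# Hedged sketch of the attention-replacement pattern (mirroring llama_attn_replace):
# monkey-patch GPTNeoXAttention.forward with a custom attention forward.
# Placeholder only -- not the PR's actual implementation.
from transformers.models.gpt_neox import modeling_gpt_neox


def _patched_gptneox_attn_forward(self, hidden_states, attention_mask, position_ids,
                                  head_mask=None, layer_past=None, use_cache=False,
                                  output_attentions=False):
    # The real replacement would compute shift-short attention here,
    # optionally through flash-attention when use_flash_attn=True.
    raise NotImplementedError("placeholder for the custom attention forward")


def replace_gpt_neox_attn():
    # Swap in the custom forward before the model is built, so every
    # GPTNeoXAttention layer picks it up.
    modeling_gpt_neox.GPTNeoXAttention.forward = _patched_gptneox_attn_forward
```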
### How to apply
Application is showcased in the tested scripts `fine-tune.py` and `supervised-fine-tune.py`. Added a `model_type` argument to switch back and forth between the `llama` and `gpt-neox` configurations.
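A hedged sketch of how that switch could be wired inside the training scripts is shown below; the function names follow the modules in this PR, but their exact signatures (including the `use_flash_attn` keyword) are assumptions.

```python
# Illustrative only: routing a --model_type flag to the matching attention
# replacement. Exact function signatures are assumptions.
import argparse

from llama_attn_replace import replace_llama_attn
from gptneox_attn_replace import replace_gpt_neox_attn

parser = argparse.ArgumentParser()
parser.add_argument("--model_type", choices=["llama", "gpt-neox"], default="llama")
args, _ = parser.parse_known_args()

# Patch attention before loading the model so every layer picks up the
# Long-LoRA attention (flash-attention enabled here).
if args.model_type == "gpt-neox":
    replace_gpt_neox_attn(use_flash_attn=True)
else:
    replace_llama_attn(use_flash_attn=True)
```

With this wiring, passing `--model_type gpt-neox` selects the new path, while the default keeps the original llama behaviour.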
### Notes on flash-attention + GPTNeoX
- `modeling_gpt_neox.py` has no flash-attention path for GPTNeoX in `transformers == 4.33.3` as of writing, so `gptneox_attn_replace` implements one for the `use_flash_attn=True` case.
- Using `flash_attn_varlen_func` would cause a runtime error of "in-place operation", so I switched to `flash_attn_varlen_qkvpacked_func` from the flash-attention code, which worked fine.
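For reference, here is a minimal sketch of the packed-QKV call; the wrapper name and shapes are illustrative assumptions, not the exact code in `gptneox_attn_replace`.

```python
# Minimal sketch of the varlen packed-QKV flash-attention call for the
# use_flash_attn=True path. Wrapper and shapes are illustrative only.
import torch
from flash_attn import flash_attn_varlen_qkvpacked_func


def packed_flash_attention(qkv, cu_seqlens, max_seqlen, causal=True):
    # qkv: (total_tokens, 3, num_heads, head_dim), q/k/v packed on dim 1.
    # cu_seqlens: (batch + 1,) int32 cumulative sequence lengths.
    # Packing q, k and v into one tensor avoids the "in-place operation"
    # error hit with flash_attn_varlen_func in this setup.
    return flash_attn_varlen_qkvpacked_func(
        qkv, cu_seqlens, max_seqlen, dropout_p=0.0, causal=causal
    )


# Example: one sequence of 8 tokens, 4 heads, head_dim 16 (fp16 on GPU).
qkv = torch.randn(8, 3, 4, 16, dtype=torch.float16, device="cuda")
cu_seqlens = torch.tensor([0, 8], dtype=torch.int32, device="cuda")
out = packed_flash_attention(qkv, cu_seqlens, max_seqlen=8)  # (8, 4, 16)
```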