【Could I make a Pull Request for a new feature (i.e., a Model Class)?】Unified checkpoint saving for a custom LoRA-variant model #9226

Closed
WhuanY opened this issue Oct 3, 2024 · 9 comments

@WhuanY

WhuanY commented Oct 3, 2024

Feature request

I would like to add fine-tuning and model-saving support for other LoRA-variant model classes (such as LoKr and LoHa) to the llm/finetuning.py module.

Motivation

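🎯 Goal: implement a LoRA variant, LoKr, using the Paddle framework.
⌚️ Progress: I have already written a basic LoKrModel class and the modifications to the fine-tuning script.
❓ Problem: model saving does not support custom LoRA-variant classes. The error message and core code are below: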

  File "/home/wyh/anaconda3/envs/venv0/lib/python3.12/site-packages/paddlenlp/trainer/trainer.py", line 2535, in _save
    save_unified_checkpoint(self.args, self.model, self.optimizer, output_dir, safe_serialization=True)
  File "/home/wyh/anaconda3/envs/venv0/lib/python3.12/site-packages/paddlenlp/trainer/plugins/unified_checkpoint.py", line 139, in save_unified_checkpoint
    raise ValueError("Unified checkpoint only supports PretrainedModel, LoRAModel and PrefixModelForCausalLM!")
ValueError: Unified checkpoint only supports PretrainedModel, LoRAModel and PrefixModelForCausalLM!
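
The error above comes from a model-type check in save_unified_checkpoint. The following is a minimal sketch of that kind of isinstance dispatch, using empty stub classes named after the error message. It is an assumption about how the check behaves, not PaddleNLP's actual source. It shows why a custom LoKrModel that does not inherit from any of the three supported classes is rejected, however LoRA-like its internals are.

```python
class PretrainedModel:
    """Stand-in for a full model class (hypothetical stub)."""


class LoRAModel:
    """Stand-in for the LoRA wrapper class (hypothetical stub)."""


class PrefixModelForCausalLM:
    """Stand-in for the prefix-tuning wrapper class (hypothetical stub)."""


class LoKrModel:
    """A custom adapter wrapper that inherits from none of the classes above."""


def select_save_routine(model: object) -> str:
    """Choose a save routine purely by isinstance checks (hypothetical dispatch)."""
    if isinstance(model, LoRAModel):
        return "lora"
    if isinstance(model, PrefixModelForCausalLM):
        return "prefix"
    if isinstance(model, PretrainedModel):
        return "full"
    # Any other wrapper class falls through to here, regardless of what it contains
    raise ValueError(
        "Unified checkpoint only supports PretrainedModel, LoRAModel and PrefixModelForCausalLM!"
    )


if __name__ == "__main__":
    print(select_save_routine(LoRAModel()))
    try:
        select_save_routine(LoKrModel())
    except ValueError as err:
        print("rejected:", err)
```

Under this assumption, subclassing LoRAModel would get past the type check. However, any LoRA-specific save logic behind it would still have to understand LoKr's parameters.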

In llm/run_finetune.py, I have already implemented training for the LoKr variant model:

def main():
    # ... (other setup code omitted) ...

    # Main LoKr logic; LoKrConfig and LoKrModel are my own custom classes
    if model_args.lokr:
        if model_args.lokr_path is None:
            # Select the layers to wrap with LoKr adapters (same selection as LoRA)
            target_modules = get_lora_target_modules(model)
            lokr_config = LoKrConfig(
                target_modules=target_modules,
                decompose_factor=model_args.decompose_factor,
                lora_dim=model_args.lora_dim_in_lokr,
                decompose_both=model_args.decompose_both,
                lokr_alpha=model_args.lokr_alpha,
                merge_weights=False,
                dtype=dtype,
                base_model_name_or_path=model_args.model_name_or_path,
            )
            # Wrap the base model with the custom PaddlePaddle LoKrModel
            model = LoKrModel(model, lokr_config)
        else:
            # Load previously saved LoKr weights onto the base model
            model = LoKrModel.from_pretrained(
                model=model,
                lokr_path=model_args.lokr_path,
            )

        model.print_trainable_parameters()
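
For readers unfamiliar with LoKr, the LyCORIS-style formulation writes the weight update as a scaled Kronecker product, ΔW = (alpha / rank) · (W1 ⊗ W2), where W2 is itself a low-rank product W2_a · W2_b. The following is a pure-Python sketch of that computation. It is not the author's Paddle LoKrModel. The helper names are hypothetical, and the mapping of the config fields above onto these roles (decompose_factor capping the factorization, lora_dim as the rank, lokr_alpha as alpha) is an assumption based on their names. decompose_both, which would also factor W1 into a low-rank product, is not modelled here.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product: block (i, j) of the result equals a[i][j] * b."""
    p, q = len(b), len(b[0])
    return [
        [a[i // p][j // q] * b[i % p][j % q] for j in range(len(a[0]) * q)]
        for i in range(len(a) * p)
    ]


def factorize(dim: int, factor: int = -1) -> Tuple[int, int]:
    """Split dim into (m, n) with m * n == dim and m <= n.

    factor > 0 caps m (a hypothetical reading of decompose_factor);
    factor <= 0 picks the most balanced split.
    """
    best = (1, dim)
    for m in range(1, math.isqrt(dim) + 1):
        if dim % m == 0 and (factor <= 0 or m <= factor):
            best = (m, dim // m)
    return best


def lokr_delta(w1: Matrix, w2_a: Matrix, w2_b: Matrix, alpha: float, rank: int) -> Matrix:
    """Delta weight = (alpha / rank) * kron(w1, w2_a @ w2_b)."""
    scale = alpha / rank
    return [[scale * v for v in row] for row in kron(w1, matmul(w2_a, w2_b))]


if __name__ == "__main__":
    # A 4 x 6 update: factorize(4) gives (2, 2) and factorize(6) gives (2, 3)
    w1 = [[1.0, 0.0], [0.0, 1.0]]  # 2 x 2 Kronecker factor
    w2_a = [[1.0], [2.0]]          # 2 x 1 (rank 1)
    w2_b = [[1.0, 1.0, 1.0]]       # 1 x 3
    for row in lokr_delta(w1, w2_a, w2_b, alpha=1.0, rank=1):
        print(row)
```

In this toy case the factors hold 4 + 2 + 3 = 9 trainable values, compared with 4 × 6 = 24 for a dense update of the same shape. That parameter saving is the motivation for Kronecker-based adapters.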

Your contribution

If possible, I would like to submit a PR contributing the LoKr model implementation.

@WhuanY changed the title "Unified checkpoint saving for a custom LoRA-variant model" to "【Could I make a Pull Request for new feature?】Unified checkpoint saving for a custom LoRA-variant model" Oct 8, 2024
@WhuanY changed the title "【Could I make a Pull Request for new feature?】Unified checkpoint saving for a custom LoRA-variant model" to "【Could I make a Pull Request for a new feature (i.e., a Model Class)?】Unified checkpoint saving for a custom LoRA-variant model" Oct 8, 2024
@greycooker
Contributor

greycooker commented Oct 8, 2024

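Currently, PaddleNLP implements LoRA and all of its variants (such as rsLoRA and PiSSA) through the single LoRAModel class. Saving and loading custom model classes is not supported for now.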

@WhuanY
Author

WhuanY commented Oct 8, 2024

If I want to submit a new PR that supports loading, training, and saving a new LoRA variant, may I submit a new custom model class?

@greycooker
Contributor

greycooker commented Oct 9, 2024

Yes, you can, but we still recommend using LoRAModel first.

@WhuanY
Author

WhuanY commented Oct 9, 2024

Is there a good example commit that successfully added a new LoRA variant, which I could use as a reference for my PR?

@greycooker
Contributor

greycooker commented Oct 9, 2024

If you must implement a new LoKrModel, there is no PR to reference yet. Previous LoRA variants, such as PiSSA and DoRA, were all implemented on top of LoRAModel:
#8098
#8250
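
To illustrate what implementing a variant on top of a single LoRAModel class buys, here is a hypothetical sketch. It does not reflect PaddleNLP's LoRAModel API. In the sketch, one wrapper class hosts every variant, and the variants differ only in which parameters they hold and how the delta weight is computed. Because every variant is an instance of the same class, one type check and one save path cover all of them. The names AdapterModel and DELTA_FNS and the parameter keys are invented for illustration.

```python
from typing import Callable, Dict, List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product: block (i, j) of the result equals a[i][j] * b."""
    p, q = len(b), len(b[0])
    return [
        [a[i // p][j // q] * b[i % p][j % q] for j in range(len(a[0]) * q)]
        for i in range(len(a) * p)
    ]


# Hypothetical registry: variant name -> how its delta weight is computed
DELTA_FNS: Dict[str, Callable[[Dict[str, Matrix]], Matrix]] = {
    "lora": lambda p: matmul(p["B"], p["A"]),
    "lokr": lambda p: kron(p["w1"], matmul(p["w2_a"], p["w2_b"])),
}


class AdapterModel:
    """Hypothetical single wrapper class shared by all adapter variants."""

    def __init__(self, variant: str, params: Dict[str, Matrix]) -> None:
        if variant not in DELTA_FNS:
            raise ValueError(f"unknown adapter variant: {variant}")
        self.variant = variant
        self.params = params

    def delta_weight(self) -> Matrix:
        return DELTA_FNS[self.variant](self.params)

    def adapter_state_dict(self) -> Dict[str, Matrix]:
        # One save path for every variant: only the parameter names differ
        return {f"{self.variant}.{name}": value for name, value in self.params.items()}


if __name__ == "__main__":
    lora = AdapterModel("lora", {"B": [[1.0], [2.0]], "A": [[3.0, 4.0]]})
    lokr = AdapterModel("lokr", {"w1": [[2.0]], "w2_a": [[1.0], [1.0]], "w2_b": [[1.0, 2.0]]})
    print(lora.delta_weight())  # [[3.0, 4.0], [6.0, 8.0]]
    print(lokr.delta_weight())  # [[2.0, 4.0], [2.0, 4.0]]
    print(sorted(lokr.adapter_state_dict()))
```

Adding a variant this way touches only the registry and the parameter set. Any machinery keyed on the wrapper class, such as checkpointing or distributed sharding, stays unchanged. That trade-off is roughly the reason given in this thread for preferring LoRAModel.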

@WhuanY
Author

WhuanY commented Oct 10, 2024

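Hello! I did more research on the background of my LoKrModel implementation. To be precise, LoKrModel is a LoRA variant, but it is essentially a new adapter, and I would like its implementation structure to follow VeRA and Prefix.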

Since this means implementing a new adapter, it should be reasonable to write a new Model class. I hope this can be supported 😊

@DesmonDay
Contributor

DesmonDay commented Oct 12, 2024

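Rewriting a LoKrModel class involves quite a few considerations, such as the distributed strategies that LoRAModel supports, so please reuse the existing structure wherever possible. If that really isn't feasible, you can submit a PR with the LoKrModel implementation and we will review it as a whole first. You don't need to worry about unified checkpoint saving for now. Disabling it with --unified_checkpoint 0 still saves checkpoints normally (just in a different format), and we can add support later if necessary.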

@WhuanY
Author

WhuanY commented Oct 13, 2024

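So far, in my own local testing, LLaMA training runs end to end on a single GPU, and I have also modified parts of the unified_checkpoint code. I'll tidy up the code structure first and then submit a PR for you to review as a whole~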

@WhuanY
Author

WhuanY commented Oct 16, 2024

I have now opened the first pull request.


3 participants