
[DoRA] add dora #8098

Closed
wants to merge 4 commits into from

Conversation

@JunnYu (Member) commented Mar 12, 2024

PR types

New features

PR changes

APIs

Description


  • Paper: https://arxiv.org/abs/2402.09353
  • DoRA decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning, employing LoRA for the directional updates so that the number of trainable parameters stays small. This improves both the learning capacity and the training stability of LoRA while adding no extra inference overhead, and DoRA consistently outperforms LoRA when fine-tuning LLaMA, LLaVA, and VL-BART on downstream tasks such as commonsense reasoning, visual instruction tuning, and image/video-text understanding. (A code sketch of the decomposition follows this list.)
  • DoRA hyperparameter settings
    [NOTE] 💡 Fine-tuning with DoRA using an existing LoRA configuration already gives better results most of the time, but reaching optimal performance relative to LoRA still requires some hyperparameter tuning.
    We suggest starting with a slightly lower learning rate than the one used for LoRA; users may also experiment with different LoRA dropout ratios.
    Users may also start with half the rank of the LoRA configuration, which often already yields accuracy comparable to or better than that of LoRA.
  • Usage:
    python finetune_generation.py ./llama/lora_argument.json --use_dora True
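
For readers new to the method, here is a minimal sketch of the decomposition described above, using standalone Paddle tensors with made-up shapes rather than the PR's actual LoRA layer (the real changes are in `paddlenlp/peft/lora/lora_layers.py` and `lora_model.py`):

```python
import paddle

# Made-up shapes; in the PR these tensors live inside the LoRA linear layer.
in_features, out_features, rank, scaling = 16, 32, 4, 2.0

weight = paddle.randn([in_features, out_features])   # frozen pre-trained weight W0
lora_A = paddle.randn([in_features, rank]) * 0.01    # trainable low-rank factors
lora_B = paddle.zeros([rank, out_features])
# magnitude vector m, one entry per output column, initialized from ||W0||
lora_magnitude = weight.norm(p=2, axis=0, keepdim=True)

x = paddle.randn([8, in_features])                   # a batch of inputs

# direction = the LoRA-updated weight, renormalized column-wise;
# the norm is treated as a constant (detached), as in the snippet discussed below
dora_weight = weight + lora_A @ lora_B * scaling
weight_norm = dora_weight.norm(p=2, axis=0, keepdim=True).detach()
mag_norm_scale = lora_magnitude / weight_norm

# W' = m * (W0 + A B * scaling) / ||W0 + A B * scaling||, applied as x @ W'
output = mag_norm_scale * (x @ dora_weight)
print(output.shape)  # [8, 32]
```

Because `lora_B` starts at zero, `dora_weight` equals `weight` and `mag_norm_scale` is all ones at step 0, so a DoRA layer initially behaves exactly like the frozen base layer.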


paddle-bot bot commented Mar 12, 2024

Thanks for your contribution!

codecov bot commented Mar 12, 2024

Codecov Report

Attention: Patch coverage is 40.00000% with 24 lines in your changes missing coverage. Please review.

Project coverage is 55.40%. Comparing base (a0457d1) to head (d6c54b0).
Report is 371 commits behind head on develop.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| paddlenlp/peft/lora/lora_layers.py | 34.28% | 23 Missing ⚠️ |
| paddlenlp/peft/lora/lora_model.py | 75.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8098      +/-   ##
===========================================
- Coverage    55.41%   55.40%   -0.01%     
===========================================
  Files          597      597              
  Lines        91594    91630      +36     
===========================================
+ Hits         50754    50766      +12     
- Misses       40840    40864      +24     


@gongel requested a review from @lugimzzz March 13, 2024 05:27
# DoRA: split the LoRA-updated weight into column-wise magnitude and direction
dora_weight = self.weight + self.lora_A @ self.lora_B * self.scaling
weight_norm = dora_weight.norm(p=2, axis=0, keepdim=True).detach()
mag_norm_scale = (self.lora_magnitude / weight_norm).unsqueeze(0)
# dropout is applied only to the low-rank branch (see the thread below)
result_dora = (mag_norm_scale - 1) * (input @ self.weight) + mag_norm_scale * (
    self.lora_dropout(input) @ self.lora_A @ self.lora_B * self.scaling
)

Member commented:

Can this be reused here: `mag_norm_scale * input @ dora_weight - input @ self.weight` (taking `input @ self.weight` from the existing `result`)?

JunnYu (Member, Author) replied:

There is dropout, so the results differ; it can only be reused when dropout = 0.
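
A toy check of the point made in this exchange, using standalone tensors and a hypothetical `dora_delta` helper rather than the PR's layer: the reviewer's reuse of `input @ self.weight` matches the formulation in the diff only when the dropout on the low-rank branch is a no-op.

```python
import paddle
import paddle.nn.functional as F

paddle.seed(0)
in_f, out_f, r, scaling = 8, 6, 2, 1.0
weight = paddle.randn([in_f, out_f])
lora_A = paddle.randn([in_f, r])
lora_B = paddle.randn([r, out_f])
magnitude = weight.norm(p=2, axis=0, keepdim=True)
x = paddle.randn([4, in_f])

dora_weight = weight + lora_A @ lora_B * scaling
weight_norm = dora_weight.norm(p=2, axis=0, keepdim=True).detach()
mag_norm_scale = magnitude / weight_norm

def dora_delta(p):
    # formulation from the diff above: dropout hits only the low-rank branch
    dropped = F.dropout(x, p=p, training=True)
    return (mag_norm_scale - 1) * (x @ weight) + mag_norm_scale * (
        dropped @ lora_A @ lora_B * scaling
    )

# reviewer's proposal: reuse input @ self.weight from the base result
reused = mag_norm_scale * (x @ dora_weight) - x @ weight

print((dora_delta(0.0) - reused).abs().max().item())  # ~0: identical when dropout = 0
print((dora_delta(0.5) - reused).abs().max().item())  # > 0: differs once dropout is active
```

With dropout disabled the two expressions are algebraically identical, which is exactly the condition the author states for reuse.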


This Pull Request is stale because it has been open for 60 days with no activity.
