
Can't QLoRA be used during PPO training? #1185

Closed
hzho2000 opened this issue Oct 14, 2023 · 9 comments

Labels
solved This problem has been already solved

Comments

@hzho2000

Can't QLoRA be used during PPO training?

@hiyouga
Owner

hiyouga commented Oct 14, 2023

Yes, QLoRA can be used.

@hiyouga added the solved (This problem has been already solved) label on Oct 14, 2023
@hiyouga closed this as completed on Oct 14, 2023
@hzho2000
Author

Hi, these are the arguments I use for PPO training.
It runs fine without QLoRA, but as soon as I add the line --quantization_bit 8 it fails with:
ValueError: Quantized model cannot create new LoRA weight. Merge them first.

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
--stage ppo \
--model_name_or_path C:\model\chatglm2-6b-32k \
--do_train \
--dataset trainset \
--template chatglm2 \
--finetuning_type lora \
--lora_target query_key_value \
--resume_lora_training False \
--checkpoint_dir result/medical-lora-v1 \
--reward_model result/medical-rm-v1 \
--output_dir result/medical-ppo-v1 \
--per_device_train_batch_size 2 \
--gradient_accumulation_steps 2 \
--cutoff_len 2048 \
--max_new_tokens 2048 \
--lr_scheduler_type cosine \
--logging_steps 10 \
--save_steps 1000 \
--learning_rate 1e-5 \
--num_train_epochs 1.0 \
--fp16 \
--plot_loss

@hiyouga
Owner

hiyouga commented Oct 14, 2023

First merge the checkpoint_dir into the base model with export_model.py, then pass the merged model as the new --model_name_or_path.
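A minimal sketch of the suggested workflow; the export_model.py flags (--checkpoint_dir, --export_dir) are assumed from the project documentation of that period, and the export path result/medical-merged-v1 is a placeholder:

# 1. Merge the existing LoRA checkpoint into the base model.
python src/export_model.py \
--model_name_or_path C:\model\chatglm2-6b-32k \
--template chatglm2 \
--finetuning_type lora \
--checkpoint_dir result/medical-lora-v1 \
--export_dir result/medical-merged-v1

# 2. Point PPO training at the merged model and add --quantization_bit 8;
#    --checkpoint_dir and --resume_lora_training are dropped because the
#    adapter is already baked into the exported weights.
CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
--stage ppo \
--model_name_or_path result/medical-merged-v1 \
--quantization_bit 8 \
--do_train \
--dataset trainset \
--template chatglm2 \
--finetuning_type lora \
--lora_target query_key_value \
--reward_model result/medical-rm-v1 \
--output_dir result/medical-ppo-v1 \
--per_device_train_batch_size 2 \
--gradient_accumulation_steps 2 \
--cutoff_len 2048 \
--max_new_tokens 2048 \
--lr_scheduler_type cosine \
--logging_steps 10 \
--save_steps 1000 \
--learning_rate 1e-5 \
--num_train_epochs 1.0 \
--fp16 \
--plot_loss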

@hzho2000
Author

hzho2000 commented Oct 14, 2023

Hello. Does this error mean that the RM stage and the PPO stage must use the same quantization level? For example, is it not allowed to train the RM without QLoRA and then use QLoRA for PPO training?

A further question: can the weights produced by QLoRA SFT be merged into the original, non-quantized model? When I try, I get:

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

@hiyouga
Owner

hiyouga commented Oct 14, 2023

  1. PPO only supports the 6B model, not the 32k model.
  2. Yes, they can be merged.

@Wolverhampton0

  1. PPO only supports the 6B model, not the 32k model.
  2. Yes, they can be merged.

A question: why did PPO training with QLoRA not require merging in earlier versions? Why has this merge step been added now? Was there a problem with the previous approach?

@Wolverhampton0

Also, is there any difference between this merge step and loading the LoRA weights directly during training or inference?

@lindsey-chang

  Also, is there any difference between this merge step and loading the LoRA weights directly during training or inference?

Has this question been answered yet?

@hiyouga
Owner

hiyouga commented Oct 26, 2023

There is no difference.
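To make the comparison concrete, here is a sketch of the two loading paths for inference; the cli_demo.py flags are assumed from the project documentation of that period, and the paths are placeholders reused from the commands above:

# Option A: load the base model and attach the LoRA checkpoint at runtime.
python src/cli_demo.py \
--model_name_or_path C:\model\chatglm2-6b-32k \
--template chatglm2 \
--finetuning_type lora \
--checkpoint_dir result/medical-lora-v1

# Option B: load the model merged beforehand with export_model.py;
# the generated outputs should match Option A, only the loading step differs.
python src/cli_demo.py \
--model_name_or_path result/medical-merged-v1 \
--template chatglm2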
