
Can't QLoRA be used during PPO training? #1185

Closed
hzho2000 opened this issue Oct 14, 2023 · 9 comments

Labels
solved This problem has been already solved

Comments

@hzho2000

Can't QLoRA be used during PPO training?

@hiyouga
Owner

hiyouga commented Oct 14, 2023

Yes, QLoRA can be used.

@hiyouga added the solved (This problem has been already solved) label on Oct 14, 2023
@hiyouga closed this as completed on Oct 14, 2023
@hzho2000
Author

Hi, these are the arguments I use for PPO training.
It runs fine without QLoRA, but as soon as I add the line --quantization_bit 8 it fails with:
ValueError: Quantized model cannot create new LoRA weight. Merge them first.

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
--stage ppo \
--model_name_or_path C:\model\chatglm2-6b-32k \
--do_train \
--dataset trainset \
--template chatglm2 \
--finetuning_type lora \
--lora_target query_key_value \
--resume_lora_training False \
--checkpoint_dir result/medical-lora-v1 \
--reward_model result/medical-rm-v1 \
--output_dir result/medical-ppo-v1 \
--per_device_train_batch_size 2 \
--gradient_accumulation_steps 2 \
--cutoff_len 2048 \
--max_new_tokens 2048 \
--lr_scheduler_type cosine \
--logging_steps 10 \
--save_steps 1000 \
--learning_rate 1e-5 \
--num_train_epochs 1.0 \
--fp16 \
--plot_loss

@hiyouga
Owner

hiyouga commented Oct 14, 2023

First merge the checkpoint_dir into the base model with export_model.py, then pass the merged model as the new --model_name_or_path.
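A minimal sketch of the suggested workflow; the export_model.py flags (--checkpoint_dir, --export_dir) are assumed from the project documentation of that period, and the export path result/medical-merged-v1 is a placeholder:

# 1. Merge the existing LoRA checkpoint into the base model.
python src/export_model.py \
--model_name_or_path C:\model\chatglm2-6b-32k \
--template chatglm2 \
--finetuning_type lora \
--checkpoint_dir result/medical-lora-v1 \
--export_dir result/medical-merged-v1

# 2. Point PPO training at the merged model and add --quantization_bit 8;
#    --checkpoint_dir and --resume_lora_training are dropped because the
#    adapter is already baked into the exported weights.
CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
--stage ppo \
--model_name_or_path result/medical-merged-v1 \
--quantization_bit 8 \
--do_train \
--dataset trainset \
--template chatglm2 \
--finetuning_type lora \
--lora_target query_key_value \
--reward_model result/medical-rm-v1 \
--output_dir result/medical-ppo-v1 \
--per_device_train_batch_size 2 \
--gradient_accumulation_steps 2 \
--cutoff_len 2048 \
--max_new_tokens 2048 \
--lr_scheduler_type cosine \
--logging_steps 10 \
--save_steps 1000 \
--learning_rate 1e-5 \
--num_train_epochs 1.0 \
--fp16 \
--plot_loss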

@hzho2000
Author

hzho2000 commented Oct 14, 2023

Hello. Does this error mean that the RM stage and the PPO stage must use the same quantization level? For example, is it not allowed to train the RM without QLoRA and then use QLoRA for PPO training?

A further question: can the weights produced by QLoRA SFT be merged into the original, non-quantized model? When I try, I get:

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

@hiyouga
Owner

hiyouga commented Oct 14, 2023

  1. PPO only supports the 6B model, not the 32k model.
  2. Yes, they can be merged.

@Wolverhampton0

  1. PPO only supports the 6B model, not the 32k model.
  2. Yes, they can be merged.

A question: why did PPO training with QLoRA not require merging in earlier versions? Why has this merge step been added now? Was there a problem with the previous approach?

@Wolverhampton0

Also, is there any difference between this merge step and loading the LoRA weights directly during training or inference?

@lindsey-chang

  Also, is there any difference between this merge step and loading the LoRA weights directly during training or inference?

Has this question been answered yet?

@hiyouga
Owner

hiyouga commented Oct 26, 2023

There is no difference.
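To make the comparison concrete, here is a sketch of the two loading paths for inference; the cli_demo.py flags are assumed from the project documentation of that period, and the paths are placeholders reused from the commands above:

# Option A: load the base model and attach the LoRA checkpoint at runtime.
python src/cli_demo.py \
--model_name_or_path C:\model\chatglm2-6b-32k \
--template chatglm2 \
--finetuning_type lora \
--checkpoint_dir result/medical-lora-v1

# Option B: load the model merged beforehand with export_model.py;
# the generated outputs should match Option A, only the loading step differs.
python src/cli_demo.py \
--model_name_or_path result/medical-merged-v1 \
--template chatglm2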
