Can't QLoRA be used during PPO training? #1185
Yes, it can be used.
Hi, these are the arguments I am using for PPO training: CUDA_VISIBLE_DEVICES=0 python src/train_bash.py
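A hedged sketch of what such a QLoRA PPO invocation looks like, following the flag names used in this repository's README examples of that period (all paths and the dataset name are placeholders; `--quantization_bit 4` is the switch that turns a LoRA run into a QLoRA run):

```bash
# Sketch only: placeholder paths and dataset; flag set assumed from the README.
CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage ppo \
    --do_train \
    --model_name_or_path path_to_merged_sft_model \
    --dataset alpaca_gpt4_zh \
    --template default \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --reward_model path_to_rm_checkpoint \
    --quantization_bit 4 \
    --output_dir path_to_ppo_checkpoint \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --learning_rate 1e-5 \
    --num_train_epochs 1.0 \
    --fp16
```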
First merge the checkpoint_dir into the base model with export_model.py, then pass the merged model as the new --model_name_or_path.
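For illustration, the merge that export_model.py performs can also be sketched directly with PEFT's `merge_and_unload` (the paths below are placeholders, not the repo's actual defaults):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model in full precision (not quantized), so the merged
# weights can be saved as a regular fp16 checkpoint.
base = AutoModelForCausalLM.from_pretrained(
    "path_to_base_model", torch_dtype=torch.float16
)
tokenizer = AutoTokenizer.from_pretrained("path_to_base_model")

# Attach the (Q)LoRA adapter produced by SFT, then fold it into the base weights.
model = PeftModel.from_pretrained(base, "path_to_lora_checkpoint")
model = model.merge_and_unload()

# Save the merged model; pass this directory as the new --model_name_or_path.
model.save_pretrained("path_to_merged_model")
tokenizer.save_pretrained("path_to_merged_model")
```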
Hello. Does this error mean that the RM stage and the PPO stage must use the same quantization level? For example, that you can't skip QLoRA when training the RM and then use QLoRA for PPO training? A follow-up question: can the weights produced by QLoRA SFT be merged into the original, unquantized model?
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
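As background on this RuntimeError: `.view()` only works when the requested shape is compatible with the tensor's existing strides, whereas `.reshape()` silently copies when a zero-copy view is impossible. A minimal, self-contained reproduction:

```python
import torch

x = torch.randn(4, 6)
t = x.transpose(0, 1)   # transposing changes strides; t is non-contiguous

# t.view(-1)            # raises: "view size is not compatible with input
                        #  tensor's size and stride ... Use .reshape(...)"
flat = t.reshape(-1)    # reshape copies when a zero-copy view is impossible
print(flat.shape)       # torch.Size([24])
```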
May I also ask: is there any difference between this merging step and directly loading the LoRA weights at training or inference time?
Has this question been answered yet?
There is no difference.
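One way to convince yourself of this equivalence: the model with the adapter applied on the fly and the pre-merged checkpoint should produce (numerically almost) identical logits. A quick sanity check, with placeholder paths:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

tok = AutoTokenizer.from_pretrained("path_to_base_model")
inputs = tok("hello world", return_tensors="pt")

# Path 1: load the base model and apply the LoRA adapter at load time.
base = AutoModelForCausalLM.from_pretrained(
    "path_to_base_model", torch_dtype=torch.float32
)
adapted = PeftModel.from_pretrained(base, "path_to_lora_checkpoint")

# Path 2: load the checkpoint that was already merged with export_model.py.
merged = AutoModelForCausalLM.from_pretrained(
    "path_to_merged_model", torch_dtype=torch.float32
)

with torch.no_grad():
    out1 = adapted(**inputs).logits
    out2 = merged(**inputs).logits

# Up to floating-point rounding introduced by the merge, the two should agree.
print(torch.allclose(out1, out2, atol=1e-4))
```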