
Anyone try fine-tuning 13B model? #28

gururise opened this issue Mar 16, 2023 · 28 comments

@gururise
Contributor

Training the 7B model takes about 18GB of VRAM.

I tried training the 13B model and ran out of VRAM on my 24GB card. I suspect it will need at least 32GB of VRAM.

Has anyone else been successful with fine-tuning a 13B model?

@tloen
Owner

tloen commented Mar 16, 2023

#14 (comment)

@ItsLogic what's your experience been like?

@ItsLogic

ItsLogic commented Mar 16, 2023

It has been quite good, although understandably slow to train. I have a 4090, so a 24GB card is enough. The only thing you need to change to make it work is MICRO_BATCH_SIZE, which I have set to 2.
The whole time I was training I was within a few hundred MB of an OOM, so you might need to close a few background tasks when you decide to train.
Training time is ~10 hours for the full three epochs. I trained a single epoch (406 steps) in 3 hours 15 mins and got these results on 13B:

13B with lora

[screenshots: sample generations from the LoRA-tuned 13B]

13B normal

[screenshot: sample generation from the base 13B]

Just a heads up: the provided export_state_dict_checkpoint.py has its parameters set for 7B, so you will need to change them to match the 13B params before you can use it.
It also only outputs one file at the end, but the llama-to-HF conversion script works fine as long as you change the 13B shard count to 1, if you plan on using transformers.
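For anyone reproducing this on a 24GB card, here is a rough sketch of the two changes described above. The variable names are assumptions based on this repo's finetune.py and export_state_dict_checkpoint.py at the time, and the 13B architecture constants are the standard LLaMA-13B values; verify both against your checkout.

```python
# --- finetune.py: shrink the per-device batch so 13B fits in 24GB ---
# Names/values are assumptions; only MICRO_BATCH_SIZE = 2 comes from this thread.
MICRO_BATCH_SIZE = 2                # smaller per-step batch so 13B fits on a 4090
BATCH_SIZE = 128                    # effective batch size kept the same
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE  # = 64

# --- export_state_dict_checkpoint.py: replace the hard-coded 7B params ---
# LLaMA-13B architecture constants (7B uses dim=4096, n_layers=32, n_heads=32).
params = {
    "dim": 5120,
    "n_layers": 40,
    "n_heads": 40,
    "norm_eps": 1e-06,
}
```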

@baleksey

@ItsLogic Great results! Can you share how you set it up as a chat with memorized history? Do you include the whole chat history in the "### Input:" field with the next prompt, or what is the idea? It would be great if you could share your chat logic code!

Btw, how long a chat history can it handle with decent memorization and the ability to recover info from previous messages?

@ItsLogic

@ItsLogic Great results! Can you share how you set it up as a chat with memorized history? Do you include the whole chat history in the "### Input:" field with the next prompt, or what is the idea? It would be great if you could share your chat logic code!

I'm just using https://github.com/oobabooga/text-generation-webui with the --cai-chat launch arg as my GUI and it handles everything

Btw, how long a chat history can it handle with decent memorization and the ability to recover info from previous messages?

Honestly not sure. I haven't had any super long conversations, so I can't speak for its memory.

@shawnanastasio

I'm just using https://github.com/oobabooga/text-generation-webui with the --cai-chat launch arg as my GUI and it handles everything

It's my understanding that you have to format your prompt in a specific way for this model, as is done here. I don't think text-generation-webui does that (yet).
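(For reference, the Alpaca-style template being referred to looks roughly like the sketch below; the wording is paraphrased from memory of the repo's generate.py, so treat it as an assumption and check the script you are actually running.)

```python
# Approximate Alpaca-style prompt template; exact wording is an assumption.
def build_prompt(instruction: str, input_text: str = "") -> str:
    if input_text:
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )
```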

@baleksey

baleksey commented Mar 17, 2023

@tloen @ItsLogic Guys, do you remember what the minimum loss was when you stopped training?
And do you know (from papers or experience) what loss is considered good for this kind of model/training goal?

Fine-tuning my 13B at the moment and guessing when it's "enough" :)

@0xbitches

@baleksey Trained a 13B LoRA with one epoch as well; my loss was around 0.75 at the lowest. Text-gen results don't feel too different from 7B to me, though.

@devilismyfriend

Crazy to think we can even fine-tune the 13B on a 4090; the next step is to train the 33B and convert it to int4 :D

@ItsLogic

@ItsLogic Guys, do you remember what the minimum loss was when you stopped training?

I hovered around the low 0.8s and high 0.7s from about 20% of the way through the first epoch. It ended just above 0.8.
I just finished epoch 3 and got 0.78. It took 9 hours and 20 mins.

@baleksey

baleksey commented Mar 17, 2023

Thanks! Just finished my 13B as well, at 0.79.

Tried fine-tuning 33B on an A6000. It uses 48.4GB out of 49GB and says that one epoch will take about 12 hours. Maybe I'll do it later.

@mastoca

mastoca commented Mar 17, 2023

You may also need to add max_memory={0: 18} to LlamaForCausalLM.from_pretrained if you run into OOM errors when fine-tuning the 13B model.
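(A hedged sketch of that suggestion; note that the max_memory values accepted by transformers/accelerate are typically strings such as "18GiB", and the model path below is just a placeholder.)

```python
# Sketch: load LLaMA-13B in 8-bit with a per-GPU memory cap to avoid OOM.
# Model path and the exact max_memory format are assumptions; adjust to your setup.
import torch
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-13b-hf",   # placeholder model path
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
    max_memory={0: "18GiB"},            # cap GPU 0 to leave headroom
)
```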

@justinh-rahb

justinh-rahb commented Mar 17, 2023

Any hero out there got Alpaca-13B distributed yet? For those of us that lack a 24GB GPU 😓

@ItsLogic

Any hero out there got Alpaca-13B distributed yet? For those of us that lack a 24GB GPU 😓

I uploaded my epoch 1 and epoch 3 LoRAs:
https://huggingface.co/Draff/llama-alpaca-stuff/tree/main/Alpaca-Loras

@justinh-rahb

justinh-rahb commented Mar 17, 2023

@ItsLogic Forgive my ignorance, but how do I download this? from_pretrained() doesn't support subdirectories.

@ItsLogic

@ItsLogic Forgive my ignorance, but how do I download this? from_pretrained() doesn't support subdirectories.

Just download adapter_config.json and adapter_model.bin manually, put them in a folder, and then edit the path in generate.py to point to that folder.
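(Concretely, something along these lines; a sketch assuming the PEFT API that generate.py uses, with a made-up local folder name.)

```python
# Sketch: load the base model, then attach the manually downloaded LoRA adapter
# (adapter_config.json + adapter_model.bin placed in ./alpaca-13b-lora).
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM

base = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-13b-hf",   # placeholder base model path
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "./alpaca-13b-lora")  # folder with the two adapter files
```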

@gururise
Contributor Author

I uploaded my epoch 1 and epoch 3 LoRAs

Any noticeable difference between epoch 1 and epoch 3? I'm going to train a model on my cleaned-up dataset. There are a lot of issues with the current dataset.

@ItsLogic

Any noticeable difference between epoch 1 and epoch 3?

Haven't done any in-depth testing yet, but from my usage so far they feel about the same.

@0xbitches

I can report the same as @ItsLogic: e1 and e3 feel roughly the same, probably because the losses are both ~0.78.

Somewhat related: when I was trying the model out with textgen-webui, the outputs were incredibly short. I don't know if it's a problem with the webui or the model itself.

@gururise
Contributor Author

gururise commented Mar 17, 2023

Any hero out there got Alpaca-13B distributed yet? For those of us that lack a 24GB GPU 😓

Not so sure the 13B model is gonna perform much better than the 7B right now; the Stanford dataset has a ton of issues. I've been going through trying to fix them.

I've made a first best-effort pass to resolve the issues, and I'm training a new 7B model right now, but my GPU is a potato, so I won't have anything to show until tomorrow.

@rohvani

rohvani commented Mar 17, 2023

Any hero out there got Alpaca-13B distributed yet? For those of us that lack a 24GB GPU 😓

Not so sure the 13B model is gonna perform much better than the 7B right now; the Stanford dataset has a ton of issues. I've been going through trying to fix them.

I've made a first best-effort pass to resolve the issues, and I'm training a new 7B model right now, but my GPU is a potato, so I won't have anything to show until tomorrow.

@gururise Could you please elaborate on the issues you are seeing with the Stanford Alpaca dataset?

Ahh, never mind -- I just saw this: #32

@gvijqb

gvijqb commented Mar 28, 2023

Hi @gururise

but my GPU is a potato

I'm a co-founder of qblocks.cloud. We would love to offer you some GPU credits to help with your research and experimentation on alpaca / lora. Can we connect somehow?

@gururise
Contributor Author

gururise commented Mar 28, 2023

Hi @gururise

but my GPU is a potato

I'm a co-founder of qblocks.cloud. We would love to offer you some GPU credits to help with your research and experimentation on alpaca / lora. Can we connect somehow?

Would love to take you up on your offer of GPU credits to generate some fine-tuned Alpaca models using my cleaned dataset. I've sent you an email.

@kizunasunhy

kizunasunhy commented Apr 7, 2023

@tloen @ItsLogic Guys, do you remember what the minimum loss was when you stopped training? And do you know (from papers or experience) what loss is considered good for this kind of model/training goal?

Fine-tuning my 13B at the moment and guessing when it's "enough" :)

[figure: training loss curve]
This is my loss curve for the 13B model. All parameters are the same as for the 7B model. Training time was 7.5h on a 3090. Wondering if others see the same.

@mirekphd

mirekphd commented May 1, 2023

Training time was 7.5h on a 3090.

I suppose the Kaggle grandmasters' lesson of using 5-fold cross-validation even for DL models is out the window then :)

@pGit1

pGit1 commented May 24, 2023

@ItsLogic Can you show what your trainer args and hyperparams are for the 13B training run? My models seem to take WAY longer than 10 hours to train on a 3090 Ti. Like a couple of days at a time. :(

@pGit1

pGit1 commented May 24, 2023

@ItsLogic Never mind. The longer training time definitely stemmed from the cutoff length going from 256 to 512.
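(For context, the cutoff length here is the tokenizer truncation limit used during fine-tuning; a minimal sketch below, where the name CUTOFF_LEN is an assumption based on the repo's finetune.py and may differ in your checkout.)

```python
# Sketch of the truncation setting in question; CUTOFF_LEN is an assumed name.
CUTOFF_LEN = 256  # raising this to 512 doubles the token count per example,
                  # which is the main reason training time goes up

def tokenize(tokenizer, prompt: str):
    # Truncate/pad every example to CUTOFF_LEN tokens before training.
    return tokenizer(
        prompt,
        truncation=True,
        max_length=CUTOFF_LEN,
        padding="max_length",
    )
```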

@pmudgal-Intel

@ItsLogic I get the error below while fine-tuning 13B on two 24GB Quadros. Setting micro_batch_size to 2 does not help either. What are your training params? The same code works fine for distributed fine-tuning of the 7B model, so I suspect I will have to reduce something, but I'm not sure what.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB (GPU 0; 23.65 GiB total capacity; 21.88 GiB already allocated; 8.31 MiB free; 22.82 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Traceback (most recent call last):
File "finetune.py", line 317, in
fire.Fire(train)
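(If you want to try the allocator hint from that error message, it can be set via an environment variable before CUDA is initialized; a minimal sketch, with the 128 MB value being a guess rather than a recommendation.)

```python
# Try the allocator hint from the error message: limit split sizes to reduce
# fragmentation. Must be set before the first CUDA allocation.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after setting the env var so the allocator picks it up
```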

@pmudgal-Intel

Update: bitsandbytes==0.37.2 solved the problem. I had bitsandbytes==0.39.0 previously.
