
Mistral configs #591

Merged
merged 6 commits into main from mistral_configs on Mar 27, 2024
Conversation

@kartikayk (Contributor) commented Mar 26, 2024

Context

#571 added support for Mistral 7B. In this PR, I add configs for Mistral 7B full finetuning and LoRA. These can be further improved, but have decent results OOTB.

Full finetune:

 tune --nnodes 1 --nproc_per_node 4 full_finetune_distributed \
--config mistral/7B_full \
metric_logger=torchtune.utils.metric_logging.WandBLogger \
metric_logger.project=test

Loss Curve:

[screenshot: loss curve for full finetune]

Eval on truthfulqa_mc2

[screenshot: truthfulqa_mc2 eval results for full finetune]

LoRA:

tune --nnodes 1 --nproc_per_node 4 lora_finetune_distributed \
--config mistral/7B_lora  \
metric_logger=torchtune.utils.metric_logging.WandBLogger \
metric_logger.project=test

Loss Curve:

[screenshot: loss curve for LoRA finetune]

Eval on truthfulqa_mc2

[screenshot: truthfulqa_mc2 eval results for LoRA finetune]

FAQ

How did you come up with these configs?

Getting training to stabilize took some work. This model seems to need a smaller LR than the Llama2 7B/13B models, and for LoRA I needed to ramp up the rank and alpha a bit. I don't claim this is novel work; I just did a bunch of snooping around on localllama [example] and some other forums and blogs to come up with configs with reasonable results. A rough sketch of the resulting shape is below.
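
For concreteness, the adjustments described above look roughly like the following sketch. The numbers are illustrative rather than the exact values shipped in the configs, and the lora_rank keyword is assumed to be exposed the same way as in the llama2 LoRA builders.

import torch
from torchtune.models.mistral import lora_mistral_7b  # builder added in this PR

# Rank and alpha ramped up a bit relative to the llama2 7B recipes (values illustrative).
model = lora_mistral_7b(
    lora_attn_modules=["q_proj", "k_proj", "v_proj"],
    lora_rank=64,
    lora_alpha=16,
)

# Mistral 7B seemed to want a smaller LR than the llama2 7B/13B recipes (value illustrative).
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)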

Changelog

  • Add component and model builders for LoRA with Mistral 7B
  • Add configs for LoRA and full finetuning for Mistral 7B

Test plan

  • Tests: pytest tests
  • E2E runs: see above

pytorch-bot bot commented Mar 26, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/591

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit ef54a33 with merge base d2e36ed:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label (authors need to sign the CLA before a PR can be reviewed) on Mar 26, 2024
lora_attn_modules: List[LORA_ATTN_MODULES],
apply_lora_to_mlp: bool = False,
apply_lora_to_output: bool = False,
*,
Contributor:

wooooo kwargs

Contributor (Author):

Can't take credit for this. This comes from @ebsmothers
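
(For readers unfamiliar with the pattern: the bare * in the signature above makes every following argument keyword-only. A tiny self-contained illustration with a made-up builder name:)

def toy_builder(lora_attn_modules, apply_lora_to_mlp=False, *, lora_rank=8, lora_alpha=16.0):
    return lora_attn_modules, apply_lora_to_mlp, lora_rank, lora_alpha

toy_builder(["q_proj"], lora_rank=32)       # OK: keyword-only args passed by name
# toy_builder(["q_proj"], False, 32, 16.0)  # TypeError: lora_rank cannot be passed positionally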

@@ -0,0 +1,69 @@
# This config is currently a WIP. Use it with caution
Contributor:

Are these configs based on the paper and/or other results showing good performance?

Contributor (Author):

Mostly digging around on various forums like Reddit.

Contributor:

Can we update the comment to say something like "this config is based on a small set of experiments and is not intended to reproduce results from the Mistral paper or elsewhere"

- output_dir: /tmp/Mistral-7B-v0.1
- model_type: LLAMA2
+ output_dir: /tmp/Mistral-7B-v0.1/
+ model_type: MISTRAL
Contributor:

This technically varies from the HF way of defining model type b/c mistral is llama-based. What does model_type=Mistral give us?

Contributor (Author):

Less confusion? Defining the mistral model to be llama type is a bit confusing, I think.

Contributor:

But it's technically true that it is a llama-type model and this is the accepted standard in HF.

Contributor (Author):

I definitely don't want to carry over any confusion from there.

Contributor:

Personally I like the separate model type. Yes, it's an identical architecture, but it just helps to be explicit. Main question: is this used as a hard constraint anywhere? E.g. is it going to prevent me from loading Mistral weights into a LLAMA2 model type?
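
(One way to picture what a separate model_type buys, sketched here with made-up names rather than torchtune's actual checkpointer code:)

from enum import Enum

class ModelType(Enum):  # illustrative enum, not torchtune's definition
    LLAMA2 = "llama2"
    MISTRAL = "mistral"

def convert_checkpoint_keys(state_dict: dict, model_type: ModelType) -> dict:
    # The two architectures are identical today, so the key mapping is shared;
    # a distinct MISTRAL value leaves room to diverge later without touching
    # LLAMA2 checkpoints, though as written it would not block cross-loading.
    if model_type in (ModelType.LLAMA2, ModelType.MISTRAL):
        return state_dict
    raise ValueError(f"Unsupported model_type: {model_type}")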

@@ -40,7 +40,7 @@ The library currently supports the following models and fine-tuning methods.
|-----------------------------------------------|-----------|-----------------------------------------------------------|
| [Llama2](torchtune/models/llama2/_model_builders.py) | 7B | Full Finetuning [[single device](recipes/configs/llama2/7B_full_single_device.yaml), [distributed](recipes/configs/llama2/7B_full.yaml)] LoRA [[single device](recipes/configs/llama2/7B_lora_single_device.yaml), [distributed](recipes/configs/llama2/7B_lora.yaml)] QLoRA [single device](recipes/configs/llama2/7B_qlora_single_device.yaml) |
| [Llama2](torchtune/models/llama2/_model_builders.py) | 13B | [Full Finetuning](recipes/configs/llama2/13B_full.yaml), [LoRA](recipes/configs/llama2/13B_lora.yaml)
- | [Mistral](torchtune/models/mistral//_model_builders.py) | 7B | Full Finetuning and LoRA are WIP and will be added soon
+ | [Mistral](torchtune/models/mistral//_model_builders.py) | 7B | [Full Finetuning](recipes/configs/mistral/7B_full.yaml), [LoRA](recipes/configs/mistral/7B_lora.yaml)
Contributor:

Not necessary for this PR but I think this table needs a cleanup. We should either split finetuning methods into separate columns or rows grouped under model family

Contributor (Author):

Agreed, this table does need an update. Let me follow up separately on this.


tok_embeddings = nn.Embedding(vocab_size, embed_dim)

# TODO: quantize_base is not applied to final output_proj currently.
Member:

We don't have quantize_base at all in mistral right now, right?

Contributor (Author):

Yeah, I need to run some experiments before I add this. Training on mistral is quite different from llama2, so I'll do QLoRA as a follow-up.

Contributor:

Would at least remove the TODO for now then.

lora_alpha: float = 16,
) -> TransformerDecoder:
"""
Builder for creating a Llama2 7B model with LoRA enabled.
Member:

:D

num_kv_heads: int,
max_seq_len: int,
attn_dropout: float = 0.0,
rope_base: int = 10_000,
Contributor:

Why do we parametrize this here but not in llama2 builders?
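
(For context, parametrizing it buys roughly the following; this sketch assumes the rotary embedding module accepts a base argument:)

from torchtune.modules import RotaryPositionalEmbeddings

def build_rope(embed_dim: int, num_heads: int, max_seq_len: int, rope_base: int = 10_000):
    # Thread the configurable base through instead of hard-coding 10_000.
    head_dim = embed_dim // num_heads
    return RotaryPositionalEmbeddings(dim=head_dim, max_seq_len=max_seq_len, base=rope_base)
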

Returns:
TransformerDecoder: Instantiation of Llama2 7B model with LoRA applied
"""
return lora_mistral(
Contributor:

Would be nice to document somewhere exactly what the differences are between this builder and the equivalent llama2 one. Doesn't have to be for this PR though; as an added bonus it nicely shows off how easily we can switch between the two.
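
(A small illustration of that last point, assuming the two LoRA builders share the same keyword interface, which the diffs above suggest:)

from torchtune.models.llama2 import lora_llama2_7b
from torchtune.models.mistral import lora_mistral_7b

common = dict(lora_attn_modules=["q_proj", "k_proj", "v_proj"], apply_lora_to_mlp=False)

llama2_model = lora_llama2_7b(**common)    # same call site...
mistral_model = lora_mistral_7b(**common)  # ...different model family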

# Model Arguments
model:
_component_: torchtune.models.mistral.lora_mistral_7b
lora_attn_modules: ['q_proj', 'v_proj', 'k_proj']
Contributor:

nit: order q, k, v for clarity

Comment on lines 55 to 56
# Distributed
cpu_offload: False
Contributor:

I don't think this is even used? (I see it's still lurking around in other configs too..)

@ebsmothers (Contributor) left a comment:

Do we wanna add a unit test for the new model? Doesn't have to be for all levels of components but at least e.g. at the level of mistral or lora_mistral
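
(Something along these lines is presumably what's being suggested; the keyword names below are assumptions about the mistral builder's signature rather than the test that actually landed:)

import torch
from torchtune.models.mistral import mistral

def test_mistral_forward_shape():
    model = mistral(
        vocab_size=100,
        num_layers=2,
        num_heads=4,
        num_kv_heads=2,
        embed_dim=64,
        intermediate_dim=128,
        max_seq_len=128,
    )
    tokens = torch.randint(0, 100, (2, 16))
    out = model(tokens)
    assert out.shape == (2, 16, 100)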

@kartikayk merged commit bbac98c into main on Mar 27, 2024
20 checks passed
@kartikayk deleted the mistral_configs branch on March 27, 2024 at 02:06