-
Notifications
You must be signed in to change notification settings - Fork 404
/
README.md
334 lines (212 loc) · 16.4 KB
/
README.md
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
# torchtune
[![Unit Test](https:/pytorch/torchtune/actions/workflows/unit_test.yaml/badge.svg?branch=main)](https:/pytorch/torchtune/actions/workflows/unit_test.yaml)
![Recipe Integration Test](https:/pytorch/torchtune/actions/workflows/recipe_test.yaml/badge.svg)
[![](https://dcbadge.vercel.app/api/server/4Xsdn8Rr9Q?style=flat)](https://discord.gg/4Xsdn8Rr9Q)
[**Introduction**](#introduction) | [**Installation**](#installation) | [**Get Started**](#get-started) | [**Documentation**](https://pytorch.org/torchtune/main/index.html) | [**Design Principles**](#design-principles) | [**Community Contributions**](#community-contributions) | [**License**](#license)
> [!NOTE]
> **July 2024**: torchtune has updated model weights for Llama3.1 in source and nightly builds! Check out our configs for both the [8B and 70B versions](recipes/configs/llama3_1/) of the model. LoRA, QLoRA, and full finetune methods are supported. Support for QLoRA 405B will be added soon.
## Introduction
torchtune is a PyTorch-native library for easily authoring, fine-tuning and experimenting with LLMs. We're excited to announce our alpha release!
torchtune provides:
- Native-PyTorch implementations of popular LLMs using composable and modular building blocks
- Easy-to-use and hackable training recipes for popular fine-tuning techniques (LoRA, QLoRA) - no trainers, no frameworks, just PyTorch!
- YAML configs for easily configuring training, evaluation, quantization or inference recipes
- Built-in support for many popular dataset formats and prompt templates to help you quickly get started with training
torchtune focuses on integrating with popular tools and libraries from the ecosystem. These are just a few examples, with more under development:
- [Hugging Face Hub](https://huggingface.co/docs/hub/en/index) for [accessing model weights](torchtune/_cli/download.py)
- [EleutherAI's LM Eval Harness](https:/EleutherAI/lm-evaluation-harness) for [evaluating](recipes/eleuther_eval.py) trained models
- [Hugging Face Datasets](https://huggingface.co/docs/datasets/en/index) for [access](torchtune/datasets/_instruct.py) to training and evaluation datasets
- [PyTorch FSDP](https://pytorch.org/docs/stable/fsdp.html) for distributed training
- [torchao](https:/pytorch-labs/ao) for lower precision dtypes and [post-training quantization](recipes/quantize.py) techniques
- [Weights & Biases](https://wandb.ai/site) for [logging](https://pytorch.org/torchtune/main/deep_dives/wandb_logging.html) metrics and checkpoints, and tracking training progress
- [ExecuTorch](https://pytorch.org/executorch-overview) for [on-device inference](https:/pytorch/executorch/tree/main/examples/models/llama2#optional-finetuning) using fine-tuned models
- [bitsandbytes](https://huggingface.co/docs/bitsandbytes/main/en/index) for low memory optimizers for our [single-device recipes](recipes/configs/llama2/7B_full_low_memory.yaml)
### Models
torchtune currently supports the following models.
| Model | Sizes |
|-----------------------------------------------|-----------|
| [Llama3.1](https://llama.meta.com/docs/model-cards-and-prompt-formats/llama3_1) | 8B, 70B [[models](torchtune/models/llama3_1/_model_builders.py), [configs](recipes/configs/llama3_1/)] |
| [Llama3](https://llama.meta.com/llama3) | 8B, 70B [[models](torchtune/models/llama3/_model_builders.py), [configs](recipes/configs/llama3/)] |
| [Llama2](https://llama.meta.com/llama2/) | 7B, 13B, 70B [[models](torchtune/models/llama2/_model_builders.py), [configs](recipes/configs/llama2/)] |
| [Code-Llama2](https://ai.meta.com/blog/code-llama-large-language-model-coding/) | 7B, 13B, 70B [[model](torchtune/models/code_llama2/_model_builders.py), [configs](recipes/configs/code_llama2/)] |
| [Mistral](https://huggingface.co/mistralai) | 7B [[model](torchtune/models/mistral/_model_builders.py), [configs](recipes/configs/mistral/)] |
| [Gemma](https://huggingface.co/collections/google/gemma-release-65d5efbccdbb8c4202ec078b) | 2B, 7B [[model](torchtune/models/gemma/_model_builders.py), [configs](recipes/configs/gemma/)] |
| [Microsoft Phi3](https://huggingface.co/collections/microsoft/phi-3-6626e15e9585a200d2d761e3) | Mini [[model](torchtune/models/phi3/), [configs](recipes/configs/phi3/)]
We're always adding new models, but feel free to [file an Issue](https:/pytorch/torchtune/issues/new) if there's a new one you would love to see in torchtune!
### Fine-tuning recipes
torchtune provides the following fine-tuning recipes.
| Training | Fine-tuning Method |
|------------------------------------|------------------------------------|
| Distributed Training [1 to 8 GPUs] | Full [[code](recipes/full_finetune_distributed.py), [example](recipes/configs/llama3/8B_full.yaml)], LoRA [[code](recipes/lora_finetune_distributed.py), [example](recipes/configs/llama3/8B_lora.yaml)] |
| Single Device / Low Memory [1 GPU] | Full [[code](recipes/full_finetune_single_device.py), [example](recipes/configs/llama3/8B_full_single_device.yaml)], LoRA + QLoRA [[code](recipes/lora_finetune_single_device.py), [example](recipes/configs/llama3/8B_lora_single_device.yaml)] |
| Single Device [1 GPU] | DPO [[code](recipes/full_finetune_distributed.py), [example](recipes/configs/llama2/7B_lora_dpo_single_device.yaml)]
Memory efficiency is important to us. All of our recipes are tested on a variety of setups including commodity GPUs with 24GB of VRAM as well as beefier options found in data centers.
Single-GPU recipes expose a number of memory optimizations that aren't available in the distributed versions. These include support for low-precision optimizers from [bitsandbytes](https://huggingface.co/docs/bitsandbytes/main/en/index) and fusing optimizer step with backward to reduce memory footprint from the gradients (see example [config](https:/pytorch/torchtune/blob/main/recipes/configs/llama2/7B_full_low_memory.yaml)). For memory-constrained setups, we recommend using the single-device configs as a starting point.
This table captures the peak memory usage and training speed for recipes in torchtune.
| Example HW Resources | Finetuning Method | Model | Setting | Peak Memory per GPU (GB) | Training Speed (tokens/sec) |
|:-:|:-:|:-:|:-:|:-:|:-:|
| 1 x RTX 4090 | QLoRA ** | Llama2-7B | Batch Size = 4, Seq Length = 2048 | 12.3 GB | 3155 |
| 1 x RTX 4090 | LoRA | Llama2-7B | Batch Size = 4, Seq Length = 2048 | 21.3 GB | 2582 |
| 2 x RTX 4090 | LoRA | Llama2-7B | Batch Size = 4, Seq Length = 2048 | 16.2 GB | 2768 |
| 1 x RTX 4090 | Full finetune * | Llama2-7B | Batch Size = 4, Seq Length = 2048 | 24.1 GB | 702 |
| 4 x RTX 4090 | Full finetune | Llama2-7B | Batch Size = 4, Seq Length = 2048 | 24.1 GB | 1388 |
| 8 x A100 | LoRA | Llama2-70B | Batch Size = 4, Seq Length = 4096 | 26.4 GB | 3384 |
| 8 x A100 | Full Finetune * | Llama2-70B | Batch Size = 4, Seq Length = 4096 | 70.4 GB | 2032 |
*= Uses PagedAdamW from bitsandbytes
**= Uses torch compile
## Llama3 and Llama3.1
torchtune supports fine-tuning for the Llama3 8B and 70B size models. We currently support LoRA, QLoRA and full fine-tune on a single GPU as well as LoRA and full fine-tune on multiple devices for the 8B model, and LoRA on multiple devices for the 70B model. For all the details, take a look at our [tutorial](https://pytorch.org/torchtune/main/tutorials/llama3.html).
> [!NOTE]
> Our Llama3 and Llama3.1 LoRA and QLoRA configs default to the instruct fine-tuned models. This is because not all special token embeddings are initialized in the base 8B and 70B models.
In our initial experiments for Llama3-8B, QLoRA has a peak allocated memory of ``~9GB`` while LoRA on a single GPU has a peak allocated memory of ``~19GB``. To get started, you can use our default configs to kick off training.
### Single GPU
LoRA 8B
```bash
tune run lora_finetune_single_device --config llama3_1/8B_lora_single_device
```
QLoRA 8B
```bash
tune run lora_finetune_single_device --config llama3_1/8B_qlora_single_device
```
Full 8B
```bash
tune run full_finetune_single_device --config llama3_1/8B_full_single_device
```
### Multi GPU
Full 8B
```bash
tune run --nproc_per_node 4 full_finetune_distributed --config llama3_1/8B_full
```
LoRA 8B
```bash
tune run --nproc_per_node 2 lora_finetune_distributed --config llama3_1/8B_lora
```
LoRA 70B
Note that the download command for the Meta-Llama3 70B model slightly differs from download commands for the 8B models. This is because we use the HuggingFace [safetensor](https://huggingface.co/docs/safetensors/en/index) model format to load the model. To download the 70B model, run
```bash
tune download meta-llama/Meta-Llama-3.1-70b --hf-token <> --output-dir /tmp/Meta-Llama-3.1-70b --ignore-patterns "original/consolidated*"
```
Then, a finetune can be kicked off:
```bash
tune run --nproc_per_node 8 lora_finetune_distributed --config llama3_1/70B_lora.yaml
```
You can find a full list of all our Llama3 configs [here](recipes/configs/llama3) and Llama3.1 configs [here.](recipes/configs/llama3_1)
---
## Installation
**Step 1:** [Install PyTorch](https://pytorch.org/get-started/locally/). torchtune is tested with the latest stable PyTorch release as well as the preview nightly version. For fine-tuning the multimodal LLMs available in the repo, you'll need to install torchvision as well.
```
# Install stable version of PyTorch using pip
pip install torch torchvision
# Nightly install for latest features
pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu121
```
**Step 2:** The latest stable version of torchtune is hosted on PyPI and can be downloaded with the following command:
```bash
pip install torchtune
```
To confirm that the package is installed correctly, you can run the following command:
```bash
tune --help
```
And should see the following output:
```bash
usage: tune [-h] {ls,cp,download,run,validate} ...
Welcome to the torchtune CLI!
options:
-h, --help show this help message and exit
...
```
You can also install the latest and greatest torchtune has to offer by [installing a nightly build](https://pytorch.org/torchtune/main/install.html).
---
## Get Started
To get started with fine-tuning your first LLM with torchtune, see our tutorial on [fine-tuning Llama2 7B](https://pytorch.org/torchtune/main/tutorials/first_finetune_tutorial.html). Our [end-to-end workflow](https://pytorch.org/torchtune/main/tutorials/e2e_flow.html) tutorial will show you how to evaluate, quantize and run inference with this model. The rest of this section will provide a quick overview of these steps with Llama2.
### Downloading a model
Follow the instructions on the official [`meta-llama`](https://huggingface.co/meta-llama) repository to ensure you have access to the official Llama model weights. Once you have confirmed access, you can run the following command to download the weights to your local machine. This will also download the tokenizer model and a responsible use guide.
### Llama3 download
```bash
tune download meta-llama/Meta-Llama-3-8B \
--output-dir /tmp/Meta-Llama-3-8B \
--hf-token <HF_TOKEN> \
```
> [!Tip]
> Set your environment variable `HF_TOKEN` or pass in `--hf-token` to the command in order to validate your access. You can find your token at https://huggingface.co/settings/tokens
### Running fine-tuning recipes
Llama3 8B + LoRA on single GPU:
```bash
tune run lora_finetune_single_device --config llama2/7B_lora_single_device
```
For distributed training, tune CLI integrates with [torchrun](https://pytorch.org/docs/stable/elastic/run.html).
Llama3 8B + LoRA on two GPUs:
```bash
tune run --nproc_per_node 2 full_finetune_distributed --config llama2/7B_full
```
> [!Tip]
> Make sure to place any torchrun commands **before** the recipe specification. Any CLI args after this will override the config and not impact distributed training.
### Modify Configs
There are two ways in which you can modify configs:
**Config Overrides**
You can easily overwrite config properties from the command-line:
```bash
tune run lora_finetune_single_device \
--config llama2/7B_lora_single_device \
batch_size=8 \
enable_activation_checkpointing=True \
max_steps_per_epoch=128
```
**Update a Local Copy**
You can also copy the config to your local directory and modify the contents directly:
```bash
tune cp llama2/7B_full ./my_custom_config.yaml
Copied to ./7B_full.yaml
```
Then, you can run your custom recipe by directing the `tune run` command to your local files:
```bash
tune run full_finetune_distributed --config ./my_custom_config.yaml
```
Check out `tune --help` for all possible CLI commands and options. For more information on using and updating configs, take a look at our [config deep-dive](https://pytorch.org/torchtune/main/deep_dives/configs.html).
## Design Principles
torchtune embodies PyTorch’s design philosophy [[details](https://pytorch.org/docs/stable/community/design.html)], especially "usability over everything else".
### Native PyTorch
torchtune is a native-PyTorch library. While we provide integrations with the surrounding ecosystem (eg: Hugging Face Datasets, EleutherAI Eval Harness), all of the core functionality is written in PyTorch.
### Simplicity and Extensibility
torchtune is designed to be easy to understand, use and extend.
- Composition over implementation inheritance - layers of inheritance for code re-use makes the code hard to read and extend
- No training frameworks - explicitly outlining the training logic makes it easy to extend for custom use cases
- Code duplication is preferred over unnecessary abstractions
- Modular building blocks over monolithic components
### Correctness
torchtune provides well-tested components with a high-bar on correctness. The library will never be the first to provide a feature, but available features will be thoroughly tested. We provide
- Extensive unit-tests to ensure component-level numerical parity with reference implementations
- Checkpoint-tests to ensure model-level numerical parity with reference implementations
- Integration tests to ensure recipe-level performance parity with reference implementations on standard benchmarks
## Community Contributions
We really value our community and the contributions made by our wonderful users. We'll use this section to call out some of these contributions! If you'd like to help out as well, please see the [CONTRIBUTING](CONTRIBUTING.md) guide.
- [@solitude-alive](https:/solitude-alive) for adding the [Gemma 2B model](torchtune/models/gemma/) to torchtune, including recipe changes, numeric validations of the models and recipe correctness
- [@yechenzhi](https:/yechenzhi) for adding [DPO](recipes/lora_dpo_single_device.py) to torchtune, including the recipe and config along with correctness checks
## Acknowledgements
The Llama2 code in this repository is inspired by the original [Llama2 code](https:/meta-llama/llama/blob/main/llama/model.py).
We want to give a huge shout-out to EleutherAI, Hugging Face and Weights & Biases for being wonderful collaborators and for working with us on some of these integrations within torchtune.
We also want to acknowledge some awesome libraries and tools from the ecosystem:
- [gpt-fast](https:/pytorch-labs/gpt-fast) for performant LLM inference techniques which we've adopted OOTB
- [llama recipes](https:/meta-llama/llama-recipes) for spring-boarding the llama2 community
- [bitsandbytes](https:/TimDettmers/bitsandbytes) for bringing several memory and performance based techniques to the PyTorch ecosystem
- [@winglian](https:/winglian/) and [axolotl](https:/OpenAccess-AI-Collective/axolotl) for early feedback and brainstorming on torchtune's design and feature set.
- [lit-gpt](https:/Lightning-AI/litgpt) for pushing the LLM fine-tuning community forward.
- [HF TRL](https:/huggingface/trl) for making reward modeling more accessible to the PyTorch community.
## License
torchtune is released under the [BSD 3 license](./LICENSE). However you may have other legal obligations that govern your use of other content, such as the terms of service for third-party models.