
v0.3.0

@ebsmothers ebsmothers released this 18 Sep 01:57
· 150 commits to main since this release

Overview

We haven’t had a new release for a little while, so there is a lot in this one. Highlights include FSDP2 recipes for full finetuning and LoRA (including QLoRA), support for DoRA fine-tuning, a PPO recipe for RLHF, Qwen2 models in various sizes, a ton of memory and performance improvements (try our recipes with torch.compile! try sample packing with flex attention!), and Comet ML integration. To get the full set of perf and memory improvements, we recommend installing with the PyTorch nightlies.

New Features

Here are highlights of some of our new features in 0.3.0.

Recipes

  • Full finetune FSDP2 recipe (#1287)
  • LoRA FSDP2 recipe with faster training than FSDP1 (#1517)
  • RLHF with PPO (#1005)
  • DoRA (#1115)
  • SimPO (#1223)
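Among the recipes above, DoRA (Weight-Decomposed Low-Rank Adaptation) reparameterizes the LoRA-updated weight into a learned per-column magnitude and a unit-norm direction. A minimal pure-Python sketch of that reparameterization (toy shapes and names are illustrative, not torchtune internals):

```python
import math

# Toy shapes: 2x2 frozen weight W0, rank-1 adapters A (1x2) and B (2x1).
W0 = [[1.0, 2.0], [3.0, 4.0]]
A = [[0.1, -0.2]]
B = [[0.5], [-0.5]]

# V = W0 + B @ A: the LoRA-updated weight
V = [[W0[i][j] + sum(B[i][k] * A[k][j] for k in range(len(A)))
     for j in range(2)] for i in range(2)]

# DoRA decomposes V into a per-column magnitude m and a unit-norm
# direction: W' = m * V / ||V||_col. Initializing m to ||V||_col
# makes W' equal to V at the start of training.
norms = [math.sqrt(sum(V[i][j] ** 2 for i in range(2))) for j in range(2)]
m = list(norms)
W_prime = [[m[j] * V[i][j] / norms[j] for j in range(2)] for i in range(2)]
```

Only m, A, and B are trained, which keeps the parameter count close to plain LoRA while decoupling magnitude from direction.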

Models

  • Qwen2 0.5B, 1.5B, and 7B models (#1143, #1247)
  • Flamingo model components (#1357)
  • CLIP encoder and vision transform (#1127)

Perf, memory, and quantization

  • Per-layer compile: 90% faster compile time and 75% faster training time (#1419)
  • Sample packing with flex attention: 80% faster training time with compile vs unpacked (#1193)
  • Chunked cross-entropy to reduce peak memory (#1390)
  • Make KV cache optional (#1207)
  • Option to save adapter checkpoint only (#1220)
  • Delete logits before the backward pass, saving ~4 GB (#1235)
  • Quantize linears without LoRA applied to NF4 (#1119)
  • Compile model and loss (#1296, #1319)
  • Speed up QLoRA initialization (#1294)
  • Set LoRA dropout to 0.0 to save memory (#1492)

Data/Datasets

  • Multimodal datasets: The Cauldron and LLaVA-Instruct-150K (#1158)
  • Multimodal collater (#1156)
  • Tokenizer redesign for better model-specific feature support (#1082)
  • Create general SFTDataset combining instruct and chat (#1234)
  • Interleaved image support in tokenizers (#1138)
  • Image transforms for CLIP encoder (#1084)
  • Vision cross-attention mask transform (#1141)
  • Support images in messages (#1504)
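The vision cross-attention mask transform decides which images each text position may attend to. One common convention (an assumption here, not necessarily torchtune's exact rule) is that a position attends to every image appearing at or before it in the interleaved sequence:

```python
# "<img>" marks where an image's tokens are interleaved in the sequence
tokens = ["<img>", "A", "cat", "<img>", "and", "a", "dog"]
image_positions = [i for i, t in enumerate(tokens) if t == "<img>"]

# mask[i][k] is True when position i may cross-attend to image k,
# i.e. image k was seen at or before position i
mask = [[pos <= i for pos in image_positions] for i in range(len(tokens))]
```

Here "cat" (position 2) can attend only to the first image, while "dog" (position 6) can attend to both.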

Miscellaneous

  • Deep fusion modules (#1338)
  • CometLogger integration (#1221)
  • Add profiler to full finetune recipes (#1288)
  • Support memory viz tool through the profiler (#1382, #1384)
  • Add RSO loss (#1197)
  • Add support for non-incremental decoding (#973)
  • Move utils directory to training (#1432, #1519, …)
  • Add bf16 dtype support on CPU (#1218)
  • Add grad norm logging (#1451)
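Grad norm logging typically reports the global L2 norm of all parameter gradients, a useful signal for spotting divergence or vanishing gradients. A minimal sketch of that quantity (pure Python, illustrative only):

```python
import math

# Flattened per-parameter gradients; the global norm is the L2 norm
# of all gradient entries concatenated together
grads = [[0.3, -0.4], [1.2]]
global_norm = math.sqrt(sum(g * g for p in grads for g in p))
# 0.3^2 + 0.4^2 + 1.2^2 = 1.69, so the global norm is 1.3
```

In a real training loop the same value is what `torch.nn.utils.clip_grad_norm_` returns, so it can be logged with no extra computation.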

Documentation

  • QAT tutorial (#1105)
  • Recipe docs pages and memory optimizations tutorial (#1230)
  • Add download commands to model API docs (#1167)
  • Updates to utils API docs (#1170)

Bug Fixes

  • Prevent pad IDs and special tokens from displaying in generate (#1211)
  • Revert Gemma checkpoint logic that caused a missing head weight (#1168)
  • Fix compile on PyTorch 2.4 (#1512)
  • Fix Llama 3.1 RoPE init for compile (#1544)
  • Fix checkpoint load for FSDP2 with CPU offload (#1495)
  • Add missing quantization to Llama 3.1 layers (#1485)
  • Fix accuracy number parsing in Eleuther eval test (#1135)
  • Allow adding custom system prompt to messages (#1366)
  • Cast DictConfig -> dict in instantiate (#1450)

New Contributors (auto-generated by GitHub)

@sanchitintel made their first contribution in #1218
@lulmer made their first contribution in #1134
@stsouko made their first contribution in #1238
@spider-man-tm made their first contribution in #1220
@winglian made their first contribution in #1119
@fyabc made their first contribution in #1143
@mreso made their first contribution in #1274
@gau-nernst made their first contribution in #1288
@lucylq made their first contribution in #1269
@dzheng256 made their first contribution in #1221
@ChinoUkaegbu made their first contribution in #1310
@janeyx99 made their first contribution in #1382
@Gasoonjia made their first contribution in #1385
@shivance made their first contribution in #1417
@yf225 made their first contribution in #1419
@thomasjpfan made their first contribution in #1363
@AnuravModak made their first contribution in #1429
@lindawangg made their first contribution in #1451
@andrewldesousa made their first contribution in #1470
@mirceamironenco made their first contribution in #1523
@mikaylagawarecki made their first contribution in #1315