
v0.3.0

@ebsmothers ebsmothers released this 18 Sep 01:57
· 150 commits to main since this release

Overview

We haven’t had a new release for a little while, so there is a lot in this one. Highlights include FSDP2 recipes for full finetuning and LoRA (including QLoRA), support for DoRA fine-tuning, a PPO recipe for RLHF, Qwen2 models in various sizes, a ton of memory and performance improvements (try our recipes with torch.compile! try sample packing with flex attention!), and Comet ML integration. To get the full set of perf and memory improvements, we recommend installing with the PyTorch nightlies.

New Features

Here are highlights of some of our new features in 0.3.0.

Recipes

  • Full finetune FSDP2 recipe (#1287)
  • LoRA FSDP2 recipe with faster training than FSDP1 (#1517)
  • RLHF with PPO (#1005)
  • DoRA (#1115)
  • SimPO (#1223)
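Among the recipes above, DoRA (Weight-Decomposed Low-Rank Adaptation) reparameterizes the LoRA-updated weight into a learned per-column magnitude and a unit-norm direction. A minimal pure-Python sketch of that reparameterization (toy shapes and names are illustrative, not torchtune internals):

```python
import math

# Toy shapes: 2x2 frozen weight W0, rank-1 adapters A (1x2) and B (2x1).
W0 = [[1.0, 2.0], [3.0, 4.0]]
A = [[0.1, -0.2]]
B = [[0.5], [-0.5]]

# V = W0 + B @ A: the LoRA-updated weight
V = [[W0[i][j] + sum(B[i][k] * A[k][j] for k in range(len(A)))
     for j in range(2)] for i in range(2)]

# DoRA decomposes V into a per-column magnitude m and a unit-norm
# direction: W' = m * V / ||V||_col. Initializing m to ||V||_col
# makes W' equal to V at the start of training.
norms = [math.sqrt(sum(V[i][j] ** 2 for i in range(2))) for j in range(2)]
m = list(norms)
W_prime = [[m[j] * V[i][j] / norms[j] for j in range(2)] for i in range(2)]
```

Only m, A, and B are trained, which keeps the parameter count close to plain LoRA while decoupling magnitude from direction.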

Models

  • Qwen2 0.5B, 1.5B, and 7B models (#1143, #1247)
  • Flamingo model components (#1357)
  • CLIP encoder and vision transform (#1127)

Perf, memory, and quantization

  • Per-layer compile: 90% faster compile time and 75% faster training time (#1419)
  • Sample packing with flex attention: 80% faster training time with compile vs unpacked (#1193)
  • Chunked cross-entropy to reduce peak memory (#1390)
  • Make KV cache optional (#1207)
  • Option to save adapter checkpoint only (#1220)
  • Delete logits before the backward pass, saving ~4 GB (#1235)
  • Quantize linears without LoRA applied to NF4 (#1119)
  • Compile model and loss (#1296, #1319)
  • Speed up QLoRA initialization (#1294)
  • Set LoRA dropout to 0.0 to save memory (#1492)

Data/Datasets

  • Multimodal datasets: The Cauldron and LLaVA-Instruct-150K (#1158)
  • Multimodal collater (#1156)
  • Tokenizer redesign for better model-specific feature support (#1082)
  • Create general SFTDataset combining instruct and chat (#1234)
  • Interleaved image support in tokenizers (#1138)
  • Image transforms for CLIP encoder (#1084)
  • Vision cross-attention mask transform (#1141)
  • Support images in messages (#1504)
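The vision cross-attention mask transform decides which images each text position may attend to. One common convention (an assumption here, not necessarily torchtune's exact rule) is that a position attends to every image appearing at or before it in the interleaved sequence:

```python
# "<img>" marks where an image's tokens are interleaved in the sequence
tokens = ["<img>", "A", "cat", "<img>", "and", "a", "dog"]
image_positions = [i for i, t in enumerate(tokens) if t == "<img>"]

# mask[i][k] is True when position i may cross-attend to image k,
# i.e. image k was seen at or before position i
mask = [[pos <= i for pos in image_positions] for i in range(len(tokens))]
```

Here "cat" (position 2) can attend only to the first image, while "dog" (position 6) can attend to both.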

Miscellaneous

  • Deep fusion modules (#1338)
  • CometLogger integration (#1221)
  • Add profiler to full finetune recipes (#1288)
  • Support memory viz tool through the profiler (#1382, #1384)
  • Add RSO loss (#1197)
  • Add support for non-incremental decoding (#973)
  • Move utils directory to training (#1432, #1519, …)
  • Add bf16 dtype support on CPU (#1218)
  • Add grad norm logging (#1451)
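Grad norm logging typically reports the global L2 norm of all parameter gradients, a useful signal for spotting divergence or vanishing gradients. A minimal sketch of that quantity (pure Python, illustrative only):

```python
import math

# Flattened per-parameter gradients; the global norm is the L2 norm
# of all gradient entries concatenated together
grads = [[0.3, -0.4], [1.2]]
global_norm = math.sqrt(sum(g * g for p in grads for g in p))
# 0.3^2 + 0.4^2 + 1.2^2 = 1.69, so the global norm is 1.3
```

In a real training loop the same value is what `torch.nn.utils.clip_grad_norm_` returns, so it can be logged with no extra computation.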

Documentation

  • QAT tutorial (#1105)
  • Recipe docs pages and memory optimizations tutorial (#1230)
  • Add download commands to model API docs (#1167)
  • Updates to utils API docs (#1170)

Bug Fixes

  • Prevent pad IDs and special tokens from displaying in generate (#1211)
  • Revert Gemma checkpoint logic that caused a missing head weight (#1168)
  • Fix compile on PyTorch 2.4 (#1512)
  • Fix Llama 3.1 RoPE init for compile (#1544)
  • Fix checkpoint load for FSDP2 with CPU offload (#1495)
  • Add missing quantization to Llama 3.1 layers (#1485)
  • Fix accuracy number parsing in Eleuther eval test (#1135)
  • Allow adding custom system prompt to messages (#1366)
  • Cast DictConfig -> dict in instantiate (#1450)

New Contributors (auto-generated by GitHub)

@sanchitintel made their first contribution in #1218
@lulmer made their first contribution in #1134
@stsouko made their first contribution in #1238
@spider-man-tm made their first contribution in #1220
@winglian made their first contribution in #1119
@fyabc made their first contribution in #1143
@mreso made their first contribution in #1274
@gau-nernst made their first contribution in #1288
@lucylq made their first contribution in #1269
@dzheng256 made their first contribution in #1221
@ChinoUkaegbu made their first contribution in #1310
@janeyx99 made their first contribution in #1382
@Gasoonjia made their first contribution in #1385
@shivance made their first contribution in #1417
@yf225 made their first contribution in #1419
@thomasjpfan made their first contribution in #1363
@AnuravModak made their first contribution in #1429
@lindawangg made their first contribution in #1451
@andrewldesousa made their first contribution in #1470
@mirceamironenco made their first contribution in #1523
@mikaylagawarecki made their first contribution in #1315