
2/n - Make Gemma use regular TransformerDecoder #1553

Merged

Conversation

@felipemello1 felipemello1 commented Sep 12, 2024

Context

What is the purpose of this PR? Is it to

  • add a new feature
  • fix a bug
  • update tests and/or documentation
  • other (please add here)

This PR was built on top of #1547 (comment), so the Qwen changes should disappear from here after that PR is merged.

Changelog

  • Updates Gemma to use the regular TransformerDecoder
  • Uses TiedLinear for the output projection (see the sketch after this list)
  • Uses a normalized embedding (GemmaNormEmbeddings) to put together the token embedding and the Gemma norm
  • Quick fix: sets LoRA alpha to 2x the LoRA rank
  • Reduces warmup steps to 10. I don't see why we need 100: LoRA training should be stable, and we don't have warmup in full finetuning. I did this after comparing a short run of full vs. LoRA finetuning and realizing that the LoRA loss didn't decrease at all for the first 50 steps.
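
A rough sketch (plain PyTorch, not the exact torchtune API; TiedOutputProjection is an illustrative name, and the hyperparameter values are hypothetical) of two of the ideas above: tying the output projection to the token embedding, and setting LoRA alpha to 2x the LoRA rank:

import torch
import torch.nn as nn


class TiedOutputProjection(nn.Module):
    """Output projection that reuses the token-embedding weight (weight tying)."""

    def __init__(self, embedding: nn.Embedding):
        super().__init__()
        self.embedding = embedding  # shared weights, no new parameters

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # [batch, seq, dim] @ [dim, vocab] -> [batch, seq, vocab]
        return torch.matmul(hidden, self.embedding.weight.t())


# Hypothetical LoRA hyperparameters following the "alpha = 2x rank" convention above.
lora_rank = 8
lora_alpha = 2 * lora_rank  # 16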

Test plan

Resume from checkpoint is working well.

tune run --nnodes 1 --nproc_per_node 8 full_finetune_distributed --config gemma/2B_full batch_size=8 max_steps_per_epoch=20 metric_logger=torchtune.training.metric_logging.WandBLogger gradient_accumulation_steps=1 epochs=2 compile=True
[image attached]


pytorch-bot bot commented Sep 12, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1553

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 175782f with merge base 7c51100 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 12, 2024
@felipemello1 felipemello1 changed the title Make Gemma use regular TransformerDecoder 2/n - Make Gemma use regular TransformerDecoder Sep 12, 2024
@joecummings joecummings left a comment


🫡

from torchtune.modules.loss import CEWithChunkedOutputLoss
from torchtune.utils import get_logger, torch_version_ge

log = get_logger("INFO")


def compile_model(
-    model: Union[TransformerDecoder, TiedEmbeddingTransformerDecoder],
+    model: TransformerDecoder,

Why wasn't this handled in 1/n?

felipemello1 (Author) replied:

I forgot, and I left a comment that I was doing it in 2/n.



class GemmaNormEmbeddings(nn.Embedding):
    def __init__(self, in_dim: int, out_dim: int):

Docstrings, especially to explain why this is a separate class.
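
For context, a minimal sketch of the idea behind a separate embedding class, assuming it exists mainly to apply Gemma's sqrt(embed_dim) scaling to the embedding output (the actual implementation is in torchtune/models/gemma/gemma_norm_embedding.py and may differ):

import torch
import torch.nn as nn


class GemmaNormEmbeddings(nn.Embedding):
    """Token embedding that also scales its output, as Gemma expects.

    Sketch only; see torchtune/models/gemma/gemma_norm_embedding.py for the real class.
    """

    def __init__(self, in_dim: int, out_dim: int):
        # in_dim: vocabulary size, out_dim: embedding dimension
        super().__init__(in_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        embeddings = super().forward(x)
        # Gemma scales embeddings by sqrt(embed_dim) before the first decoder layer.
        return embeddings * (self.embedding_dim**0.5)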

@codecov-commenter

Codecov Report

Attention: Patch coverage is 70.83333% with 7 lines in your changes missing coverage. Please review.

Project coverage is 73.18%. Comparing base (7c51100) to head (4e1e8a6).

Files with missing lines                          Patch %   Missing
torchtune/models/gemma/_component_builders.py     66.66%    3 ⚠️
torchtune/models/gemma/gemma_norm_embedding.py    77.77%    2 ⚠️
torchtune/models/gemma/transformer.py              0.00%    2 ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1553      +/-   ##
==========================================
- Coverage   73.32%   73.18%   -0.14%     
==========================================
  Files         288      289       +1     
  Lines       14133    14164      +31     
==========================================
+ Hits        10363    10366       +3     
- Misses       3770     3798      +28     

☔ View full report in Codecov by Sentry.

@felipemello1 felipemello1 merged commit 7dad2d6 into pytorch:main Sep 12, 2024
17 checks passed
@felipemello1 felipemello1 deleted the gemma_deprecate_tied_transformer branch September 12, 2024 20:25