Remove TiedEmbeddingTransformerDecoder (and GemmaTransformerDecoder) #1454

Closed

SalmanMohammadi opened this issue Aug 29, 2024 · 2 comments

Labels: better engineering, community help wanted

SalmanMohammadi (Collaborator) commented on Aug 29, 2024

Credit to @pbontrager for this.

To quote from #1447:


Also just going to throw this out there: @pbontrager had suggested that we can do away with TiedEmbeddingTransformerDecoder entirely and instead do something like (crappy diff snippet from one of our model builders as an example)

    tok_embeddings = nn.Embedding(vocab_size, embed_dim)

    # How we do things for non-tied-embedding models (e.g. Llama)
    # output_proj = nn.Linear(embed_dim, vocab_size, bias=False)

    # What things would look like for Gemma or Qwen2 instead
    output_proj = lambda x: F.linear(x, tok_embeddings.weight)

    return TransformerDecoder(
        tok_embeddings=tok_embeddings,
        layers=layer,
        num_layers=num_layers,
        max_seq_len=max_seq_len,
        num_heads=num_heads,
        head_dim=head_dim,
        norm=RMSNorm(embed_dim, eps=norm_eps),
        output=output_proj,
    )

to build tied-embedding models directly in our TransformerDecoder class (though possibly without a lambda, and maybe with a proper function). The main open question is whether this works with FSDP and checkpointing.
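For what it's worth, here is one hedged sketch of the "proper function" route: a tiny nn.Module that holds a reference to the embedding and reuses its weight in forward. The TiedLinear name and the whole class are illustrative, not an existing API, and whether this composes cleanly with FSDP and checkpointing is exactly the open question above.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TiedLinear(nn.Module):
        """Illustrative output projection that reuses an embedding's weight."""

        def __init__(self, tok_embeddings: nn.Embedding):
            super().__init__()
            # Keep a reference to the embedding module (not a copy of the weight)
            # so the projection always sees the current, shared parameter.
            self.tok_embeddings = tok_embeddings

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # logits = x @ E.T, same as the F.linear call in the snippet above
            return F.linear(x, self.tok_embeddings.weight)

    tok_embeddings = nn.Embedding(32000, 256)
    output_proj = TiedLinear(tok_embeddings)
    logits = output_proj(torch.randn(2, 8, 256))  # shape: [2, 8, 32000]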


To provide some background: currently, for models with tied embedding and output projection weights, such as Gemma and Qwen2, we define an entirely separate decoder class. This class is identical to TransformerDecoder except for the final lines of its forward method:

    # in TransformerDecoder

    # shape: [b, s, out_dim] - out_dim is usually the vocab size
    output = self.output(h).float()
    return output

    # in TiedEmbeddingTransformerDecoder and GemmaTransformerDecoder

    # shape: [b, s, out_dim] - out_dim is usually the vocab size
    output = F.linear(h, self.tok_embeddings.weight).float()
    ...
    return output
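As a quick standalone illustration of what the tied path computes (a sketch, not code from the repo): F.linear(h, E) is just h @ E.T, so the logits reuse the embedding matrix as the output projection weight.

    import torch
    import torch.nn.functional as F

    b, s, embed_dim, vocab_size = 2, 4, 16, 100
    tok_embeddings = torch.nn.Embedding(vocab_size, embed_dim)
    h = torch.randn(b, s, embed_dim)  # final decoder hidden states

    # Tied projection: reuse the embedding matrix E as the output weight,
    # giving logits = h @ E.T with shape [b, s, vocab_size]
    logits = F.linear(h, tok_embeddings.weight)
    assert logits.shape == (b, s, vocab_size)
    assert torch.allclose(logits, h @ tok_embeddings.weight.T)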

Making this change should be conceptually very straightforward, but will touch several parts of the codebase. Off the top of my head, and in no particular order:

  1. Try parameterizing the tied output projection using a callable: either a lambda as above, something like

         output_proj = partial(torch.nn.functional.linear, weight=tok_embeddings.weight)

     (binding the weight by keyword so the positional input slot of F.linear stays free), or a proper function. Do this for a single model first (e.g. Qwen2 0.5B); see the sketch after this list.
  2. Test out a recipe with this builder and make sure things work OK - they should! Also make sure one of our distributed recipes works okay (we can help out here).
  3. Extend the change to any other models using either GemmaTransformerDecoder or TiedEmbeddingTransformerDecoder. All of these component and model builders should now just construct and return a TransformerDecoder.
  4. CTRL+F GemmaTransformerDecoder and TiedEmbeddingTransformerDecoder. Eradicate. Docs, tests, __init__.pys. Be ruthless.
  5. Probably something else here.
  6. $$$
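As a rough sketch of step 1: the builder name qwen2_tied_builder and its parameter list below are made up for illustration, and the TransformerDecoder/RMSNorm keyword arguments are assumed to match the snippet quoted above rather than any particular torchtune version.

    from functools import partial

    import torch.nn as nn
    import torch.nn.functional as F

    from torchtune.modules import RMSNorm, TransformerDecoder

    def qwen2_tied_builder(
        vocab_size, embed_dim, num_layers, max_seq_len, num_heads, head_dim, norm_eps, layer
    ):
        tok_embeddings = nn.Embedding(vocab_size, embed_dim)

        # Bind the tied weight by keyword so the positional input slot of
        # F.linear stays free: output_proj(h) == F.linear(h, tok_embeddings.weight)
        output_proj = partial(F.linear, weight=tok_embeddings.weight)

        return TransformerDecoder(
            tok_embeddings=tok_embeddings,
            layers=layer,
            num_layers=num_layers,
            max_seq_len=max_seq_len,
            num_heads=num_heads,
            head_dim=head_dim,
            norm=RMSNorm(embed_dim, eps=norm_eps),
            output=output_proj,
        )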

SalmanMohammadi added the good first issue, community help wanted, and better engineering labels on Aug 29, 2024
SalmanMohammadi changed the title from "Remove TiedEmbeddingTransformerDecoder" to "Remove TiedEmbeddingTransformerDecoder (and GemmaTransformerDecoder)" on Aug 29, 2024
SalmanMohammadi removed the good first issue label on Aug 29, 2024
felipemello1 (Contributor) commented:

#1547
#1553

felipemello1 (Contributor) commented:

PRs landed. Thanks for the issue!! :)
