
fix max seq len #807

Merged 1 commit into pytorch:main on Apr 19, 2024
Conversation

@ebsmothers (Contributor) commented Apr 19, 2024

Test plan:

tune run lora_finetune_single_device --config llama3/8B_lora_single_device
...
1|21|Loss: 2.320160388946533:   0%|▏ 


pytorch-bot bot commented Apr 19, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/807

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 8a71824 with merge base b74fd3a:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label on Apr 19, 2024
@kartikayk (Contributor) left a comment


Thanks for the fix. I finally found the reference:

We trained the models on sequences of 8,192 tokens, using a mask to ensure self-attention does not cross document boundaries.
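
(For illustration only: a minimal sketch, in plain PyTorch and not torchtune's implementation, of the masking technique quoted above, i.e. a causal mask that also blocks attention across packed-document boundaries. The per-token doc_ids input format is an assumption of this sketch.)

```python
import torch

def document_causal_mask(doc_ids: torch.Tensor) -> torch.Tensor:
    # doc_ids: [seq_len] tensor; tokens packed from the same document share an ID.
    # Returns a [seq_len, seq_len] boolean mask, True where attention is allowed.
    seq_len = doc_ids.shape[0]
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    same_doc = doc_ids.unsqueeze(0) == doc_ids.unsqueeze(1)
    return causal & same_doc

# Example: two documents of lengths 3 and 2 packed into one 5-token sequence.
mask = document_causal_mask(torch.tensor([0, 0, 0, 1, 1]))
```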

@ebsmothers merged commit 41341fd into pytorch:main on Apr 19, 2024
27 checks passed
@kartikayk (Contributor)

One note: we do double our RoPE cache. But I don't think this will meaningfully impact training.
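
(Rough back-of-envelope, assuming a cache of shape [max_seq_len, head_dim // 2, 2] in fp32; the exact shape and dtype in torchtune may differ. With head_dim = 128 as in Llama 3 8B, doubling max_seq_len from 4096 to 8192 grows the cache from roughly 2 MB to 4 MB per RoPE module.)

```python
# Hypothetical sizing only; the cache shape and dtype are assumptions,
# not torchtune's exact code.
head_dim = 128          # Llama 3 8B attention head dimension
bytes_per_elem = 4      # fp32

def rope_cache_mb(max_seq_len: int) -> float:
    # Assumed cache shape: [max_seq_len, head_dim // 2, 2]
    return max_seq_len * (head_dim // 2) * 2 * bytes_per_elem / 2**20

print(rope_cache_mb(4096))  # 2.0 MB
print(rope_cache_mb(8192))  # 4.0 MB -> doubling max_seq_len adds ~2 MB per module
```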

@rohan-varma (Member)

@kartikayk Is there a way to prevent this that's used in pretraining or other fine-tuning libs that we don't have? It seems our implementation makes the RoPE cache size directly proportional to max_seq_len, right? https://github.com/pytorch/torchtune/blob/main/torchtune/modules/position_embeddings.py
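
(For context, a minimal sketch of how a RoPE cos/sin cache is commonly precomputed; this is an illustration under assumed shapes, not the code in position_embeddings.py. The cache's leading dimension is max_seq_len, which is why its memory grows linearly with that setting.)

```python
import torch

def build_rope_cache(max_seq_len: int, dim: int, base: int = 10_000) -> torch.Tensor:
    # Per-frequency angles theta_i = base^(-2i/dim), one per pair of dimensions.
    theta = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))   # [dim // 2]
    positions = torch.arange(max_seq_len).float()                     # [max_seq_len]
    angles = torch.outer(positions, theta)                            # [max_seq_len, dim // 2]
    # Stack cos/sin terms; memory is proportional to max_seq_len.
    return torch.stack([torch.cos(angles), torch.sin(angles)], dim=-1)

cache = build_rope_cache(max_seq_len=8192, dim=128)
print(cache.shape)  # torch.Size([8192, 64, 2])
```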
