
fix max seq len #807

Merged 1 commit into pytorch:main on Apr 19, 2024
Conversation

@ebsmothers (Contributor) commented Apr 19, 2024

Test plan:

tune run lora_finetune_single_device --config llama3/8B_lora_single_device
...
1|21|Loss: 2.320160388946533:   0%|▏ 


pytorch-bot bot commented Apr 19, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/807

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 8a71824 with merge base b74fd3a:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label on Apr 19, 2024
@kartikayk (Contributor) left a comment


Thanks for the fix. I finally found the reference:

We trained the models on sequences of 8,192 tokens, using a mask to ensure self-attention does not cross document boundaries.
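
(For illustration only: a minimal sketch, in plain PyTorch and not torchtune's implementation, of the masking technique quoted above, i.e. a causal mask that also blocks attention across packed-document boundaries. The per-token doc_ids input format is an assumption of this sketch.)

```python
import torch

def document_causal_mask(doc_ids: torch.Tensor) -> torch.Tensor:
    # doc_ids: [seq_len] tensor; tokens packed from the same document share an ID.
    # Returns a [seq_len, seq_len] boolean mask, True where attention is allowed.
    seq_len = doc_ids.shape[0]
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    same_doc = doc_ids.unsqueeze(0) == doc_ids.unsqueeze(1)
    return causal & same_doc

# Example: two documents of lengths 3 and 2 packed into one 5-token sequence.
mask = document_causal_mask(torch.tensor([0, 0, 0, 1, 1]))
```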

@ebsmothers merged commit 41341fd into pytorch:main on Apr 19, 2024
27 checks passed
@kartikayk (Contributor)

One note: we do double our RoPE cache. But I don't think this will meaningfully impact training.
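
(Rough back-of-envelope, assuming a cache of shape [max_seq_len, head_dim // 2, 2] in fp32; the exact shape and dtype in torchtune may differ. With head_dim = 128 as in Llama 3 8B, doubling max_seq_len from 4096 to 8192 grows the cache from roughly 2 MB to 4 MB per RoPE module.)

```python
# Hypothetical sizing only; the cache shape and dtype are assumptions,
# not torchtune's exact code.
head_dim = 128          # Llama 3 8B attention head dimension
bytes_per_elem = 4      # fp32

def rope_cache_mb(max_seq_len: int) -> float:
    # Assumed cache shape: [max_seq_len, head_dim // 2, 2]
    return max_seq_len * (head_dim // 2) * 2 * bytes_per_elem / 2**20

print(rope_cache_mb(4096))  # 2.0 MB
print(rope_cache_mb(8192))  # 4.0 MB -> doubling max_seq_len adds ~2 MB per module
```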

@rohan-varma (Member)

@kartikayk Is there a way to prevent this that's used in pretraining or other fine-tuning libs that we don't have? It seems our implementation makes the RoPE cache size directly proportional to max_seq_len, right? https://github.com/pytorch/torchtune/blob/main/torchtune/modules/position_embeddings.py
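
(For context, a minimal sketch of how a RoPE cos/sin cache is commonly precomputed; this is an illustration under assumed shapes, not the code in position_embeddings.py. The cache's leading dimension is max_seq_len, which is why its memory grows linearly with that setting.)

```python
import torch

def build_rope_cache(max_seq_len: int, dim: int, base: int = 10_000) -> torch.Tensor:
    # Per-frequency angles theta_i = base^(-2i/dim), one per pair of dimensions.
    theta = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))   # [dim // 2]
    positions = torch.arange(max_seq_len).float()                     # [max_seq_len]
    angles = torch.outer(positions, theta)                            # [max_seq_len, dim // 2]
    # Stack cos/sin terms; memory is proportional to max_seq_len.
    return torch.stack([torch.cos(angles), torch.sin(angles)], dim=-1)

cache = build_rope_cache(max_seq_len=8192, dim=128)
print(cache.shape)  # torch.Size([8192, 64, 2])
```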
