Fix passing of FSDP arg to LoRA recipe test #472

Merged 1 commit into pytorch:main from fix-lora-recipe-fsdp-test on Mar 8, 2024

Conversation

ebsmothers (Contributor)

Context

  • Our current test_lora_finetune.py does not pass the enable_fsdp flag through to the recipe correctly. It also uses the wrong context manager: a null context manager was swapped in where single_box_init belongs.

Changelog

  • Fix the config so the flag is actually passed through, and swap the context managers back (see the sketch below).
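
For orientation, here is a minimal sketch of the pattern being fixed. `single_box_init` and `run_recipe` below are hypothetical stand-ins for the test's helpers, not torchtune's actual code; the point is that the parametrized flag must be interpolated into the kwargs handed to the recipe, and the distributed init context manager should only be entered when FSDP is enabled:

```python
from contextlib import contextmanager, nullcontext

import pytest


@contextmanager
def single_box_init():
    # Stand-in for the single-process distributed init helper (assumption).
    yield


def run_recipe(kwargs_str):
    # Stand-in for launching the LoRA recipe with a kwargs string (assumption).
    print(f"recipe launched with: {kwargs_str}")


@pytest.mark.parametrize("enable_fsdp", [False, True])
def test_loss(enable_fsdp):
    # The bug: enable_fsdp was missing from the kwargs passed to the recipe,
    # so cfg.enable_fsdp silently stayed at the config default (False).
    kwargs_str = f"enable_fsdp={enable_fsdp}"

    # The other bug: a null context manager had been swapped in where
    # single_box_init belongs; the fix swaps them back for the FSDP case.
    init_ctx = single_box_init() if enable_fsdp else nullcontext()
    with init_ctx:
        run_recipe(kwargs_str)
```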

Test plan

First, add import pdb; pdb.set_trace() as the first line of both test_loss and LoRAFinetuneRecipe.__init__.
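
Concretely, the breakpoint placement looks like this. The file paths and the first real line of each function are taken from the traces below; everything else is elided:

```python
# tests/recipes/test_lora_finetune.py (placement sketch; other params elided)
class TestLoRAFinetuneRecipe:
    def test_loss(self, enable_fsdp):
        import pdb; pdb.set_trace()  # temporary breakpoint
        ckpt = "lora_small_test_ckpt"
        ...

# recipes/lora_finetune.py (placement sketch)
class LoRAFinetuneRecipe:
    def __init__(self, cfg):
        import pdb; pdb.set_trace()  # temporary breakpoint
        self._device = utils.get_device(device=cfg.device)
        ...
```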

Before this change:

pytest -v tests/recipes/test_lora_finetune.py -k 'test_loss'
...
tests/recipes/test_lora_finetune.py::TestLoRAFinetuneRecipe::test_loss[False] 
> /data/users/ebs/ebs-torchtune/tests/recipes/test_lora_finetune.py(54)test_loss()
-> ckpt = "lora_small_test_ckpt"
(Pdb) enable_fsdp
False
(Pdb) c
> /data/users/ebs/torchtune/recipes/lora_finetune.py(76)__init__()
-> self._device = utils.get_device(device=cfg.device)
(Pdb) cfg.enable_fsdp
False 
(Pdb) c

tests/recipes/test_lora_finetune.py::TestLoRAFinetuneRecipe::test_loss[True] 
> /data/users/ebs/ebs-torchtune/tests/recipes/test_lora_finetune.py(54)test_loss()
-> ckpt = "lora_small_test_ckpt"
(Pdb) enable_fsdp
True
(Pdb) c
> /data/users/ebs/torchtune/recipes/lora_finetune.py(76)__init__()
-> self._device = utils.get_device(device=cfg.device)
(Pdb) cfg.enable_fsdp
False # THIS LINE IS INCORRECT

After this change:

pytest -v tests/recipes/test_lora_finetune.py -k 'test_loss'
...
tests/recipes/test_lora_finetune.py::TestLoRAFinetuneRecipe::test_loss[False] 
> /data/users/ebs/ebs-torchtune/tests/recipes/test_lora_finetune.py(54)test_loss()
-> ckpt = "lora_small_test_ckpt"
(Pdb) enable_fsdp
False
(Pdb) c
> /data/users/ebs/torchtune/recipes/lora_finetune.py(76)__init__()
-> self._device = utils.get_device(device=cfg.device)
(Pdb) cfg.enable_fsdp
False 
(Pdb) c

tests/recipes/test_lora_finetune.py::TestLoRAFinetuneRecipe::test_loss[True] 
> /data/users/ebs/ebs-torchtune/tests/recipes/test_lora_finetune.py(54)test_loss()
-> ckpt = "lora_small_test_ckpt"
(Pdb) enable_fsdp
True
(Pdb) c
> /data/users/ebs/torchtune/recipes/lora_finetune.py(76)__init__()
-> self._device = utils.get_device(device=cfg.device)
(Pdb) cfg.enable_fsdp
True # Now it's correct

pytest -v tests/recipes/test_lora_finetune.py -k 'test_loss'
...
========== 2 passed, 5 warnings in 15.53s ===========

netlify bot commented Mar 8, 2024

Deploy Preview for torchtune-preview ready!

Latest commit: aed752d
Latest deploy log: https://app.netlify.com/sites/torchtune-preview/deploys/65eb69c4df36fa00081224c7
Deploy Preview: https://deploy-preview-472--torchtune-preview.netlify.app

rohan-varma (Member)

Discussed with @ebsmothers offline that there are still a couple of issues. TL;DR: we should test our distributed recipes via tune --nproc-per-node x, the way users would actually launch them, instead of via single_box_init. Fixing this in #454.

ebsmothers merged commit 15054ef into pytorch:main on Mar 8, 2024
17 checks passed
ebsmothers deleted the fix-lora-recipe-fsdp-test branch on March 8, 2024 at 19:58