Add `accelerate` support for ViLT #18683
Conversation
- redefine `accelerate` tests by picking the correct model output (see the sketch below) - redefine tests tolerance due to stochasticity - slow tests are passing except `ViltModelIntegrationTest::test_inference_natural_language_visual_reasoning` - but the test above seems to never pass anyway
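For context, a hedged sketch of what "picking the correct model output" means in these tests; the function name, tolerance, and seed handling are illustrative, not the PR's exact code:

```python
import torch


def compare_offloaded_outputs(base_model, offloaded_model, inputs, seed=0, tol=1e-4):
    """Compare a deterministic head output instead of ViLT's stochastic
    hidden states, re-seeding before each forward pass."""
    torch.manual_seed(seed)
    base = base_model(**inputs)

    torch.manual_seed(seed)
    offloaded = offloaded_model(**inputs)

    # `logits` (or `pooler_output`) is stable enough to compare with a relaxed
    # tolerance; the full hidden states are not deterministic for this model.
    torch.testing.assert_close(base.logits, offloaded.logits, rtol=tol, atol=tol)
```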
Could you elaborate on this part in more depth, please?
Sure, ...
Thank you, @younesbelkada! Let me bother you a bit more: why ...
No worries!
It should still be deterministic, from my intuition; let me have a look.
After looking into it with @ArthurZucker, setting a manual seed before each forward pass in each `accelerate` test seems to fix the failing tests.
I am open to the fix of using a seed; it makes the change smaller. https://github.com/huggingface/transformers/runs/7891383348?check_suite_focus=true The nightly CI you mentioned above uses the nightly torch version (i.e. daily-built torch), for which more test failures are expected.
I think it's better for us to figure out why the relevant tests haven't failed on the scheduled CI for a long period but fail here. This seems contradictory.
Could you mention which hardware you tested on, @younesbelkada?
- replace everything by manually setting the seed
The non-passing test passes on my VM with ...
@@ -530,7 +668,8 @@ def test_for_token_classification(self):

# We will verify our results on an image of cute cats
def prepare_img():
    image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
@ydshieh maybe it's because of this line? Is `./tests/fixtures/tests_samples/COCO/000000039769.png` stored in the Docker image?
It's in the repo, no? The CI clones the repo, so it should be there.
Thanks for the `_no_split_modules` not set part, now it's clear to me! So I'm good with using the seed for those tests.
Great, my bad then, reverted that change. However, this doesn't seem to fix the initial issue.
It's probably hardware/library-version related; I saw somewhere that different PIL versions can yield different results.
Check here:
if is_pillow_less_than_9:
My Pillow version is:
Pillow 9.2.0
Could we just set the seed, then call `super().test_cpu_offload()`?
The only thing I am worried about is that you need to set the seed before each inference step, so more than two times: once before the first forward pass and again before the other forward passes. That is why I think we cannot call `super().test_cpu_offload()` but have to define our own function.
Maybe there is a way to fix a seed without having to re-set it at each inference step
Let me check
I propose to add a context manager in 3cda114. With that, there is no need to redefine the whole testing functions; we just need a `set_reproducible` decorator to be set on these functions to ensure reproducibility.
src/transformers/testing_utils.py
Outdated

# adapted from https://stackoverflow.com/questions/32163436/python-decorator-for-printing-every-line-executed-by-a-function
class set_reproducible(ContextDecorator):
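For context, a minimal sketch of what such a seed-resetting `ContextDecorator` could look like; this is an assumption of the shape, not the exact implementation in 3cda114:

```python
import sys
from contextlib import ContextDecorator

import torch


class set_reproducible(ContextDecorator):
    """Re-seed torch before every call into a `modeling_*.py` file."""

    def __init__(self, seed=0):
        self.seed = seed

    def trace_calls(self, frame, event, arg):
        # `sys.settrace` invokes this once per function call ('call' events);
        # re-seed only for calls originating from a transformers modeling file.
        if "/modeling_" in frame.f_code.co_filename:
            torch.manual_seed(self.seed)
        return None  # no per-line tracing needed

    def __enter__(self):
        torch.manual_seed(self.seed)
        sys.settrace(self.trace_calls)
        return self

    def __exit__(self, *exc):
        sys.settrace(None)
        return False
```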
This is interesting! Does that mean we can use this decorator every time we want to test PyTorch models that involve stochasticity? Because ViTMAE is another example (as it creates a random boolean mask inside).
I'm currently using `torch.manual_seed` as seen here. Should we use this decorator instead?
I think this change could be applied to ViTMAE as well; they have exactly the same stochasticity situation.
It does work for the mentioned test 🥳
Thanks a lot for your comments @ydshieh & @NielsRogge!
@NielsRogge: I can confirm this decorator works fine for ViTMAE too; I replaced the function you pointed me to, and the test was passing, so I guess we can safely replace stochastic tests with this kind of trick. I would love a quick review from @sgugger or @stas00 if possible 🙏, as I think this decorator can be useful for future models. Thanks 💪
Just a small comment: in terms of performance, I think the decorator can be improved a little bit to only run on the model's forward and not on every single forward pass (if I understand correctly, here all the functions named `forward` are affected, including PyTorch's own modules).
Improve the context manager's efficiency by filtering the forward calls based on the file origin
I can confirm that it now fixes the tests for ...
src/transformers/testing_utils.py
Outdated

def trace_calls(self, frame, event, arg):
    # Set the seed when it is a call to a forward function from the model
    if "/modeling_" in frame.f_code.co_filename:
This was a bit tricky to get; if someone knows where to find the documentation to link to for the `frame` attribute, that would be great.
BTW, the frames looked a bit like this:
<frame at 0x564b14f28360, file '/home/arthur_huggingface_co/transformers/src/transformers/models/vilt/modeling_vilt.py', line 761, code forward>
<code object forward at 0x7fe2c0610f50, file "/home/arthur_huggingface_co/transformers/src/transformers/models/vilt/modeling_vilt.py", line 761>
/home/arthur_huggingface_co/transformers/src/transformers/models/vilt/modeling_vilt.py
<frame at 0x564b14d71fe0, file '/home/arthur_huggingface_co/transformers/src/transformers/models/vilt/modeling_vilt.py', line 200, code forward>
<code object forward at 0x7fe2c060d0e0, file "/home/arthur_huggingface_co/transformers/src/transformers/models/vilt/modeling_vilt.py", line 200>
/home/arthur_huggingface_co/transformers/src/transformers/models/vilt/modeling_vilt.py
<frame at 0x564b1432e910, file '/home/arthur_huggingface_co/transformers/src/transformers/models/vilt/modeling_vilt.py', line 265, code forward>
<code object forward at 0x7fe2c060d2f0, file "/home/arthur_huggingface_co/transformers/src/transformers/models/vilt/modeling_vilt.py", line 265>
/home/arthur_huggingface_co/transformers/src/transformers/models/vilt/modeling_vilt.py
<frame at 0x564b14ce68b0, file '/opt/conda/envs/jukebox/lib/python3.8/site-packages/torch/nn/modules/sparse.py', line 157, code forward>
<code object forward at 0x7fe40cd5dc90, file "/opt/conda/envs/jukebox/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 157>
/opt/conda/envs/jukebox/lib/python3.8/site-packages/torch/nn/modules/sparse.py
<frame at 0x564b14ce68b0, file '/opt/conda/envs/jukebox/lib/python3.8/site-packages/torch/nn/modules/sparse.py', line 157, code forward>
<code object forward at 0x7fe40cd5dc90, file "/opt/conda/envs/jukebox/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 157>
/opt/conda/envs/jukebox/lib/python3.8/site-packages/torch/nn/modules/sparse.py
<frame at 0x564b14ce68b0, file '/opt/conda/envs/jukebox/lib/python3.8/site-packages/torch/nn/modules/sparse.py', line 157, code forward>
<code object forward at 0x7fe40cd5dc90, file "/opt/conda/envs/jukebox/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 157>
/opt/conda/envs/jukebox/lib/python3.8/site-packages/torch/nn/modules/sparse.py
small nit Co-authored-by: Younes Belkada <[email protected]>
src/transformers/testing_utils.py
Outdated

def trace_calls(self, frame, event, arg):
    # Set the seed when it is a call to a forward function from the model
    if "/modeling_" in frame.f_code.co_filename:
        return self.set_seed
So if I understand it correctly, you're using this hack to force the same seed before every Python call, correct?
What is it that you're testing then, since this is not the normal behavior? I.e. a normal code base sets the seed once to be reproducible.
In other words, why doesn't setting the seed at the beginning of the program lead to a reproducible outcome? Isn't that an indicator of a bug in the proposed software that this hack tries to hide?
And if you're really going to keep this, can we call it something with a better mnemonic, like `reset_seed_on_every_frame`? Since `set_reproducible` to me means setting a seed once at enter and leaving it alone.
And kudos on finding a very cool way to hack this functionality in, @younesbelkada! I'm just not sure it is needed.
Thanks @stas00 for your reply!
Regarding your first point: yes, but we set the seed before each `forward` call from a PyTorch module instead of before any Python call (from @ArthurZucker's modification).
Regarding your second point: I was also very confused at the beginning, but setting the PyTorch seed before each `forward` seems to be needed to ensure reproducibility for stochastic operations. E.g. this code snippet shows that:
>>> import torch
>>> torch.manual_seed(0)
<torch._C.Generator object at 0x10576b6f0>
>>> torch.randn(1, 3)
tensor([[ 1.5410, -0.2934, -2.1788]])
>>> torch.randn(1, 3)
tensor([[ 0.5684, -1.0845, -1.3986]])
>>> torch.manual_seed(0)
<torch._C.Generator object at 0x10576b6f0>
>>> torch.randn(1, 3)
tensor([[ 1.5410, -0.2934, -2.1788]])
You need to set the seed before each stochastic call to ensure reproducibility. From my understanding, that is why we need to set the seed on stochastic tests (e.g. here, where the seed is set before each forward pass). I am still not sure why setting the seed once is not sufficient, but I think it might be because the seed state has been 'consumed' once a stochastic operation is performed (this has also been observed on the Jukebox integration, I think).
For the last point, yes! I think we should change the decorator's name to make it more understandable.
Ah, I missed that nuance about `forward` - thank you for explaining.
Then `reset_seed_before_every_forward` would be more appropriate.
But there is already a mechanism for doing this - it's called `register_forward_hook`:
https://pytorch.org/docs/stable/generated/torch.jit.ScriptModule.html?highlight=hook#torch.jit.ScriptModule.register_forward_hook
so you don't need any cool hacks to accomplish that.
E.g. see:
transformers/src/transformers/debug_utils.py, line 160 in 4c2e983:
self.register_forward_hook()
Very interesting! Both options work, and maybe `register_forward_pre_hook(hook)` is more appropriate.
IMO having a decorator would let anyone use it without modifying their test function, but we might end up with a lot of decorators.
We could define the following:

# testing_utils.py
def pre_hook_set_seed(seed):
    # returns a pre-forward hook that re-seeds before each forward call
    def pre_hook(module, input):
        set_seed(seed)
    return pre_hook

And call:

# test_modeling_vilt.py
# note: the factory must be called to produce the actual hook
handle = model.register_forward_pre_hook(pre_hook_set_seed(seed))
...
handle.remove()

I guess it's pretty similar but requires a bit more code. 😄
@stas00 What do you think would work best long term?
The PyTorch call is designated for this specific functionality, so it's a much cleaner solution, IMHO.
You can just as well implement it as a context manager, so enter/exit for the above sample.
Also, I was thinking that if ViLT needs this functionality often, perhaps the hook should be added directly into its modeling file?
But testing_utils is perfect too if it's generic and other models might need it too.
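A hedged sketch of the context-manager variant suggested above; the function name is illustrative, and `set_seed` is the existing transformers helper:

```python
from contextlib import contextmanager

from transformers import set_seed


@contextmanager
def reset_seed_before_every_forward(model, seed=0):
    # Register a pre-forward hook that re-seeds right before every forward
    # call on `model`, and make sure the hook is detached afterwards.
    def pre_hook(module, inputs):
        set_seed(seed)

    handle = model.register_forward_pre_hook(pre_hook)
    try:
        yield model
    finally:
        handle.remove()
```

Usage would then be `with reset_seed_before_every_forward(model): outputs = model(**inputs)`.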
This seems a bit over-engineered for three tests where we could just insert two `set_seed` calls to ensure the same results (even registering a pre-forward hook seems over-engineered TBH). Or am I missing something?
You are right 😅 But since ViTMAE also requires that hack, we thought that, long term, it made more sense to have an easy-to-use decorator! You have to insert the `set_seed` before every `forward` (so in the for loop of the test) for each of the tests, so the original idea of having a decorator followed Yih-dar's comment.
Anyway, will do whatever you think works better!
While I was excited by the idea at first, reading Stas' comments and thinking more about it, I think we should stick to something simple :-)
Why does it have to be a test decorator? What would be wrong with calling it from the test?
Also, I was thinking that if this might be used by users - e.g. to debug their code - wouldn't it be a good idea to make this functionality part of the model itself?
So basically one would call:

model = from_pretrained(...)
model.reset_seed_before_every_forward(seed=42)  # this will inject the pre-hook

and so now it's accessible to tests and users too. So it's no longer a hack to solve testing issues, but an official debug feature.
If multiple models require it, it could be a mixin class they can inherit from, so it won't add any noise to the modeling file.
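A hedged sketch of what such a mixin could look like; the method name and behavior follow the proposal above and are not an existing transformers API:

```python
from transformers import set_seed


class ResetSeedMixin:
    """Opt-in debug helper: re-seed torch before every forward pass."""

    def reset_seed_before_every_forward(self, seed=42):
        # `self` is expected to be an nn.Module, so the pre-hook fires right
        # before each call to `self.forward`.
        def pre_hook(module, inputs):
            set_seed(seed)

        # Return the handle so callers can detach the hook via `handle.remove()`.
        return self.register_forward_pre_hook(pre_hook)
```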
Also fine with this.
- remove `reset_seed_before_every_forward` context manager - redefine accelerate tests
Thank you all for your comments!
It looks like the overloaded tests are just adding two lines for setting the seed. That modification can definitely go in the base test (sorry if I was unclear in my comments before) :-)
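A hedged sketch of how the base test could absorb those two seed-setting lines; this is simplified, and the real `ModelTesterMixin.test_cpu_offload` does more bookkeeping than shown here:

```python
import tempfile

import torch


def test_cpu_offload(self):  # sketch of the shared base-class test
    config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
    for model_class in self.all_model_classes:
        model = model_class(config).eval().to("cuda")
        # move tensor inputs next to the model for the reference pass
        inputs = {k: v.to("cuda") if hasattr(v, "to") else v for k, v in inputs_dict.items()}

        torch.manual_seed(0)  # seed before the reference forward pass
        base_output = model(**inputs)

        with tempfile.TemporaryDirectory() as tmp_dir:
            model.save_pretrained(tmp_dir)
            new_model = model_class.from_pretrained(tmp_dir, device_map="auto")

            torch.manual_seed(0)  # same seed before the offloaded forward pass
            new_output = new_model(**inputs)

        torch.testing.assert_close(base_output[0], new_output[0])
```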
- add the attribute `reset_seed_before_every_forward` on `accelerate` tests - this supports stochastic tests
@sgugger thanks for your comment! And sorry for my late reply. I can also take care of opening a follow-up PR to update the tests of ViTMAE to make sure these changes are consistent for stochastic model tests inside `transformers`.
I didn't mean adding a new argument to the test. We can have the seed set in the tests for all models. :-)
Co-authored-by: Sylvain Gugger <[email protected]>
- set seed before every forward on `accelerate` tests - `make fixup`
Perfect, thanks @sgugger!
@require_accelerate
@require_torch_gpu
def test_cpu_offload(self):
    super().test_cpu_offload()

@require_accelerate
@require_torch_gpu
def test_disk_offload(self):
    super().test_disk_offload()

@require_accelerate
@require_torch_multi_gpu
def test_model_parallelism(self):
    super().test_model_parallelism()
None of this is needed anymore ;-p
Oops, yes ahah, forgot to remove them.
Should be fixed in 19515d2
- tests will be correctly called from the base class
Thanks a lot!
Motivation
Add `bnb` support for the ViLT model, as requested by a user in bitsandbytes-foundation/bitsandbytes#14. This involved adding `accelerate` support for this model.

What does this PR do?
Adds the `_no_split_modules` attribute to the `ViltModel` class to support loading the model with `device_map="auto"`. This also implied adding a `.to` operation inside `ViltLayer`.
I also redefined the `accelerate` tests, since for this model the hidden states are not deterministic. However, it is possible to check the correctness of the operation by checking some output attributes such as `logits` or `pooler_output`.
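For illustration, a hedged sketch of what setting `_no_split_modules` enables; the attribute value shown in the comment is an assumption for this sketch, not copied from the diff:

```python
from transformers import ViltModel

# `_no_split_modules` lives on the pretrained-model class and tells accelerate
# which blocks must never be split across devices, e.g. (assumed here):
#
#     class ViltModel(ViltPreTrainedModel):
#         _no_split_modules = ["ViltLayer"]
#
# With it set, the model can be loaded with automatic device placement:
model = ViltModel.from_pretrained("dandelin/vilt-b32-mlm", device_map="auto")
```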
Questions
The test `ViltModelIntegrationTest::test_inference_natural_language_visual_reasoning` seems to never pass on my machine (aka even without `_no_split_modules`). Is it related to something I am missing? Also, it seems that those tests were failing on the nightly run too: https://github.com/huggingface/transformers/runs/7882898294?check_suite_focus=true

cc @NielsRogge @ArthurZucker @ydshieh