adaption for moe models #2101

Open · wants to merge 23 commits into base: main
Conversation

@dhrhank187 (Author)

Dear huggingface peft community,

We have adapted MoE models based on Megatron's RowParallelLinear and ColumnParallelLinear by modifying LoraParallelLinear. Additionally, we have validated the change on the Mixtral model. We would greatly appreciate your review and feedback to further improve and refine our work. Looking forward to your suggestions and comments!

Thank you for your support and collaboration!

@dhrhank187 (Author)

@BenjaminBossan @pacman100

@BenjaminBossan (Member)

Could you please give more context: what are you referring to exactly, and where is this new parameter being used?

Also, as is, this PR assumes that the base layer always has the is_expert attribute and that RowParallelLinear and ColumnParallelLinear always accept it as an argument. I don't think we can make these assumptions.

@dhrhank187 (Author)

https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/core/tensor_parallel/layers.py

Thanks for your comment.

  1. is_expert is a fixed parameter of ColumnParallelLinear and RowParallelLinear.
  2. The base_layer is created from ColumnParallelLinear or RowParallelLinear, so base_layer also has the is_expert parameter.
  3. When using an MoE model, if this parameter is not enabled, there will be a mismatch between the shape of x and the shape of result in the forward function.
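To address the review concern about backends that lack this parameter, one option is to read the flag defensively when rebuilding the LoRA sub-layers. The sketch below is illustrative: `DummyRowParallelLinear` stands in for Megatron's `RowParallelLinear`, and `build_lora_layer` is a hypothetical helper, not PEFT's actual code; only the `is_expert` parameter name comes from the discussion above.

```python
# Hedged sketch: forward Megatron's `is_expert` flag from the base layer
# to the LoRA parallel layer, defaulting to False for backends that do
# not define it. All class/function names here are illustrative.

class DummyRowParallelLinear:
    """Stand-in for megatron.core.tensor_parallel.RowParallelLinear."""
    def __init__(self, in_features, out_features, is_expert=False):
        self.in_features = in_features
        self.out_features = out_features
        self.is_expert = is_expert

def build_lora_layer(base_layer, rank):
    # getattr with a default keeps non-Megatron backends working even
    # when they have no `is_expert` attribute at all.
    is_expert = getattr(base_layer, "is_expert", False)
    return DummyRowParallelLinear(rank, base_layer.out_features,
                                  is_expert=is_expert)

base = DummyRowParallelLinear(16, 32, is_expert=True)
lora_b = build_lora_layer(base, rank=4)
print(lora_b.is_expert)  # True
```

With this pattern, a custom backend whose layers lack `is_expert` simply gets the non-expert default instead of an AttributeError.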


@BenjaminBossan (Member)

Ah, I see, thanks for the pointers. This was added to Megatron more than a year ago, so I guess it should be fine, but I'm not sure whether users may want to use other backends that don't have that parameter. Hopefully @zhangsheng377 can comment on this.

@zhangsheng377 (Contributor)

The is_expert parameter was newly added to Megatron this year, right? I think that although we support custom backends, the default format should still be based on Megatron; that is, the user's own backend should be compatible with the Megatron interface. So I agree with adding the is_expert parameter, but it would be better to elaborate on the lora parameter.

After transformers merged this PR:

huggingface/transformers#33703

The bool of past_key_values (a Cache instance) would change from False
to True in one of our checks. Use the get_seq_length() method instead,
which is consistent before and after that commit.

I checked the tests with the new change for both transformers before and
after that commit and they passed, so this change should be backwards
compatible.
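The failure mode can be sketched with a toy stand-in for the Cache class. `FakeCache` below is illustrative, not transformers' real `Cache`; only the `get_seq_length()` method name comes from the commit message above.

```python
# Hedged sketch: after the transformers change, an *empty* cache object
# is still truthy, so `if past_key_values:` no longer distinguishes
# "no cached tokens". Checking the sequence length is robust.

class FakeCache:
    """Illustrative stand-in for transformers' Cache."""
    def __init__(self, tokens=0):
        self._tokens = tokens

    def get_seq_length(self):
        return self._tokens

cache = FakeCache(tokens=0)

# Fragile check: a plain object with no __bool__/__len__ is always truthy.
print(bool(cache))                 # True, even though the cache is empty

# Robust check, consistent before and after the change:
print(cache.get_seq_length() > 0)  # False
```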

Unrelated change: Mark X-LoRA scaling test as xfail-ing for now.

This should be addressed in a separate PR. Marking it to xfail for now
to get the original fix through CI.
@BenjaminBossan (Member)

@dhrhank187 could you please merge with or rebase on the main branch? That should fix the failing CI.

Zeju1997 and others added 19 commits October 1, 2024 16:51
The previous OFT implementation contained a few errors, which are fixed now.

Unfortunately, this makes previous OFT checkpoints invalid, which is why an
error will be raised. Users are instructed to either retrain the OFT adapter or
switch to an old PEFT version.
Resolves huggingface#2099

So far, if a module was wrapped due to modules_to_save, we handled
access to the weight and bias attributes (albeit incorrectly in the case
of disabled adapters!). However, other attributes could also be
accessed, in which case we got an error so far.

Instead of special properties, we now implement a generic __getattr__
method that can deal with any attribute. The implementation is a bit
complex to take into account the way that torch.nn.Module handles
__getattr__.
See: huggingface/diffusers#9510 (comment)
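The delegation idea can be sketched in plain Python. Note that this omits the complication the commit message mentions: real PEFT code must also cooperate with `torch.nn.Module.__getattr__`, which handles parameters, buffers, and submodules specially. All names below are illustrative.

```python
# Hedged sketch: instead of hand-written `weight`/`bias` properties, a
# generic __getattr__ forwards any unknown attribute to the wrapped
# layer. __getattr__ is only called when normal lookup fails, so the
# wrapper's own attributes (like `original`) are found first and there
# is no recursion.

class ModulesToSaveWrapper:
    def __init__(self, original):
        self.original = original

    def __getattr__(self, name):
        return getattr(self.original, name)

class Linear:
    def __init__(self):
        self.weight = "W"
        self.bias = "b"
        self.custom_attr = 42

wrapped = ModulesToSaveWrapper(Linear())
print(wrapped.weight)       # 'W'
print(wrapped.custom_attr)  # 42 -- previously this would have raised
```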

Right now, the low_cpu_mem_usage=True option does not consolidate the
devices. E.g. when the model is on GPU and the state_dict on CPU, the
adapter weight will be on CPU after loading, when it should be GPU. This
fix ensures that the devices are consolidated.
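A toy version of the consolidation step, with devices as plain strings rather than real torch devices, could look like the following. The helper name and data layout are illustrative, not PEFT's actual implementation.

```python
# Hedged sketch: after loading, adapter weights should follow the base
# module's device, not the device of the state_dict they were read from
# (typically CPU). Devices are modeled as strings here for illustration.

def consolidate_device(base_device, loaded_weights):
    # loaded_weights: name -> (values, device) as read from a CPU state_dict
    return {name: (vals, base_device)
            for name, (vals, _) in loaded_weights.items()}

weights = {"lora_A": ([0.1, 0.2], "cpu")}
fixed = consolidate_device("cuda:0", weights)
print(fixed["lora_A"][1])  # cuda:0
```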
Currently, CI is failing constantly because one of the X-LoRA tests has
become flaky lately, most likely caused by the transformers 4.45.0
release. Therefore, this test is now marked to non-strictly xfail.

I cannot reproduce this error locally, neither on CPU nor GPU. It is
thus unclear how to fix this test.
After merging huggingface#2084, we now clean up the missing_keys when loading a
PEFT adapter to remove all but the relevant keys (the fact that base
model keys are missing is expected when loading a PEFT adapter).

Since the presence of missing_keys now really means that something might
have gone wrong during loading, we can now warn the user if they call
PeftModel.from_pretrained.

Note that load_adapter still does not warn, as here we return the
load_result and users can already check, but for from_pretrained, they
don't have that possibility.
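The warning logic described above amounts to something like this sketch; the function name and message wording are illustrative, not PEFT's actual code.

```python
# Hedged sketch: once expected base-model keys have been filtered out,
# any remaining missing_keys indicate a real loading problem, so warn.

import warnings

def maybe_warn_missing_keys(missing_keys):
    if missing_keys:
        warnings.warn(
            f"Found missing adapter keys while loading: {missing_keys}"
        )

maybe_warn_missing_keys([])                         # silent: nothing missing
maybe_warn_missing_keys(["lora_A.default.weight"])  # emits a UserWarning
```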
…gface#2076)

VeRA can now be used with 4bit and 8bit bnb quantization.
Supports torch AO quantization. Currently supported:

- int8_weight_only
- int8_dynamic_activation_int8_weight

---------

Co-authored-by: Marc Sun <[email protected]>
…ce#2104)

Transpose weight matrix based on fan_in_fan_out condition in PiSSA
initialization.

Co-authored-by: Yang Su <[email protected]>
The file was missing the from __future__ import annotations part. As
this code is only running nightly with GPU, the normal CI missed this
omission.
Right now, loading a PEFT config saved with a more recent PEFT version
than is currently installed will lead to errors when new arguments are
added to the config in the newer PEFT version. The current workaround is
for users to manually edit the adapter_config.json to remove those
entries.

With this PR, PEFT will make an attempt at removing these unknown keys
by inspecting the signature. The user will be warned about these removed
keys. This should generally be a safe measure because we will generally
not introduce new config settings that change the default behavior.
However, if a non-default is used, this could lead to wrong results.
This is mentioned in the warning.

While working on the tests, I also converted the unittest.TestCase to a
normal pytest test in order to be able to use pytest fixtures.

I also plan on adding the PEFT version to the adapter_config.json in the
future. This will allow us to better handle compatibility issues in the
future. As adding that new key to all PEFT configs could cause a lot of
disruption, I want to get this PR in first to ensure forward
compatibility.

Note that this new mechanism will not help anyone using a PEFT version
< 0.14.0, so this will be a slow transition.
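The signature-inspection idea can be sketched as follows. `MyConfig`, its fields, and `from_json_dict` are all illustrative stand-ins, not PEFT's real config classes or loader.

```python
# Hedged sketch: drop JSON keys that the installed config class does not
# accept, warning the user about what was removed, so configs saved by a
# newer PEFT version still load on an older one.

import inspect
import warnings

class MyConfig:
    """Illustrative stand-in for a PEFT config class."""
    def __init__(self, r=8, lora_alpha=16):
        self.r = r
        self.lora_alpha = lora_alpha

def from_json_dict(config_cls, data):
    known = set(inspect.signature(config_cls.__init__).parameters) - {"self"}
    unknown = set(data) - known
    if unknown:
        warnings.warn(f"Ignoring unknown config keys: {sorted(unknown)}")
    return config_cls(**{k: v for k, v in data.items() if k in known})

# A config saved by a newer version with an extra key still loads:
cfg = from_json_dict(MyConfig, {"r": 4, "new_future_option": True})
print(cfg.r)  # 4
```

As the commit message notes, this is only safe when new keys keep default behavior; dropping a non-default setting silently changes results, which is why the user is warned.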
PEFT allows mixed batch adapter inference, i.e. when predicting, the
same batch can use different adapters by passing the adapter_names
argument. However, when users pass an adapter name that does not
correspond to any of the existing adapters, these samples are currently
being ignored (i.e. just the base model output is used). This is
unexpected and can easily lead to errors, e.g. when users mistype the
name of an adapter.

This PR fixes this issue by checking all the existing adapter names
first and comparing them to the adapter_names that the user passed. If
there are unexpected entries, an error is raised.

Due to this fix, an error in the test
test_mixed_adapter_batches_lora_merged_raises was discovered and
promptly fixed.
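The check described above reduces to a set comparison, sketched below. The function name is illustrative; the `"__base__"` sentinel is the name mixed-batch inference uses to request plain base-model output, included here under that assumption.

```python
# Hedged sketch: reject adapter names that do not match any loaded
# adapter, instead of silently falling back to the base model output.

def check_adapter_names(existing, requested):
    unknown = set(requested) - set(existing) - {"__base__"}
    if unknown:
        raise ValueError(f"Unknown adapter name(s): {sorted(unknown)}")

check_adapter_names(["default", "french"], ["default", "french"])  # ok
try:
    check_adapter_names(["default"], ["defautl"])  # typo is now caught
except ValueError as e:
    print(e)
```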
The error in PEFT is occurring after this transformers change:

huggingface/transformers#33870

Now, in our tests, some model_kwargs no longer necessarily contain
past_key_values, resulting in a KeyError. We now account for this
possibility. Affected models were opt and gpt2.
This test calculates the correlation coefficient of HQQ model outputs.
Although the model outputs are finite, the resulting matrix contains
NaNs. Casting the outputs from 16 to 32 bit precision resolves the
issue.
Solves the following bug:

huggingface/diffusers#9622 (comment)

The cause for the bug is as follows: When we have, say, a module called
"bar.0.query" that we want to target and another module called
"foo_bar.0.query" that we don't want to target, there was potential for
an error. This is not caused by _find_minimal_target_modules directly,
but rather the bug was inside of BaseTuner.inject_adapter and how the
names_no_target were chosen. Those used to be chosen based on suffix. In
our example, however, "bar.0.query" is a suffix of "foo_bar.0.query",
therefore "foo_bar.0.query" was *not* added to names_no_target when it
should have. As a consequence, during the optimization, it looks like
"query" is safe to use as target_modules because we don't see that it
wrongly matches "foo_bar.0.query".
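The pitfall and the fix can be shown in a few lines. The helper below is illustrative, not PEFT's actual matching code: the key point is that matching must respect module-name boundaries (a leading `.`), not raw string suffixes.

```python
# Hedged sketch: plain endswith() lets "bar.0.query" match
# "foo_bar.0.query". Requiring an exact match or a "." boundary
# before the target fixes that.

def matches_target(module_name, target):
    return module_name == target or module_name.endswith("." + target)

print("foo_bar.0.query".endswith("bar.0.query"))           # True  (buggy)
print(matches_target("foo_bar.0.query", "bar.0.query"))    # False (fixed)
print(matches_target("model.bar.0.query", "bar.0.query"))  # True
```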
After the patch release of PEFT v0.13.2, let's bump the dev version of
PEFT to v0.13.3.dev0 so that it stays ahead (the bugfix from the patch
release is already contained in the main branch).
@dhrhank187 (Author)

@BenjaminBossan Hello, I have merged a new branch to fix the CI.

@BenjaminBossan (Member)

@dhrhank187 For some reason, the latest change resulted in a huge diff touching 56 files. I think if you rebase on main, this should be fixed. Could you please do that? Otherwise I can't review the PR.
