
NotImplementedError running HF model "mlfoundations/dclm-7b-it" for inference #303

neginraoof opened this issue Aug 27, 2024 · 1 comment


@neginraoof

I am trying to run inference with the HF model "mlfoundations/dclm-7b-it", using code along these lines:

from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("mlfoundations/dclm-7b-it")
model = AutoModelForCausalLM.from_pretrained("mlfoundations/dclm-7b-it")
inputs = tokenizer("...", return_tensors="pt")
gen_kwargs = {"max_new_tokens": 500, "temperature": 0}
output = model.generate(inputs['input_ids'], **gen_kwargs)

I see this warning when loading the model:
Some weights of OpenLMForCausalLM were not initialized from the model checkpoint at mlfoundations/dclm-7b-it and are newly initialized: [...]

And I get NotImplementedError:

NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
     query       : shape=(1, 3, 32, 128) (torch.float32)
     key         : shape=(1, 3, 32, 128) (torch.float32)
     value       : shape=(1, 3, 32, 128) (torch.float32)
     attn_bias   : <class 'xformers.ops.fmha.attn_bias.LowerTriangularMask'>
     p           : 0.0

I have also tried model = AutoModel.from_pretrained("mlfoundations/dclm-7b-it"), but this model class also fails with ValueError: Unrecognized configuration class.

Which model class should I use here?

@sedrick-keh-tri (Collaborator)

This is usually an xformers issue: the memory_efficient_attention operators in xformers don't have CPU implementations, so the quick short-term fix is to make sure you move the model and the input tensors to the GPU (sketch below). That should resolve the error.
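A minimal sketch of that workaround, assuming a CUDA-capable GPU is available (the prompt and generation settings below are placeholders, not from the original report):

from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # memory_efficient_attention has no CPU kernels, so model and inputs must live on the GPU
tokenizer = AutoTokenizer.from_pretrained("mlfoundations/dclm-7b-it")
model = AutoModelForCausalLM.from_pretrained("mlfoundations/dclm-7b-it").to(device)

inputs = tokenizer("...", return_tensors="pt").to(device)  # placeholder prompt
output = model.generate(inputs["input_ids"], max_new_tokens=500)
print(tokenizer.decode(output[0], skip_special_tokens=True))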

I think the long-term solution here would probably be to get rid of xformers entirely. You can do this locally by setting "attn_name": "torch_attn" and "ffn_type": "swiglu_torch". I know the Apple models and TRI models do this, but I guess the mlfoundations one wasn't updated accordingly. I'm putting in a PR now.
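For anyone who wants to try that locally before the PR lands, here is one possible sketch of the override, assuming the OpenLM HF config exposes attn_name and ffn_type as top-level attributes (not verified against the released checkpoint; editing the checkpoint's config.json by hand is the alternative):

from transformers import AutoConfig, AutoModelForCausalLM

# Assumption: the OpenLM config stores these keys at the top level of the config object.
config = AutoConfig.from_pretrained("mlfoundations/dclm-7b-it")
config.attn_name = "torch_attn"   # plain torch attention instead of xformers memory_efficient_attention
config.ffn_type = "swiglu_torch"  # torch SwiGLU FFN instead of the fused xformers op
model = AutoModelForCausalLM.from_pretrained("mlfoundations/dclm-7b-it", config=config)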
