
ValueError("8-bit operations on bitsandbytes are not supported under CPU!") #10

Closed
Tianwei-She opened this issue Aug 15, 2022 · 9 comments
Labels: bug, documentation

Comments

@Tianwei-She

Hi Tim,

Thanks for your awesome work!

I'm using your method to load the largest BLOOM model (the 176B-parameter BLOOM) onto one node with 8 GPUs:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bloom",
    device_map="auto",
    load_in_8bit=True,
)

This works for all the other, smaller BLOOM models, e.g. bloom-7b1. However, when loading bloom (176B) I get the error "8-bit operations on bitsandbytes are not supported under CPU!":

File "/opt/conda/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 463, in from_pretrained
    return model_class.from_pretrained(
  File "/opt/conda/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2182, in from_pretrained
    raise ValueError("8-bit operations on `bitsandbytes` are not supported under CPU!")
ValueError: 8-bit operations on `bitsandbytes` are not supported under CPU!

In my understanding, this happens because some modules of the model are automatically loaded onto the CPU, which didn't happen with the smaller models. Is there a way to force the model to be loaded onto the GPUs only, or do you have any advice on how to get around this error? Thanks!!

Tianwei

@aninrusimha

From my testing, it seems the following happens when there is not enough memory available on the GPUs:
1. accelerate's automatic device selection sees device_map="auto" and puts some layers on the CPU;
2. this device map with CPU layers is passed onward;
3. the bitsandbytes code in transformers sees the CPU layers and raises this confusing error message.
My guess is that you don't have enough GPU memory for BLOOM.
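
A quick way to confirm this before downloading the full checkpoint is to build the device map on an empty (meta) model. The sketch below is not from this thread: it assumes accelerate and transformers are installed, uses accelerate's init_empty_weights and infer_auto_device_map, and sizing the weights as int8 only approximates what load_in_8bit=True will do; "bigscience/bloom" is used as the checkpoint name.

import torch
from accelerate import infer_auto_device_map, init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

# Build the model skeleton on the meta device; no weights are downloaded or allocated.
config = AutoConfig.from_pretrained("bigscience/bloom")
with init_empty_weights():
    empty_model = AutoModelForCausalLM.from_config(config)

# Ask accelerate where device_map="auto" would place each module.
device_map = infer_auto_device_map(
    empty_model,
    no_split_module_classes=["BloomBlock"],  # keep each transformer block on one device
    dtype=torch.int8,                        # rough stand-in for the 8-bit weight size
)
offloaded = {name: dev for name, dev in device_map.items() if dev in ("cpu", "disk")}
print(offloaded or "all modules fit on the GPUs")

Any module that ends up on "cpu" or "disk" here is exactly what trips the ValueError once load_in_8bit=True is set.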

@younesbelkada (Collaborator) commented Aug 16, 2022

Hi @aninrusimha @Tianwei-She
I second what @aninrusimha said: this error is thrown when you don't have enough GPU RAM to fit the quantized model before it gets assigned to the correct GPU device.
Could you also tell us what type of GPU you are using?

@Tianwei-She (Author)

Thanks for the reply! I'm using an AWS g5.48xlarge instance, which has 192 GiB of GPU memory.

@younesbelkada (Collaborator)

Actually, I am a bit surprised it didn't fit on your GPUs. Since I don't have access to these machines, could you please try installing transformers from source (dev mode), i.e.:

git clone https://github.com/huggingface/transformers
cd transformers
pip install -e ".[dev]"

And then add

print(device_map)

Just before this line: https://github.com/huggingface/transformers/blob/6d175c1129538b27230be170fc1184e8490e95ef/src/transformers/modeling_utils.py#L2181

Also, could you point me to the exact commands you are using (or better, send me the full script)? Thanks!

@TimDettmers (Collaborator)

I believe the main issue here is that you need to pass a max_memory dictionary as an argument. By default, the automatic placement can allocate so much memory to the model that the mini-batch no longer fits onto the GPU. This then causes a CPU error.

Either decrease the mini-batch size and sequence length until it fits, or use a max_memory dictionary that leaves a couple of GB of memory free on each GPU. So if you have 24 GB of memory per GPU, you want to use only 22-23 GB. However, BLOOM-176B might not fit with 22 GB, and you may need slightly more, something like 22.5 GB, but I am not sure if floating-point values are supported for the max_memory dictionary. @younesbelkada do you know more?
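
For illustration (this exact snippet is not from the thread), the suggestion above translates into something like the following. The per-GPU budget is an assumed figure that would need tuning for the actual cards, and writing it in MiB sidesteps the question of whether fractional values such as "22.5GB" are accepted.

import torch
from transformers import AutoModelForCausalLM

# Assumed budget: 22 GiB per GPU for weights, leaving the rest for activations.
# A fractional budget such as 22.5 GiB can be written as "23040MiB" instead.
n_gpus = torch.cuda.device_count()
max_memory = {i: "22GiB" for i in range(n_gpus)}

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom",
    device_map="auto",
    load_in_8bit=True,
    max_memory=max_memory,
)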

@Tianwei-She (Author) commented Aug 23, 2022

Thanks for replying!

@younesbelkada I printed out the device_map, and there are indeed some modules not on a GPU: 'transformer.h.69': 'disk' and 'transformer.ln_f': 'disk'.

{'transformer.word_embeddings': 0, 'lm_head': 0, 'transformer.word_embeddings_layernorm': 0, 'transformer.h.0': 0, 'transformer.h.1': 0, 'transformer.h.2': 0, 'transformer.h.3': 0, 'transformer.h.4': 0, 'transformer.h.5': 0, 'transformer.h.6': 1, 'transformer.h.7': 1, 'transformer.h.8': 1, 'transformer.h.9': 1, 'transformer.h.10': 1, 'transformer.h.11': 1, 'transformer.h.12': 1, 'transformer.h.13': 1, 'transformer.h.14': 1, 'transformer.h.15': 2, 'transformer.h.16': 2, 'transformer.h.17': 2, 'transformer.h.18': 2, 'transformer.h.19': 2, 'transformer.h.20': 2, 'transformer.h.21': 2, 'transformer.h.22': 2, 'transformer.h.23': 2, 'transformer.h.24': 3, 'transformer.h.25': 3, 'transformer.h.26': 3, 'transformer.h.27': 3, 'transformer.h.28': 3, 'transformer.h.29': 3, 'transformer.h.30': 3, 'transformer.h.31': 3, 'transformer.h.32': 3, 'transformer.h.33': 4, 'transformer.h.34': 4, 'transformer.h.35': 4, 'transformer.h.36': 4, 'transformer.h.37': 4, 'transformer.h.38': 4, 'transformer.h.39': 4, 'transformer.h.40': 4, 'transformer.h.41': 4, 'transformer.h.42': 5, 'transformer.h.43': 5, 'transformer.h.44': 5, 'transformer.h.45': 5, 'transformer.h.46': 5, 'transformer.h.47': 5, 'transformer.h.48': 5, 'transformer.h.49': 5, 'transformer.h.50': 5, 'transformer.h.51': 6, 'transformer.h.52': 6, 'transformer.h.53': 6, 'transformer.h.54': 6, 'transformer.h.55': 6, 'transformer.h.56': 6, 'transformer.h.57': 6, 'transformer.h.58': 6, 'transformer.h.59': 6, 'transformer.h.60': 7, 'transformer.h.61': 7, 'transformer.h.62': 7, 'transformer.h.63': 7, 'transformer.h.64': 7, 'transformer.h.65': 7, 'transformer.h.66': 7, 'transformer.h.67': 7, 'transformer.h.68': 7, 'transformer.h.69': 'disk', 'transformer.ln_f': 'disk'}

@TimDettmers
I've added max_memory as an argument; even with 23GB per GPU I'm still getting the error. The code I ran is:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Per-GPU memory budget passed alongside device_map="auto".
free_in_GB = int(torch.cuda.mem_get_info()[0] / 1024**3)
# max_memory = f'{free_in_GB-2}GB'
# max_memory = f'{free_in_GB}GB'
max_memory = '23GB'
n_gpus = torch.cuda.device_count()
max_memory = {i: max_memory for i in range(n_gpus)}
print(max_memory)

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom", device_map="auto", load_in_8bit=True, max_memory=max_memory)

torch.cuda.mem_get_info()[0]/1024**3 is 21.5, and nvidia-smi shows each GPU has 23028 MiB of memory.

I understand this is most likely caused by insufficient GPU memory; however, I'm wondering how the BLOOM model was able to run on 8x RTX 3090s with 24 GB of memory each, as shown in the paper.

@Tianwei-She (Author)

@TimDettmers By the way, I also tried tuning the int8_threshold parameter; with int8_threshold = 0, the memory usage is the same as with the default int8_threshold = 6.0. Just wanted to confirm: is this expected? Thanks again for your help!
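
For reference (an aside, not something run in this thread): in current versions of transformers the threshold is configured through a BitsAndBytesConfig as llm_int8_threshold rather than as a bare int8_threshold argument, so a present-day version of this experiment would look roughly like this:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# llm_int8_threshold controls which outlier features are kept in fp16;
# 6.0 is the library default, and 0.0 disables the fp16 outlier path entirely.
quant_config = BitsAndBytesConfig(load_in_8bit=True, llm_int8_threshold=0.0)

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom",
    device_map="auto",
    quantization_config=quant_config,
)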

@TimDettmers (Collaborator)

It is expected that thresholds 0 and 6 use close to the same memory with the current implementation. The difference should be on the order of a couple of megabytes.

If you are still receiving an error, you can try to tweak the exact amounts of memory reserved for the model and the activations. You might want to use a value between max_memory=22016MB (21.5 GB) and max_memory=22784MB (22.25 GB), which leaves the rest of the memory for the activations.

What is also important in this case is the maximum memory used for activations during inference. If your sequence dimension during inference is large, you might run out of memory at some point because the margins are so small.

In that case, you need to retweak the max_memory parameters. It could also help to remove the caching from the model, but I am not sure how to do that.
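
One concrete way to "remove the caching" mentioned above (my suggestion, not something confirmed in the thread) is to disable the key/value cache during generation. It trades speed for memory, since past attention states are recomputed at every step instead of being kept alive for the whole generation; model and tokenizer below are assumed to be the ones loaded in the script earlier in this issue.

# Disable the KV cache so per-layer key/value tensors are not retained across steps.
model.config.use_cache = False

# Inputs go to GPU 0, where the embedding layer sits in the device_map shown above.
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(0)
output_ids = model.generate(**inputs, max_new_tokens=20, use_cache=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))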

@TimDettmers added the bug and documentation labels on Sep 5, 2022
@TimDettmers (Collaborator)

I am closing this, as the issue is related to a part of the model being placed on the CPU, which is currently managed by the accelerate library. If this is still relevant, please open an issue there.

Regarding the BLOOM model, I will try to debug the situation and post examples to run BLOOM in a setup similar to yours.

techthiyanes pushed a commit to techthiyanes/bitsandbytes-1 that referenced this issue Jul 7, 2023
TNTran92 pushed a commit to TNTran92/bitsandbytes that referenced this issue Mar 24, 2024: Remove blocksize 64 for quant/dequant functions