Running model parallel inference #88
I was able to run 7B on two 1080 Tis (inference only). Next, I'll try 13B and 33B. It still needs refining, but it works! I forked LLaMA here: https://github.com/modular-ml/wrapyfi-examples_llama, and the README has instructions on how to do it:

LLaMA with Wrapyfi

Wrapyfi enables distributing LLaMA (inference only) across multiple GPUs/machines, each with less than 16 GB VRAM. It currently distributes over two cards only, using ZeroMQ, but will support flexible distribution soon! This approach has only been tested on the 7B model so far, on Ubuntu 20.04 with two 1080 Tis. Testing the 13B/30B models soon!

How to?
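For anyone curious how a split like this can work: the model's layers are partitioned between two processes, and the activations at the cut point are shipped between them over ZeroMQ. The sketch below is a minimal illustration of that idea using plain pyzmq rather than Wrapyfi's actual API; the layer split, endpoint, and helper names are all hypothetical.

```python
# Minimal sketch of pipeline-split inference over ZeroMQ (NOT Wrapyfi's API).
# Process A runs the first half of the layers on GPU 0 and sends the
# activations at the cut point; process B runs the second half on GPU 1
# and returns the logits. The endpoint and module names are hypothetical.
import io
import torch
import zmq

def send_tensor(sock, t):
    buf = io.BytesIO()
    torch.save(t.cpu(), buf)               # serialize the tensor to bytes
    sock.send(buf.getvalue())

def recv_tensor(sock):
    return torch.load(io.BytesIO(sock.recv()))

# --- process A (GPU 0, first half of the model) ---
def run_first_half(first_half, tokens):
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REQ)
    sock.connect("tcp://127.0.0.1:5555")   # hypothetical endpoint
    h = first_half(tokens.to("cuda:0"))    # hidden states at the cut point
    send_tensor(sock, h)
    return recv_tensor(sock)               # logits come back from process B

# --- process B (GPU 1, second half of the model) ---
def run_second_half(second_half):
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REP)
    sock.bind("tcp://127.0.0.1:5555")
    while True:
        h = recv_tensor(sock).to("cuda:1")
        send_tensor(sock, second_half(h))  # return logits to process A
```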
This is something we should cover in the llama-recipes repo. Thanks for raising! cc @HamidShojanazeri
I am trying to run inference with the 7B parameter model on 4x 2080 Ti, but the default inference script gives me a CUDA OOM error. Is there a way to split the model across multiple GPUs and perform inference?
Thank you!
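If you can use the Hugging Face transformers port of LLaMA rather than the original repo's script, one common way to fit a model that overflows a single GPU is Accelerate's device_map="auto", which shards the weights across all visible GPUs at load time. A minimal sketch, assuming transformers and accelerate are installed; the checkpoint path is a placeholder for your own converted weights:

```python
# Sketch: sharding LLaMA-7B across several GPUs with Accelerate's
# device_map="auto" (requires `pip install transformers accelerate`).
# "path/to/llama-7b-hf" is a placeholder for a converted HF checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("path/to/llama-7b-hf")
model = AutoModelForCausalLM.from_pretrained(
    "path/to/llama-7b-hf",
    torch_dtype=torch.float16,   # fp16 halves memory vs. fp32
    device_map="auto",           # spread layers over all visible GPUs
)

inputs = tok("The capital of France is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))
```

With four 11 GB cards this gives roughly 44 GB of pooled VRAM, which is comfortably more than the ~14 GB the 7B model needs in fp16.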