
Llama 3.1 fine-tune deployment error #3443

Open
sorenmat opened this issue Aug 20, 2024 · 1 comment
@sorenmat
Expected Behavior

To be able to deploy the example notebook without modifications

Actual Behavior

Deploying llama3-1-vllm-serve-20240820-120605 on g2-standard-12 with 1 NVIDIA_L4 GPU(s).

---------------------------------------------------------------------------

FailedPrecondition                        Traceback (most recent call last)

<ipython-input-6-23c8838c83a1> in <cell line: 31>()
     29     raise ValueError("max_model_len cannot exceed 8192")
     30 
---> 31 models["vllm_gpu"], endpoints["vllm_gpu"] = deploy_model_vllm(
     32     model_name=common_util.get_job_name_with_datetime(prefix="llama3_1-vllm-serve"),
     33     model_id=merged_model_output_dir,

5 frames

/usr/local/lib/python3.10/dist-packages/google/api_core/future/polling.py in result(self, timeout, retry, polling)
    259             # pylint: disable=raising-bad-type
    260             # Pylint doesn't recognize that this is valid in this case.
--> 261             raise self._exception
    262 
    263         return self._result

FailedPrecondition: 400 Model server exited unexpectedly. Model server logs can be found at ......

All of the log links in the output show an empty log :|
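
For context, the failing cell corresponds roughly to the sketch below. `deploy_model_vllm`, `common_util`, and `merged_model_output_dir` are defined earlier in the notebook; the machine type and accelerator settings are taken from the deploy message above, and the remaining keyword arguments are an assumption about the helper's signature, not a verbatim copy of the notebook:

```python
# Sketch of the failing deployment cell (not a verbatim copy of the notebook).
# deploy_model_vllm, common_util, models, endpoints and merged_model_output_dir
# are defined earlier in the notebook; the keyword arguments below are assumed
# from the traceback and the deploy message.
max_model_len = 8192
if max_model_len > 8192:
    raise ValueError("max_model_len cannot exceed 8192")

models["vllm_gpu"], endpoints["vllm_gpu"] = deploy_model_vllm(
    model_name=common_util.get_job_name_with_datetime(prefix="llama3_1-vllm-serve"),
    model_id=merged_model_output_dir,   # GCS path to the merged fine-tuned weights
    machine_type="g2-standard-12",      # from the deploy message above
    accelerator_type="NVIDIA_L4",
    accelerator_count=1,
    max_model_len=max_model_len,
)
```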

Steps to Reproduce the Problem

  1. Open the notebook https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/model_garden/model_garden_pytorch_llama3_1_finetuning.ipynb
  2. Go through the steps
  3. The deployment step fails

Specifications

  • Version:
  • Platform:
@sharkeshd (Contributor)

It looks like you're hitting a FailedPrecondition error while deploying the Llama model with vLLM. This typically means the model server failed to start up or crashed during initialization. A few things to check:

  1. Check the model server logs: As suggested in the error message, look at the model server logs for specific error messages that indicate what went wrong (see the sketch below for pulling them from Cloud Logging if the links show an empty log).
  2. Model compatibility: Ensure the model you're deploying is compatible with the serving environment and the GPU you're using (an NVIDIA L4 here), and check whether Llama 3.1 needs any specific dependencies or configuration.
  3. Memory and resource allocation: Verify that the g2-standard-12 instance has enough memory and resources allocated; insufficient resources can cause the server to fail during initialization.
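
On the first point, if the links in the Colab output are empty, the model server's container logs can usually be read directly from Cloud Logging. A minimal sketch, assuming the standard Vertex AI endpoint log resource type; `PROJECT_ID`, `REGION`, and `ENDPOINT_ID` are placeholders for your own values:

```python
# Sketch: read the vLLM model-server container logs from Cloud Logging.
# Assumes the standard Vertex AI endpoint log resource type; PROJECT_ID,
# REGION and ENDPOINT_ID are placeholders.
from google.cloud import logging

PROJECT_ID = "your-project"
REGION = "us-central1"
ENDPOINT_ID = "1234567890"  # numeric ID of the endpoint the model was deployed to

client = logging.Client(project=PROJECT_ID)
log_filter = (
    'resource.type="aiplatform.googleapis.com/Endpoint" '
    f'AND resource.labels.endpoint_id="{ENDPOINT_ID}" '
    f'AND resource.labels.location="{REGION}"'
)

# Print the most recent entries; startup or out-of-memory errors from the
# vLLM container typically show up here when a deployment fails.
for entry in client.list_entries(
    filter_=log_filter, order_by=logging.DESCENDING, max_results=50
):
    print(entry.timestamp, entry.payload)
```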
