
Llama 3.1 fine-tune deployment error #3443

Open
sorenmat opened this issue Aug 20, 2024 · 1 comment
@sorenmat
Expected Behavior

To be able to deploy the example notebook without modifications

Actual Behavior

Deploying llama3-1-vllm-serve-20240820-120605 on g2-standard-12 with 1 NVIDIA_L4 GPU(s).

---------------------------------------------------------------------------

FailedPrecondition                        Traceback (most recent call last)

<ipython-input-6-23c8838c83a1> in <cell line: 31>()
     29     raise ValueError("max_model_len cannot exceed 8192")
     30 
---> 31 models["vllm_gpu"], endpoints["vllm_gpu"] = deploy_model_vllm(
     32     model_name=common_util.get_job_name_with_datetime(prefix="llama3_1-vllm-serve"),
     33     model_id=merged_model_output_dir,

5 frames

/usr/local/lib/python3.10/dist-packages/google/api_core/future/polling.py in result(self, timeout, retry, polling)
    259             # pylint: disable=raising-bad-type
    260             # Pylint doesn't recognize that this is valid in this case.
--> 261             raise self._exception
    262 
    263         return self._result

FailedPrecondition: 400 Model server exited unexpectedly. Model server logs can be found at ......

All of the log links in the output show an empty log :|
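
For context, the failing cell corresponds roughly to the sketch below. `deploy_model_vllm`, `common_util`, and `merged_model_output_dir` are defined earlier in the notebook; the machine type and accelerator settings are taken from the deploy message above, and the remaining keyword arguments are an assumption about the helper's signature, not a verbatim copy of the notebook:

```python
# Sketch of the failing deployment cell (not a verbatim copy of the notebook).
# deploy_model_vllm, common_util, models, endpoints and merged_model_output_dir
# are defined earlier in the notebook; the keyword arguments below are assumed
# from the traceback and the deploy message.
max_model_len = 8192
if max_model_len > 8192:
    raise ValueError("max_model_len cannot exceed 8192")

models["vllm_gpu"], endpoints["vllm_gpu"] = deploy_model_vllm(
    model_name=common_util.get_job_name_with_datetime(prefix="llama3_1-vllm-serve"),
    model_id=merged_model_output_dir,   # GCS path to the merged fine-tuned weights
    machine_type="g2-standard-12",      # from the deploy message above
    accelerator_type="NVIDIA_L4",
    accelerator_count=1,
    max_model_len=max_model_len,
)
```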

Steps to Reproduce the Problem

  1. Open the notebook https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/model_garden/model_garden_pytorch_llama3_1_finetuning.ipynb
  2. Go through the steps
  3. The deployment step fails

Specifications

  • Version:
  • Platform:
@sharkeshd (Contributor)

It looks like you're hitting a FailedPrecondition error while deploying the Llama model with vLLM. This typically means the model server failed to start up or crashed during initialization. A few things to check:

  1. Check the model server logs: As suggested in the error message, look at the model server logs for specific error messages that indicate what went wrong (see the sketch below for pulling them from Cloud Logging if the links show an empty log).
  2. Model compatibility: Ensure the model you're deploying is compatible with the serving environment and the GPU you're using (an NVIDIA L4 here), and check whether Llama 3.1 needs any specific dependencies or configuration.
  3. Memory and resource allocation: Verify that the g2-standard-12 instance has enough memory and resources allocated; insufficient resources can cause the server to fail during initialization.
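
On the first point, if the links in the Colab output are empty, the model server's container logs can usually be read directly from Cloud Logging. A minimal sketch, assuming the standard Vertex AI endpoint log resource type; `PROJECT_ID`, `REGION`, and `ENDPOINT_ID` are placeholders for your own values:

```python
# Sketch: read the vLLM model-server container logs from Cloud Logging.
# Assumes the standard Vertex AI endpoint log resource type; PROJECT_ID,
# REGION and ENDPOINT_ID are placeholders.
from google.cloud import logging

PROJECT_ID = "your-project"
REGION = "us-central1"
ENDPOINT_ID = "1234567890"  # numeric ID of the endpoint the model was deployed to

client = logging.Client(project=PROJECT_ID)
log_filter = (
    'resource.type="aiplatform.googleapis.com/Endpoint" '
    f'AND resource.labels.endpoint_id="{ENDPOINT_ID}" '
    f'AND resource.labels.location="{REGION}"'
)

# Print the most recent entries; startup or out-of-memory errors from the
# vLLM container typically show up here when a deployment fails.
for entry in client.list_entries(
    filter_=log_filter, order_by=logging.DESCENDING, max_results=50
):
    print(entry.timestamp, entry.payload)
```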
