llava 1.5 invalid output after first inference (llamacpp server) #7060
We have been noticing this on Koboldcpp 1.64 as well, so it may not be specific to the server. On our side it works for the first image, but as soon as the image gets swapped the output becomes gibberish.
I'm also experiencing this after #6899. I have run multiple different tests and inspected the features of the outputs, and there seems to be some kind of repeatable ablation in the vision encoder that results in psychedelic-like imagery being projected into the language model. Frankly it's fascinating, and it would be interesting to see both what the problem is and what the fix is. For now I have no conclusions, unfortunately.
This broke the server's LLaVA support in a non-obvious way. See ggerganov/llama.cpp#6899 and ggerganov/llama.cpp#7060.
@ggerganov take a look, can we revert the moondream changes?
Narrowed down the bug to here: https://github.com/vikhyat/llama.cpp/blob/3d771207b7166286baef8f9d90b960418e163f55/examples/llava/clip.cpp#L575

```cpp
struct ggml_tensor * embeddings = inp;
if (ctx->has_class_embedding) {
    embeddings = ggml_new_tensor_3d(ctx0, GGML_TYPE_F32, hidden_size, num_positions, batch_size);
    // llava 1.5 fix
    ggml_set_name(embeddings, "embeddings");
    ggml_set_input(embeddings);
    //
    embeddings = ggml_acc(ctx0, embeddings, model.class_embedding,
        embeddings->nb[1], embeddings->nb[2], embeddings->nb[3], 0);
    embeddings = ggml_acc(ctx0, embeddings, inp,
        embeddings->nb[1], embeddings->nb[2], embeddings->nb[3], model.class_embedding->nb[1]);
}
```

Changing the location of the
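The suspected failure mode above — `ggml_acc` accumulating into a freshly allocated `embeddings` tensor that still holds data from a previous graph run, which marking it as a named input avoids — can be illustrated with a toy sketch. This is purely hypothetical Python, not ggml code; the function and variable names are mine:

```python
# Hypothetical sketch of the clip.cpp bug: accumulating class/patch
# embeddings into a reused scratch buffer without resetting it first.
def encode(buffer, class_emb, patch_emb, clear_first):
    if clear_first:
        for i in range(len(buffer)):   # a defined starting value, which is
            buffer[i] = 0.0            # what marking the tensor as a graph
                                       # input effectively guarantees
    buffer[0] += class_emb             # ggml_acc *adds* into the buffer...
    for i, p in enumerate(patch_emb):
        buffer[1 + i] += p             # ...so leftovers from run 1 corrupt run 2
    return list(buffer)

scratch = [0.0] * 4                    # reused scratch buffer, zeroed once

first  = encode(scratch, 1.0, [2.0, 3.0], clear_first=False)  # happens to be fine
second = encode(scratch, 1.0, [2.0, 3.0], clear_first=False)  # stale data adds up

print(first)   # [1.0, 2.0, 3.0, 0.0]
print(second)  # [2.0, 4.0, 6.0, 0.0]  <- garbage: previous run leaked in
```

This matches the reported symptom exactly: the first image works (the buffer happens to start clean), and every subsequent image is corrupted.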
Same error after @abetlen's fix. It works on an older version without moondream support; after reverting, moondream works well, but now the same error appears again. Take a look @ggerganov @abetlen.
moondream generates something, but context is shared between different inferences. Generation for the first image:
abetlen commented here that if the KV cache fails to be cleared before the next inference and the image embedding comes first, the contexts merge and produce odd results. It was also said that this bug is separate in llama-cpp-python. I don't know whether the KV cache must be cleared as an explicit step when performing inference according to your process above. If we could get some detail about how the KV cache is cleared in this instance and replicate that step, we could see whether the results are still garbled or the problem is indeed fixed. Alternatively, an easier test would be to place the image embedding last so that, according to the comment, the bug is not triggered.
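The KV-cache hazard described above can be sketched with a toy cache (plain Python, all names hypothetical — this is not the llama-cpp-python API): generated tokens attend over everything currently cached, so entries left over from the first image leak into the next answer unless the cache is cleared between inferences.

```python
# Toy model of the KV-cache hazard (illustrative only): the "answer" is just
# whatever context the model attends over, so stale entries corrupt it.
class ToyKVCache:
    def __init__(self):
        self.entries = []           # stands in for cached key/value pairs

    def clear(self):
        self.entries = []           # explicit clear between inferences

    def infer(self, image_tokens, text_tokens):
        self.entries.extend(image_tokens + text_tokens)
        return list(self.entries)   # attends over the *whole* cache

cache = ToyKVCache()
a = cache.infer(["img1"], ["describe"])   # first run: only image 1 in context
b = cache.infer(["img2"], ["describe"])   # no clear: image 1 still in context!
cache.clear()
c = cache.infer(["img2"], ["describe"])   # cleared: only image 2 in context
```

This is why the first inference is fine and later ones are not: the bug only becomes visible once a second image shares the cache with the first.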
@CaptainOfHacks can you install via |
@abetlen it works now. Can you please add support for the Phi-3 chat format? And can you create a new release so this fix can be installed from PyPI?
@CaptainOfHacks thanks, I'll run the release now. Can you link to the Phi-3 LLaVA chat format in a new issue on llama-cpp-python and I'll take a look?
llava-phi-3-mini uses the Phi-3-instruct chat template. I think it is similar to the current llava-1-5 format, but with the Phi-3 instruct template instead of Llama 2.
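For reference, the Phi-3 instruct template wraps each turn in role markers terminated by `<|end|>`. A minimal prompt builder is sketched below; the marker strings follow the published Phi-3 template, but the builder function itself is my own illustration, not llama-cpp-python code:

```python
# Minimal sketch of the Phi-3 instruct chat template. The <|user|>, <|end|>,
# and <|assistant|> markers come from the Phi-3 template; phi3_prompt is a
# hypothetical helper, not part of any library.
def phi3_prompt(messages):
    parts = []
    for role, content in messages:
        parts.append(f"<|{role}|>\n{content}<|end|>\n")
    parts.append("<|assistant|>\n")        # cue the model to respond
    return "".join(parts)

print(phi3_prompt([("user", "What is in this image?")]))
# <|user|>
# What is in this image?<|end|>
# <|assistant|>
```

Compared to the Llama 2 `[INST] ... [/INST]` style used by llava-1-5, only the turn delimiters change; the placement of the image embedding relative to the text would stay the same.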
I use this server config:
and start the server with this command:
Everything works well in text-only mode. But with llava 1.5, only the first run works; after that, the response for any image is invalid.
I execute the following notebook cells:
The first run works correctly:
A second run with another image doesn't work:
Again with first image:
Here are logs for model loading: