
Ollama starcoder2:15b inline completions model does not work #896

Open · pedrogutobjj opened this issue Jul 14, 2024 · 32 comments
Labels: bug (Something isn't working)

Comments

@pedrogutobjj

Hi everyone, I posted about this error before; could anyone help me? When I try to use the inline completion model, nothing happens: GPU usage goes to practically 100%, but no completion appears on the code line. Attached are my inline completion configuration screens and my Jupyter screens.

[screenshots: inline completion configuration and Jupyter notebook]

pedrogutobjj added the bug label Jul 14, 2024
@krassowski (Member)

  1. Does chat work with the same model?
  2. Does completion work with a different, non-local model?
  3. Does setting completion streaming to "always" help?

My first guess would be that the model simply takes too long on your machine. Until you can confirm that it works in chat but not with completion, that remains a reasonable assumption.
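
One way to check the "too slow" theory independently of Jupyter AI is to time a raw request against the local Ollama API. A minimal sketch, assuming Ollama is serving on its default port 11434:

```python
# Time a single non-streaming completion straight from Ollama,
# bypassing Jupyter AI entirely. Assumes the default Ollama port
# (11434); adjust the URL if your setup differs.
import time

import requests

start = time.monotonic()
response = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "starcoder2:15b", "prompt": "import pandas as", "stream": False},
    timeout=300,
)
elapsed = time.monotonic() - start
print(f"{elapsed:.1f}s -> {response.json().get('response', '')!r}")
```

If this takes minutes rather than seconds, seeing nothing appear in JupyterLab would be expected.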

@krassowski (Member)

Also, are you using the latest JupyterLab version?

@pedrogutobjj (Author)

  1. For the chat model I'm using gemma2, and it works fine.

[screenshot]

  2. I tested some Hugging Face models a few weeks ago and they worked normally.

  3. Nothing happens when I select that option; the code suggestions still do not appear.

[screenshot]

@krassowski (Member)

From your pictures it looks like you are expecting the completion to appear on a new line (based on the position of your cursor). Currently it works by completing the text you start typing on an existing line; you need to type 2 or 3 characters and wait. What exact version of JupyterLab are you using?

@krassowski (Member)

My question was whether chat works with the same exact model, not with a different one. Conversely, if you try using Gemma for inline completion, does it work for you?

@pedrogutobjj (Author) commented Jul 14, 2024

I actually wait a bit to see if any code suggestions appear, but nothing happens.

My JupyterLab version:

[screenshot]

If I use the same model, like gemma2, for both (the chat model and the inline completion model), it works fine!

[screenshot]

@pedrogutobjj (Author)

[screenshot]

@pedrogutobjj (Author)

However, starcoder is much better than gemma2 for code suggestions.

@pedrogutobjj (Author)

Another situation: when I wait for some code to autocomplete, it "restarts" the code from the beginning, from the import, and a ```python fence appears. Is this normal? Can't we make this more fluid?

[screenshot]
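
For context, a stray ```python fence like this usually means the model wrapped its reply in a markdown code block, which the client needs to strip before inserting the completion. A minimal illustration of that kind of post-processing, not Jupyter AI's actual code:

```python
import re

# Remove a wrapping markdown fence (e.g. a ```python ... ``` block)
# from a model reply before it is inserted as an inline completion.
# A sketch only; Jupyter AI's actual post-processing may differ.
def strip_code_fence(completion: str) -> str:
    match = re.match(r"^```\w*\n(.*?)\n?```\s*$", completion, re.DOTALL)
    return match.group(1) if match else completion

print(strip_code_fence("```python\nimport pandas as pd\n```"))
# -> import pandas as pd
```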

@richlysakowski commented Jul 14, 2024 via email

@krassowski (Member)

If other models work but starcoder does not, it is likely a problem with the model, or possibly your GPU has less memory than required to run it fast enough to be useful (but I see you managed to run deepseek-coder-v2:16b, so unless deepseek-coder is quantized and starcoder is not, that would not fit the hardware theory).

@richlysakowski on the main page of the repository (https://github.com/jupyterlab/jupyter-ai) you will see an "Unwatch" button, with options to only watch releases.

krassowski changed the title from "Inline completions model doesn't work with me." to "Ollama starcoder2:15b inline completions model does not work" Jul 15, 2024
@pedrogutobjj (Author)

@krassowski

When I insert part of the code, the model returns several explanations plus the complete code, with explanations etc., but I want it to complete only the missing part. For example, when I insert "import pandas as", I expect the model to complete it with "pd", but it repeats "import pandas as pd" and adds some random explanations. I just want it to complete what I'm writing, not rewrite everything. I don't know if my question was clear, or whether this is configurable.

@krassowski (Member)

If you use streaming mode, this should have been fixed in #879 (which will be included in the 2.19 release).
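
The essence of such a fix is trimming the part of the suggestion that repeats what was already typed before the cursor. A minimal sketch of the idea, not the actual code from #879:

```python
# Drop the longest prefix of the suggestion that overlaps with the text
# already in front of the cursor. A sketch of the idea only, not the
# implementation merged in #879.
def trim_overlap(prefix: str, suggestion: str) -> str:
    for size in range(min(len(prefix), len(suggestion)), 0, -1):
        if prefix.endswith(suggestion[:size]):
            return suggestion[size:]
    return suggestion

print(repr(trim_overlap("import pandas as", "import pandas as pd")))
# -> ' pd'
```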

@pedrogutobjj (Author)

Thanks very much!

Is this release launching today?

@pedrogutobjj (Author)

I'm using the new release, 2.19, and have the same problem with completions: ```python still appears. Has the "bug" not been fixed?

[screenshot]

@pedrogutobjj (Author)

[screenshot]

Using codegemma...

Is there any Ollama model that has been tested and confirmed to work for line completions?

@krassowski (Member)

Thanks for testing. The Ollama provider is experimental, so there may be issues to iron out. Things I would suspect:

  • a) the models may not be very good at following instructions and generating the expected output (especially given that some of the models you listed above are rather small, 9b or 15b), or
  • b) there is some issue with Windows-style vs Unix-style newline endings (a sketch of what that fix would look like follows at the end of this comment).

> Is there any Ollama model that has been tested and confirmed to work for line completions?

You already know the answer, as it was provided in #646 (comment). Otherwise, there are no systematic tests for individual models.
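
If (b) were the culprit, the fix would amount to normalizing line endings before any prefix comparison or insertion. A minimal sketch of that idea, assuming the raw model output is available as a string:

```python
# Normalize Windows-style (\r\n) and bare-carriage-return (\r) line
# endings to Unix-style (\n) before comparing or inserting completion
# text. A sketch of suspect (b) above, not Jupyter AI's actual code.
def normalize_newlines(text: str) -> str:
    return text.replace("\r\n", "\n").replace("\r", "\n")
```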

@krassowski (Member)

I can reproduce the prefix trimming issue with all providers in 2.19.0, whether streaming or not.

@krassowski (Member)

For some reason, in 2.19.0 the suggestion includes an extra space. This is logging from GPT-4 without streaming (i.e. logic which should not have changed since 2.18):

[screenshot]

@krassowski (Member)

Ah no, this was still ollama with phi, not GPT-4. So it looks like the ollama output parsing may be off by a spurious whitespace character at the beginning.
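
If that is the case, the fix would be along the lines of left-trimming only the first chunk of the streamed response, so indentation later in the suggestion is preserved. A minimal sketch, assuming the stream arrives as string chunks:

```python
# Strip spurious leading whitespace from the first non-empty chunk of
# a streamed completion, leaving later chunks untouched so indentation
# inside the suggestion survives. A sketch only, not the eventual fix.
def clean_stream(chunks):
    first = True
    for chunk in chunks:
        if first and chunk:
            chunk = chunk.lstrip()
            first = not chunk  # keep trimming while chunks were all whitespace
        yield chunk

print("".join(clean_stream([" import", " pandas as pd"])))
# -> import pandas as pd
```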

@pedrogutobjj (Author)


I liked some of the results the gemma2:9b model gave me; we could go more in-depth on that model. I test some of the most popular models daily and analyze their responses; as I test them, I will report back here or in a dedicated topic.

@krassowski (Member)

@pedrogutobjj did you have a chance to test the latest release, v2.19.1, which includes #900? Is it any better?

@pedrogutobjj (Author)

Hey @krassowski, morning!

I tested some lines of code this morning; here are the results.

[screenshot]

@krassowski (Member)

Is this what you would expect or not? To me it looks like a syntactically valid response. Of course it's a bit useless, but that comes down to the ability of the model you use.

@pedrogutobjj (Author)

The logic is correct; I mean the overlaps. I don't know if I was clear about this. For example, I inserted "def sum_matrizes(matrix1, matrix2):", and then, when the autocomplete comes, it repeats "def sum_matrizes" again as a suggestion, even though I already typed it at the beginning of the code. I don't know if I managed to make my point clear.

@krassowski (Member)

I see that, but it looks like the model is at fault here. It first inserted "import numpy as" and only then started the "def sum_matrizes(matrix1, matrix2):" part again.

Previously you were testing with deepseek-coder and codegemma, but now you posted a result from llama3.1. If we want to see whether the changes helped with the issue you reported back then, can you test with the same models?

@pedrogutobjj (Author)

With deepseek-coder:

[screenshot]

With codegemma:7b:

[screenshot]

@krassowski (Member)

Thanks! Just to help me reproduce, where was your cursor when you invoked the inline completer?

@pedrogutobjj (Author)

At the top of the cell.

@krassowski (Member)

Do you mean that your cursor was in the first line here:

def soma_matrices(matriz1, matriz2):|

or in the new line:

def soma_matrices(matriz1, matriz2):
|

or in the new line after tab:

def soma_matrices(matriz1, matriz2):
    |

@pedrogutobjj (Author)

[screenshot]

@pedrogutobjj (Author)

[screenshot]
