
[FR] RAG: Add support for Int8 embeddings #118

Open
svilupp opened this issue Apr 3, 2024 · 6 comments

svilupp (Owner) commented Apr 3, 2024

It would be great to support embeddings compressed to Int8, as described in Hugging Face's Embedding Quantization article.

A potential implementation would:

  • define an embedder (<:AbstractEmbedder, for get_embeddings) and a corresponding finder (<:AbstractSimilarityFinder, for find_similar)
  • give both types min_values and max_values fields holding the effective range of each embedding dimension (i.e., length(min_values) == length(max_values) == D)
  • define the necessary methods for these types
  • perform the conversion to Int8 post hoc (after build_index) via a utility function that returns the quantized embeddings together with a finder carrying the range, which can then be provided to airag
  • implement the two-stage pass with rescore_multiplier=4: first shortlist candidates on Int8 × Int8 dot products, then rescore the shortlist with Float query × Int8 documents
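A minimal sketch of the steps above, with all names hypothetical (Int8SimilarityFinder, quantize_int8, and find_top_k are illustrative, not an existing or proposed API):

```julia
using LinearAlgebra  # for dot

# Hypothetical finder holding the per-dimension range (length(min_values) == D).
struct Int8SimilarityFinder  # would subtype AbstractSimilarityFinder in practice
    min_values::Vector{Float32}
    max_values::Vector{Float32}
end

# Linearly map each row (dimension) of a D x N Float matrix into [-127, 127].
function quantize_int8(emb::AbstractMatrix{<:Real})
    min_values = Float32.(vec(minimum(emb; dims = 2)))
    max_values = Float32.(vec(maximum(emb; dims = 2)))
    scale = max.((max_values .- min_values) ./ 254f0, eps(Float32))
    q = Int8.(clamp.(round.((emb .- min_values) ./ scale .- 127), -127, 127))
    return q, Int8SimilarityFinder(min_values, max_values)
end

# Two-stage pass: shortlist k * rescore_multiplier candidates with cheap
# Int8 x Int8 dot products, then rescore the shortlist with the Float query.
function find_top_k(finder::Int8SimilarityFinder, q_docs::AbstractMatrix{Int8},
        query::AbstractVector{<:Real}; k::Int = 5, rescore_multiplier::Int = 4)
    scale = max.((finder.max_values .- finder.min_values) ./ 254f0, eps(Float32))
    q_query = Int8.(clamp.(round.((query .- finder.min_values) ./ scale .- 127), -127, 127))
    coarse = vec(Int32.(q_query)' * Int32.(q_docs))   # stage 1: Int8 x Int8 scores
    n = min(k * rescore_multiplier, length(coarse))
    shortlist = partialsortperm(coarse, 1:n; rev = true)
    # stage 2: Float query x Int8 documents, only over the shortlist
    fine = [dot(Float32.(query), Float32.(view(q_docs, :, j))) for j in shortlist]
    return shortlist[partialsortperm(fine, 1:min(k, n); rev = true)]
end
```

With `emb` a D x N Float matrix, `quantize_int8(emb)` returns the Int8 matrix plus the finder carrying the range; `find_top_k(finder, q_docs, query)` then runs the two-stage search.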
svilupp added the RAG label Apr 10, 2024
pabvald (Contributor) commented Oct 3, 2024

I am going to take care of this one.

I would also be happy to help move the RAG functionality into a separate package. Let me know if you want to move forward with that.

The package LinLogQuantization.jl has a pretty neat implementation of linear quantization to unsigned types (UInt8, UInt16, ...). Extending it to signed types would be relatively easy, but still extra work. What do you think about first providing support for unsigned-integer embeddings and extending to signed integers later?
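For context, linear quantization to an unsigned type is essentially a rescale-and-round. A generic sketch of the idea (illustrative only; the function names are not LinLogQuantization.jl's actual API):

```julia
# Illustrative linear quantization of a vector to UInt8
# (not LinLogQuantization.jl's API).
function linquant_uint8(x::AbstractVector{<:Real})
    lo, hi = extrema(x)
    scale = max(hi - lo, eps(Float64)) / 255   # 255 steps across the value range
    q = UInt8.(clamp.(round.((x .- lo) ./ scale), 0, 255))
    return q, lo, scale
end

# Approximate inverse; the error is at most half a quantization step.
dequantize(q, lo, scale) = lo .+ Float64.(q) .* scale
```

For example, quantizing five evenly spaced values in [-1, 1] maps the endpoints to 0x00 and 0xff, and `dequantize` recovers each value to within `scale / 2`. A signed variant would only change the target range.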

svilupp (Owner, Author) commented Oct 3, 2024

I was hoping to do the RAGTools migration after we merge in the Pinecone support. I don't suppose you'd be interested in finishing that?

On the Int8, cool! You can probably re-use a lot from the "bitpacked" embedder. I don't mind if it's signed or not.

On the dependency addition: where do you see the benefits outweighing the costs? It's just a minor performance tweak (no big gains in any direction compared to what we already have), so I'm not sure we need more than one simple implementation of this. Do you have a different view?

pabvald (Contributor) commented Oct 4, 2024

I would like to implement this first, since I have already invested some time in it. I can take care of the Pinecone support afterwards, if that's what's stopping the RAG package from being born.

LinLogQuantization.jl implements exactly what we need and nothing else. It's a very small package (fewer than 300 lines of code), and I can't imagine a simpler implementation of linear quantization. In my opinion, anything other than using the package would be wasted effort reinventing the wheel.

If you really want to avoid the dependency, we could vendor only the part of the package that implements linear quantization, leaving out the code for logarithmic quantization.

pabvald (Contributor) commented Oct 4, 2024

By the way, here's a more detailed explanation of scalar quantization; it's referenced in the article you provided.

svilupp (Owner, Author) commented Oct 7, 2024

Sorry for the slow response! I was at a hackathon the whole weekend.

I don't think it would be appropriate to make LLQ (which pulls in StatsBase as a dependency) a direct dependency of PromptingTools for everyone.
RAG is used by only a subset of PT users; within that subset, only a few will ever look at quantization; and within that group, picking Int8 is quite niche (the trade-offs are nuanced, and it's probably not worth it for most).

In addition, if we'll only ever have Int8 (I don't see any benefit from more Int variants; there are lower-hanging fruits for performance), we need just 2-3 functions, so it's a very simple problem to solve directly.

If you still prefer to use the LLQ package, I'd ask you to add it as an extension (weak dependency). Then I'm happy to review the PR.

EDIT: If you're keen to drive a bigger effort in the quantization space and speed up in-memory embeddings, we could look into shaping that as a sister package that people could simply import to get a bunch of different performance optimizations!

pabvald (Contributor) commented Oct 10, 2024

Understood. I have extended the package to support linear quantization of signed integers (see PR). I can copy over the two necessary functions to implement the Int8 index without adding the package as a dependency.
