[Bug]: EmbeddingFunction not working as documented in migration docs #2835

Open
davidtbo opened this issue Sep 22, 2024 · 1 comment
Labels
bug Something isn't working

What happened?

Followed instructions here exactly:
https://docs.trychroma.com/deployment/migration#migration-to-0.4.16---november-7,-2023

from chromadb.api.types import Documents, Embeddings, Embeddable, Images, Protocol
from transformers import pipeline
from typing import TypeVar, Union

config = {
    "embedding_model": "microsoft/Multilingual-MiniLM-L12-H384",
    # ...other stuff omitted for brevity
}

pipeline = pipeline(
    task="feature-extraction", 
    model=config['embedding_model']
)

Embeddable = Union[Documents, Images]
D = TypeVar("D", bound=Embeddable, contravariant=True)

class EmbeddingFunction(Protocol[D]):
    def __call__(self, input: D) -> Embeddings:
        return pipeline(input, return_tensors=True)

I got the error shown in the log output below, which tells me to do exactly what I had just done. Any help would be greatly appreciated. :)

Versions

chromadb version 0.5.7
chroma-hnswlib 0.7.6 (this was installed by chroma, not me directly)
python 3.10.12
Ubuntu 22.04

Relevant log output

ValueError: Expected EmbeddingFunction.__call__ to have the following signature: odict_keys(['self', 'input']), got odict_keys(['self', 'args', 'kwargs'])
E           Please see https://docs.trychroma.com/guides/embeddings for details of the EmbeddingFunction interface.
E           Please note the recent change to the EmbeddingFunction interface: https://docs.trychroma.com/deployment/migration#migration-to-0.4.16---november-7,-2023
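
For readers who hit the same message: the `odict_keys` in the error suggest the check compares the parameter names of `__call__` as found on the class of whatever object was passed in. The issue does not show how the embedding function was handed to Chroma, so the following is only a guess at one way to end up with `['self', 'args', 'kwargs']`: passing the class itself instead of an instance. The sketch below uses only the standard library; `MyEmbedder` and `call_parameter_names` are hypothetical names, and the lookup is an assumption about how such a check might work, not Chroma's actual code.

```python
import inspect


class MyEmbedder:
    """Stand-in for a custom embedding function (hypothetical, no Chroma needed)."""

    def __call__(self, input):
        # Dummy one-dimensional vector per document.
        return [[0.0] for _ in input]


def call_parameter_names(obj):
    """Parameter names of __call__, looked up on the class of obj."""
    return list(inspect.signature(type(obj).__call__).parameters)


# Passing an instance: the lookup finds MyEmbedder.__call__.
instance_params = call_parameter_names(MyEmbedder())

# Passing the class itself: type(MyEmbedder) is `type`, so the lookup
# finds type.__call__, whose parameters are generic.
class_params = call_parameter_names(MyEmbedder)

print(instance_params)
print(class_params)
```

If this is the cause, the `__call__` definition is not at fault; what matters is that an instance, created with `MyEmbedder()`, is passed rather than `MyEmbedder`.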
Contributor

tazarov commented Sep 24, 2024

@davidtbo, can you try this:

from typing import Any

from transformers import pipeline
from chromadb.api.types import (
    Documents,
    EmbeddingFunction,
    Embeddings,
)


class MyCustomEmbeddingFunction(EmbeddingFunction[Documents]):
    def __init__(self, **kwargs: Any):
        """Initialize the embedding function."""
        # Build the Hugging Face pipeline once and reuse it for every call.
        self._pipeline = pipeline(
            task="feature-extraction",
            model=kwargs.get("embedding_model"),
        )

    # The parameter must be named `input` to match the expected signature.
    def __call__(self, input: Documents) -> Embeddings:
        """Embed the input documents."""
        return self._pipeline(input, return_tensors=True)


if __name__ == "__main__":
    embedding_function = MyCustomEmbeddingFunction(
        embedding_model="microsoft/Multilingual-MiniLM-L12-H384"
    )
    print(embedding_function(["Hello, world!"]))

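You can subclass the EmbeddingFunction from chromadb.api.types directly, parameterized with the input type(s) it accepts (here Documents), instead of redefining the protocol yourself.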
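
One caveat for anyone adapting this: a `feature-extraction` pipeline produces one vector per token, while Chroma's `Embeddings` type is one flat list of numbers per document, so a pooling step is usually needed before returning. The thread does not say which pooling to use; the sketch below assumes mean pooling and assumes the pipeline is called without `return_tensors=True`, so that each document comes back as a nested list shaped `[1][num_tokens][hidden_size]`. `mean_pool` and `to_embeddings` are hypothetical helper names, and the sample data stands in for real pipeline output.

```python
from typing import List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average per-token vectors into a single document vector."""
    count = len(token_vectors)
    hidden_size = len(token_vectors[0])
    return [
        sum(token[i] for token in token_vectors) / count
        for i in range(hidden_size)
    ]


def to_embeddings(pipeline_output) -> List[List[float]]:
    """Convert assumed [doc][1][token][hidden] output to one vector per doc."""
    # doc[0] drops the leading batch dimension of size 1 (an assumption
    # about the pipeline's nested-list output for a list of strings).
    return [mean_pool(doc[0]) for doc in pipeline_output]


# Fake output for two documents with hidden_size=2, standing in for
# what the pipeline would return.
fake_output = [
    [[[1.0, 2.0], [3.0, 4.0]]],
    [[[0.0, 0.0], [2.0, 4.0], [4.0, 2.0]]],
]
embeddings = to_embeddings(fake_output)
print(embeddings)
```

Inside `__call__`, the return line would then become something like `return to_embeddings(self._pipeline(input))`, again under the shape assumption above.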
