Option to use precomputed corpus (df2) for matching single new rows? #18

Open
Rinderkm opened this issue May 28, 2024 · 2 comments

Comments

@Rinderkm

Hi all

Thank you for the very helpful package. I am using it to link clinical trials from two databases, one in a local language and one in English. Linking is done cross-lingually on the trial title, intervention, etc.

Is there an option to precompute the embeddings for one of the databases (the "corpus"), so that they do not need to be recomputed every time a linktransformer command is run? That would save time: only the query trial would need to be embedded, and its best match could then be found among the existing corpus embeddings.

I am thinking of a variant of linktransformer's evaluate_pairs method: a new trial would be loaded as df1 and its embeddings computed, while the precomputed embeddings of the corpus dataframe would be loaded as df2.
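To make the idea concrete, here is a minimal sketch of the matching step, assuming the corpus and query embeddings are already available as NumPy arrays. `best_matches` is a hypothetical helper written for illustration, not a linktransformer function, and the toy vectors stand in for real model output.

```python
import numpy as np


def best_matches(query_emb, corpus_emb, top_k=1):
    """Return (indices, scores) of the top_k corpus rows most similar to each query row.

    Hypothetical helper: assumes non-zero embedding vectors of equal dimension.
    """
    q = np.atleast_2d(np.asarray(query_emb, dtype=float))
    c = np.atleast_2d(np.asarray(corpus_emb, dtype=float))
    # L2-normalise so that the dot product equals cosine similarity.
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    sims = q @ c.T  # shape: (n_queries, n_corpus)
    # Sort each row in descending order and keep the top_k columns.
    idx = np.argsort(-sims, axis=1)[:, :top_k]
    scores = np.take_along_axis(sims, idx, axis=1)
    return idx, scores


# Toy data: three precomputed corpus embeddings and one new query embedding.
corpus_emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query_emb = np.array([0.9, 0.1])
idx, scores = best_matches(query_emb, corpus_emb, top_k=2)
```

In this setup only `query_emb` would have to be computed for a new trial; `corpus_emb` could be loaded from disk. The returned indices point back into the rows of the corpus dataframe.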

Thanks in advance!

@econabhishek
Collaborator

econabhishek commented May 29, 2024

Hi @Rinderkm - thanks for raising this!

While we currently don't support it, it can easily be done. I invite you or anyone else to contribute this feature; otherwise I will add it in the next update.

This would just involve adding an argument that accepts a path to an embeddings pickle. The pickle would be loaded when the function is called, and the embedding step would be skipped for any text whose embeddings were loaded that way. This can be done for any function.
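The load-or-compute pattern described above could look roughly like the sketch below. It is an assumption about the general shape, not the package's implementation: `load_or_compute_embeddings`, `embed_fn`, and `cache_path` are hypothetical names, and `toy_embed` is a stand-in for a real embedding model.

```python
import os
import pickle
import tempfile


def load_or_compute_embeddings(texts, embed_fn, cache_path):
    """Load embeddings from cache_path if the file exists; otherwise compute and save them."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)  # only load pickles from a trusted source
    embeddings = embed_fn(texts)
    with open(cache_path, "wb") as f:
        pickle.dump(embeddings, f)
    return embeddings


# Stand-in embedder (hypothetical) that records how often it is called.
calls = []


def toy_embed(texts):
    calls.append(len(texts))
    return [[float(len(t)), float(t.count(" "))] for t in texts]


corpus = ["trial of drug A", "study of drug B"]
cache_path = os.path.join(tempfile.mkdtemp(), "corpus_embeddings.pkl")
first = load_or_compute_embeddings(corpus, toy_embed, cache_path)   # computes and writes the pickle
second = load_or_compute_embeddings(corpus, toy_embed, cache_path)  # reads the pickle, no embedding
```

Note that this sketch does not detect changes to the corpus: if the texts change, the pickle file has to be deleted or written to a new path.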

I answered a similar issue before (on huggingface) here. That might also help you do this without modifying the package.

@Rinderkm
Author

Dear econabhishek

Thanks for your swift response and the helpful pointer to your answer on huggingface.

Kind regards
