Option to use precomputed corpus (df2) for matching single new rows? #18
Hi @Rinderkm - thanks for raising this! While we currently don't support it, it can easily be done. I invite you or anyone else to contribute this feature - or I will add it in the next update. It would just involve adding an argument that accepts a path to an embeddings pickle, which is loaded after the function call, and skipping the embedding step for texts whose embeddings are already loaded from the pickle. That can be done with any function. I answered a similar issue before (on Hugging Face) here; that might help you do this without tinkering with the package as well.
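The workaround described above can be sketched roughly as follows. This is not the linktransformer API itself, just an illustration of the idea: embed the corpus once, pickle the embeddings, and at query time embed only the new row and match it by cosine similarity. The helper names are hypothetical, and plain numpy vectors stand in for real sentence embeddings (in practice they would come from the model, e.g. via sentence-transformers' `encode`).

```python
import pickle
import numpy as np


def save_corpus_embeddings(embeddings: np.ndarray, path: str) -> None:
    """Persist precomputed corpus (df2) embeddings so they are computed only once."""
    with open(path, "wb") as f:
        pickle.dump(embeddings, f)


def load_corpus_embeddings(path: str) -> np.ndarray:
    """Reload the pickled corpus embeddings instead of re-embedding the corpus."""
    with open(path, "rb") as f:
        return pickle.load(f)


def best_match(query_emb: np.ndarray, corpus_emb: np.ndarray) -> tuple[int, float]:
    """Return (row index, cosine similarity) of the corpus row closest to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_emb / np.linalg.norm(corpus_emb, axis=1, keepdims=True)
    sims = c @ q  # cosine similarity of the query against every corpus row
    idx = int(np.argmax(sims))
    return idx, float(sims[idx])
```

With this in place, only the single new trial needs to pass through the model; the matched row index can then be used to look up the corresponding record in the corpus dataframe.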
Dear econabhishek, thanks for your swift response and the helpful pointer to your answer on Hugging Face. Kind regards
Hi all,
Thank you for the very helpful package. I am using it to link clinical trials from two databases, one in local language, one in English. Linking is done cross-lingually on the title of the trial, intervention etc.
Is there an option to precompute the embeddings for one of the databases (the "corpus"), so that the corpus embeddings do not need to be recomputed every time one of the linktransformer commands is run? That way, only the query trial needs to be embedded, and the best match against the existing corpus embeddings can then be evaluated, saving time.
I am thinking about a variant of linktransformer's `evaluate_pairs` method: if I have a new trial, I would load it into df1 and its embeddings would be calculated, while the precomputed embeddings of the corpus dataframe would be loaded as df2. Thanks in advance!