Option to use precomputed corpus (df2) for matching single new rows? #18

Rinderkm · 2024-05-28T21:37:46Z

Hi all

Thank you for the very helpful package. I am using it to link clinical trials from two databases, one in the local language and one in English. Linking is done cross-lingually on the trial title, intervention, etc.

Is there an option to precompute the embeddings for one of the databases (the "corpus"), so that they do not need to be recomputed every time a linktransformer command is run? That would save time: only the query trial would need to be embedded, and its best match could then be found among the existing corpus embeddings.

I am thinking of a variant of linktransformer's evaluate_pairs method: a new trial would be loaded into df1 and embedded, while the precomputed embeddings of the corpus dataframe would be loaded as df2.
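To make the request concrete, here is a minimal sketch of the matching step on its own, assuming the corpus embeddings already exist as a NumPy array and the new trial has been embedded into a single vector. The function names `normalize` and `best_match` and the toy vectors are hypothetical and are not part of linktransformer.

```python
import numpy as np


def normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)  # guard against zero vectors


def best_match(query_vec: np.ndarray, corpus_vecs: np.ndarray) -> tuple[int, float]:
    """Return (row index, cosine similarity) of the corpus row closest to the query."""
    q = normalize(query_vec.reshape(1, -1))
    c = normalize(corpus_vecs)
    scores = (c @ q.T).ravel()  # one cosine similarity per corpus row
    idx = int(np.argmax(scores))
    return idx, float(scores[idx])


# Toy stand-ins for real embeddings (hypothetical values, 3 corpus rows).
corpus_vecs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]])
query_vec = np.array([0.5, 0.9, 0.0])

idx, score = best_match(query_vec, corpus_vecs)
print(idx, round(score, 3))
```

The returned index could then be used to look up the matching row in the corpus dataframe, for example with `df2.iloc[idx]`.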

Thanks in advance!


econabhishek · 2024-05-29T04:21:09Z

Hi @Rinderkm - thanks for raising this!

While we currently don't support it, it can easily be done. I invite you or anyone else to contribute this feature, or I will add it in the next update.

This would just involve adding an argument that accepts a path to an embeddings pickle. The pickle would be loaded when the function is called, and the text would not be embedded again if its embeddings were already loaded from the pickle. That can be done for any function.

I answered a similar issue before (on huggingface) here. That might help you to do this without tinkering with the package as well.
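As a rough illustration of the pickle idea outside the package, the sketch below caches embeddings on disk and only recomputes them when the cache is missing or was built from different texts. The helper `load_or_embed`, the `embed_fn` parameter, and the file name are assumptions for illustration, not linktransformer's API; `fake_embed` stands in for whatever model call actually produces the embeddings.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Callable, Sequence


def load_or_embed(texts: Sequence[str], embed_fn: Callable, cache_path) -> list:
    """Return cached embeddings if they match `texts`; otherwise embed and cache."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        with cache_path.open("rb") as fh:
            cached = pickle.load(fh)
        # Reuse the cache only if it was built from exactly these texts.
        if cached["texts"] == list(texts):
            return cached["embeddings"]
    embeddings = embed_fn(list(texts))
    with cache_path.open("wb") as fh:
        pickle.dump({"texts": list(texts), "embeddings": embeddings}, fh)
    return embeddings


calls = []  # records how often the (expensive) embedder actually runs


def fake_embed(texts):
    """Hypothetical stand-in for a real embedding model."""
    calls.append(len(texts))
    return [[float(len(t))] for t in texts]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "corpus_embeddings.pkl"
    first = load_or_embed(["trial A", "trial B"], fake_embed, path)   # computes and saves
    second = load_or_embed(["trial A", "trial B"], fake_embed, path)  # reads from disk
```

Storing the texts next to the embeddings is one way to detect a stale cache when the corpus changes. Pickle files can execute arbitrary code when loaded, so only load files you created yourself.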

Rinderkm · 2024-05-29T16:00:14Z

Dear econabhishek

Thanks for your swift response and the helpful pointer to your answer on huggingface.

Kind regards
