Use versioned source abbreviations for version numbers #46

Open
jvendetti opened this issue Sep 4, 2024 · 3 comments

Comments

@jvendetti
Member

When we import new versions of the UMLS Metathesaurus, we set the version numbers for the new submissions to match the Metathesaurus insertion version, e.g. 2023AB, 2024AA, etc. See the MedDRA summary page as an example:

https://bioportal.bioontology.org/ontologies/MEDDRA

We are frequently contacted by end users wanting to know which version of the vocabulary is represented by the insertion version. For example: "what version of MedDRA is included in the 2024AA release?". I think a better approach would be to initialize the version numbers of each new submission using the metadata provided by the UMLS Metathesaurus. For example, the vocabulary documentation page for MedDRA shows that the versioned source abbreviation for MedDRA is MDR26_1:

https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/MDR/metadata.html

In my opinion, using versioned source abbreviations to indicate version numbers would cut down on the amount of time we need to spend tracking down vocabulary version numbers on behalf of end users. End users are generally not aware that they need to navigate away from BioPortal to the UMLS Metathesaurus vocabulary documentation page to find this information:

https://www.nlm.nih.gov/research/umls/sourcereleasedocs/index.html

@alexskr, @mdorf - can you think of any disadvantages to moving away from using the insertion version numbers for submissions?

@alexskr
Member

alexskr commented Sep 4, 2024

A large number of the UMLS vocabularies hosted by BioPortal do not change from one UMLS release to the next, so adding a new submission on every UMLS release is less than ideal.

Tracking down what has changed will add some management overhead, and UMLS import scripts will need to be updated to accommodate this new approach; however, I think it's worth it.

@jvendetti
Member Author

What about the notion of modifying the UMLS import scripts to look at the last updated date of the vocabularies? Is this possible? The vocabulary documentation page lists a "Last Updated" property for every vocabulary:

https://www.nlm.nih.gov/research/umls/sourcereleasedocs/index.html

Theoretically it would be nice if we could limit the import to only those vocabularies that have changed in whatever release we're importing. When creating new submissions for the updated vocabularies, we could use the versioned source abbreviation for the submission version number.
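
A minimal sketch of that selection step, assuming we already have the versioned source abbreviations from the previous import and from the new release as plain dictionaries keyed by root source abbreviation. The function name and the sample values other than MDR26_1 are hypothetical, and how the dictionaries get populated (database query, scraped documentation pages, etc.) is left open:

```python
from typing import Dict


def select_changed_vocabularies(
    previous: Dict[str, str], current: Dict[str, str]
) -> Dict[str, str]:
    """Return {root abbreviation: versioned abbreviation} for vocabularies to import.

    A vocabulary is selected when it is new, or when its versioned source
    abbreviation differs from the one recorded for the previous import.
    """
    return {
        sab: vsab
        for sab, vsab in current.items()
        if previous.get(sab) != vsab
    }


# Hypothetical values, for illustration only.
previous = {"MDR": "MDR26_0", "AIR": "AIR93"}
current = {"MDR": "MDR26_1", "AIR": "AIR93"}

# Only MDR would get a new submission; its version number would be "MDR26_1".
changed = select_changed_vocabularies(previous, current)
print(changed)
```

Comparing the versioned source abbreviations directly would avoid parsing the "Last Updated" dates, on the assumption that the abbreviation changes whenever the vocabulary content does.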

@alexskr
Member

alexskr commented Sep 4, 2024

I'm not sure whether the umls.nih site provides a way to access vocabulary metadata programmatically, but it should be possible to scrape versions from the documentation HTML pages.

At the bare minimum, we should disable adding new submissions for the UMLS vocabularies that are unlikely to change. Obvious things like AI-RHEUM, SNMI, etc.
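
As a rough sketch of that bare-minimum option, the import script could consult a hand-maintained skip list before creating submissions. The names below are hypothetical and not taken from the existing import scripts; the list contents are just the two examples mentioned above:

```python
# Hypothetical skip list of vocabularies that are unlikely to change.
# It would be maintained by hand; these two entries are only the examples above.
STATIC_VOCABULARIES = frozenset({"AI-RHEUM", "SNMI"})


def vocabularies_to_import(candidates):
    """Drop vocabularies on the skip list, preserving the input order."""
    return [acronym for acronym in candidates if acronym not in STATIC_VOCABULARIES]


print(vocabularies_to_import(["MEDDRA", "AI-RHEUM", "SNMI"]))
```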
