Establish standard for CL database_cross_reference and contributor annotations #2013

Closed
7 tasks done
ghost opened this issue Jun 8, 2023 · 20 comments · Fixed by #2058

@ghost

ghost commented Jun 8, 2023

In the process of reviewing #1997, inconsistencies were noted in how annotations are recorded.

Annotation properties to be clarified:

database_cross_reference:

  • Should the value of a database_cross_reference always be recorded as a literal, regardless of whether it is a CURIE, URI, URL, ORCID or any other type of string?

For example,
the DOI database_cross_reference for the text definition of CL:4023072 'brain vascular cell' where it is literal

Screenshot 2023-06-08 at 11 50 53

vs

the DOI database_cross_reference for the exact synonym "hepatic progenitor cell" in CL:0002196 'hepatic oval stem cell' where it is an IRI

Screenshot 2023-06-08 at 11 50 21
  • Does the decision above apply to BOTH database_cross_reference annotations applied generally to the term and to annotations of annotations (e.g., when a text definition is annotated with a dbxref)?
  • Does the decision above apply to BOTH CURIE and URI formats?
  • When a reference has both a PMID and doi reference, is either one preferred?

contributor:

  • Should contributor ORCIDs be recorded as literals or IRIs?

For all of the above:

Is there a standard for casing?

@ghost ghost self-assigned this Jun 8, 2023
@ghost ghost added the question label Jun 8, 2023
@ghost
Author

ghost commented Jun 8, 2023

FYI @matentzn and @dosumis

Also, thank you, @ubyndr, for these SPARQL queries:
xrefs that are recorded as IRIs
Contributor ORCIDs recorded as IRIs

@ghost ghost assigned ubyndr Jun 8, 2023
@matentzn
Contributor

matentzn commented Jun 8, 2023

This is a case for the techboard, please add to it;

you need these queries to be totally safe:

  1. orcid format query (@anitacaron has made it, and should be in ODK)
  2. permissible properties query (https:/obophenotype/cell-ontology/blob/master/src/sparql/illegal-annotation-property-violation.sparql): make sure it only lists allowed properties and that all the rest are dropped. I think I made a more robust one here: https:/monarch-initiative/mondo/blob/master/src/sparql/qc/general/qc-permitted-properties.sparql, but ask Anita for her opinion; I might not have been in my right mind when I wrote it.
  3. permissible prefix queries (see Mondo https:/monarch-initiative/mondo/blob/master/src/sparql/qc/mondo/qc-illegal-prefix-on-xref.sparql and https:/monarch-initiative/mondo/blob/master/src/sparql/qc/mondo/qc-illegal-prefix-on-xref-annotation.sparql) to ensure all prefixes are controlled 100%

I would make sure all three are there (ask @anitacaron, @ubyndr or @udp for help) in a PR, then fix all the errors rather than the other way around (fixing first), because this ensures that the tests are working.
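
As a rough illustration of what checks 1 and 3 are looking for, here is a minimal Python sketch. It is not the ODK or Mondo SPARQL query; the allow-list `ALLOWED_XREF_PREFIXES`, the function names, and the assumption that ORCID IRIs start with `https://orcid.org/` are all illustrative and not taken from this thread.

```python
import re

# Hypothetical allow-list; the real one is whatever the ontology's QC query permits.
ALLOWED_XREF_PREFIXES = {"PMID", "doi", "ISBN"}

# Assumed canonical IRI form: https://orcid.org/ followed by four 4-character groups.
ORCID_IRI = re.compile(r"https://orcid\.org/(\d{4}-\d{4}-\d{4}-\d{3}[\dX])")


def orcid_checksum_ok(orcid_id: str) -> bool:
    """Check the final character with the ISO 7064 MOD 11-2 scheme used by ORCID."""
    chars = orcid_id.replace("-", "")
    total = 0
    for ch in chars[:-1]:
        total = (total + int(ch)) * 2
    expected = (12 - total % 11) % 11
    return chars[-1] == ("X" if expected == 10 else str(expected))


def is_valid_orcid_iri(value: str) -> bool:
    """True only for a full ORCID IRI whose check character is correct."""
    match = ORCID_IRI.fullmatch(value)
    return match is not None and orcid_checksum_ok(match.group(1))


def has_allowed_prefix(xref: str) -> bool:
    """True if the xref is PREFIX:LOCAL_ID with a prefix from the allow-list (case-sensitive)."""
    prefix, sep, local_id = xref.partition(":")
    return bool(sep) and bool(local_id) and prefix in ALLOWED_XREF_PREFIXES
```

Because the prefix comparison is case-sensitive, `DOI:` and `doi:` are treated as different prefixes, so only the casing placed in the allow-list passes.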

@dosumis dosumis changed the title Establish standard for CL annotations Establish standard for CL database_cross_reference annotations Jun 8, 2023
@gouttegd
Collaborator

gouttegd commented Jun 8, 2023

Regarding DOIs, my take is that they should be recorded as doi:xx.yyyy/abc, either as a literal string (which is what most people do) or as an IRI (which strictly speaking is more correct).

But they SHOULD NOT be recorded as IRI with values of the form http://doi.org/xx.yyyy/abc, http://dx.doi.org/xx.yyyy/abc, https://doi.org/xx.yyyy/abc, or https://dx.doi.org/xx.yyyy/abc (as in the example given for CL:0002196). This is wrong, because none of these forms are the real identifier (the mere fact that there is more than one form is a clue). The real identifier is doi:xx.yyyy/abc, and nothing else (and yes, doi:... is a valid IRI with regard to the IETF specification).

We SHOULD NOT apply the same logic as for OBO/PURL identifiers, where PREFIX:ZZZZ is merely a compact form of the real identifier which is http://purl.obolibrary.org/obo/PREFIX_ZZZZ. With DOIs, doi:xx.yyyy/abc is the real identifier and all the https?://(dx.?)doi.org/ variants should only be used for resolution, not identification.
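
To make the distinction concrete, here is a small Python sketch that rewrites the four resolver URL forms listed above (and either casing of the prefix) into a single `doi:` CURIE. The function name `to_doi_curie` is made up for this example, and the lowercase prefix is a default chosen for the sketch, not a decision made at this point in the thread.

```python
import re

# Resolver forms named above: http or https, with or without the "dx." host prefix.
_DOI_RESOLVER = re.compile(r"^https?://(?:dx\.)?doi\.org/", re.IGNORECASE)
# An existing CURIE prefix in any casing (doi:, DOI:, ...).
_DOI_PREFIX = re.compile(r"^doi:", re.IGNORECASE)


def to_doi_curie(value: str, prefix: str = "doi") -> str:
    """Return '<prefix>:<DOI name>' for a resolver URL or a doi:/DOI: CURIE.

    Raises ValueError for anything that is not recognisably a DOI reference.
    """
    value = value.strip()
    for pattern in (_DOI_RESOLVER, _DOI_PREFIX):
        match = pattern.match(value)
        if match:
            # Keep the DOI name itself untouched; only the leading part changes.
            return f"{prefix}:{value[match.end():]}"
    raise ValueError(f"not a recognisable DOI reference: {value!r}")
```

Passing `prefix="DOI"` would yield the uppercase variant instead, so the same helper works whichever casing is eventually agreed on.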

@ghost ghost changed the title Establish standard for CL database_cross_reference annotations Establish standard for CL database_cross_reference and contributor annotations Jun 8, 2023
@matentzn
Contributor

matentzn commented Jun 8, 2023

For those interested in the extended debate on what @gouttegd is talking about (and with whom I agree for practical purposes, modulo some limitations on the RDF side of things), see information-artifact-ontology/ontology-metadata#59. One of my least favourite threads of my work life 🗡️

For now, we should use DOI:xx.yyyy/abc (as per most OBO ontologies) or if you so must doi:xx.yyyy/abc.

@ghost
Author

ghost commented Jun 8, 2023

For now, we should use DOI:xx.yyyy/abc (as per most OBO ontologies) or if you so must doi:xx.yyyy/abc.

Thanks for all the feedback.
@gouttegd previously shared this link, which suggests doi:, not DOI:, is preferred.

@gouttegd
Collaborator

gouttegd commented Jun 8, 2023

Ah, I knew I had ranted about that before! :D

@matentzn
Contributor

matentzn commented Jun 8, 2023

@bvarner-ebi I recommend not making that choice hastily. Instead, grep DOI: through all important ontologies (Uberon, GO, PATO, RO at least), and go with the most widely used form, not the "correct" one. doi and DOI are different strings, and before we are externally consistent, OBO ontologies should be internally consistent.

@gouttegd
Collaborator

gouttegd commented Jun 8, 2023

For what it’s worth, FBbt is internally consistent on using doi. CL has 35 occurrences of doi: against 63 occurrences of DOI:.
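
A tally like this can be approximated with a few lines of Python. This is only a sketch: the function name is invented here, and it assumes every DOI name starts with `10.` and that plain-text matching on the ontology file is good enough; it is not the method used to obtain the numbers above.

```python
import re
from collections import Counter

# Match the prefix in any casing when it is directly followed by a DOI name ("10.").
_DOI_CURIE = re.compile(r"\b(doi):10\.", re.IGNORECASE)


def count_doi_prefix_casing(text: str) -> Counter:
    """Count how often each casing of the 'doi:' prefix appears in the given text."""
    return Counter(match.group(1) for match in _DOI_CURIE.finditer(text))
```

Reading an ontology file into a string and passing it to `count_doi_prefix_casing` returns a `Counter` keyed by the casing actually found, such as `doi` or `DOI`. Resolver URLs like `https://doi.org/10...` are not counted, because no colon follows `doi` there.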

@ghost
Author

ghost commented Jun 30, 2023

@anitacaron, as discussed offline, please see the pending questions, with more context above:

  • Should the value of a database_cross_reference always be recorded as a literal, regardless of whether it is a CURIE, URI, URL, ORCID or any other type of string?

  • Does the decision above apply to BOTH database_cross_reference annotations applied to the whole term AND to annotations of annotations (e.g., when a text definition is annotated with a dbxref)?

  • Does the decision above apply to BOTH CURIE and URI formats?

  • When a reference has both a PMID and doi reference, is either one preferred?

  • Should contributor ORCIDs be recorded as literals or IRIs?

  • Are there circumstances where a whole URI is preferred over CURIE or vice versa? For example, is https://www.orcid.org/ preferred over ORCID: or vice versa?

@matentzn
Contributor

The questions are all great and very important. Let's not decide these things on an ontology by ontology basis.

In general, database cross references should be used only for two use cases:

  1. to provide a cross reference from a term to a non-OBO term. In this case, use a CURIE string.
  2. to provide provenance on a definition (IAO:0000115). It should not be used for anything else (use dcterms:source).

PMID vs DOI is a great question that should be asked on the issue tracker but I would tend to "does not matter", and second best "DOI".

All other uses (publications (no standard yet, but we should request one), contributors (dcterms:contributor), URLs with additional information (rdfs:seeAlso)) should be phased out.

In case 2, we go against rule 1 for ORCIDs and use IRI syntax (an ORCID is always an IRI), but only for ORCIDs. DOIs always use CURIE syntax.

@gouttegd
Collaborator

database cross references should be used only for two use cases:

Don't forget the third case: to represent cross-ontology mappings, even when the mapping object is an OBO term.

(Yes, ideally we should all use SSSOM for mappings, but for now almost everyone is still using DB cross-refs.)

For this use case, and at least for Uberon/CL, the value of the DB cross-ref MUST be a curie, because that's what the bridge generation script expects, and we're not ready to switch to anything else.

@matentzn
Contributor

Yeah I would like at least to move this to skos / semapv in the mid term.

@gouttegd
Collaborator

Do you mean, keep using annotations directly in the ontology (instead of SSSOM mapping sets in externally maintained files), but with dedicated properties from skos or semapv instead of oboInOwl:hasDbXref?

@matentzn
Contributor

As an intermediate step yes

@gouttegd
Collaborator

Not convinced it is worth it. Such an intermediate step would already require lots of effort to either adapt the existing bridge generation script or come up with a new one -- I'd rather have those efforts directed at supporting the use of SSSOM directly.

@matentzn
Contributor

Fine by me. I am 100% on your side when we are talking about xrefs that serve a direct purpose, like generating bridge files; but most ontologies provide mappings merely as an additional piece of metadata, and here I think teaching people SSSOM may be overkill. But for CL and Uberon (and Mondo etc.) you are right!

@ghost
Author

ghost commented Jul 4, 2023

Thank you, all, for the feedback.
Based on @matentzn's comments above and internal discussion, here are the direct replies. If this should be added to particular documentation, kindly advise. Otherwise, I will close the issue.

  • Should the value of database_cross_references always be recorded as literal, regardless if it is a CURIE, URI, URL, ORCID or any other type of string?

For database_cross_reference, use CURIE format, enter in Protégé as Value on the “Literal” tab, leave Datatype empty. Non-CURIE values (e.g., URLs) are discouraged, but when used are entered the same way. The exception to this is when ORCIDs are used. ORCIDs should be entered as an IRI in the IRI field on the “IRI Editor” tab.

  • Does the decision above apply to BOTH database_cross_reference annotations applied to the whole term AND to annotations of annotations (e.g., when a text definition is annotated with a dbxref)?

Yes.

  • Does the decision above apply to BOTH CURIE and URI formats?

See above.

  • When a reference has both a PMID and doi reference, is either one preferred?

No, but the bioregistry standard should be used, although this currently is not consistent across ontologies. Since DOIs may be more readily available, they can be consistently used by editors if preferred, written as a CURIE (doi:x) and entered in the “Literal” tab as described above.

  • Should contributor ORCIDs be recorded as literals or IRIs?

IRIs, entered on the “IRI Editor” tab. When ORCIDs are used to annotate definitions (e.g., a subject matter expert provides a definition with no readily citable source), the precedent is to add this as a database_cross_reference on the text definition/comment. This will be the standard until otherwise advised. When adding an ORCID to identify a term contributor, the annotation property dcterms:contributor is used, and the ORCID is still added on the “IRI Editor” tab.

  • Are there circumstances where a whole URI is preferred over CURIE or vice versa? For example, is https://www.orcid.org/ preferred over ORCID: or vice versa?

For ORCIDs, the whole IRI is preferred. In all other cases, the CURIE is preferred.
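
The rules summarised above can be expressed as a small check. The following Python sketch is illustrative only: the `AnnotationValue` class, the `check_value` function, the CURIE pattern, and the choice of `https://orcid.org/` as the expected IRI base are assumptions made for this example, not part of the CL tooling.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Loose, assumed CURIE shape: PREFIX:LOCAL_ID with no whitespace.
CURIE = re.compile(r"^[A-Za-z][\w.-]*:\S+$")
# Assumed IRI base for ORCIDs; the thread does not fix the exact form.
ORCID_IRI_BASE = "https://orcid.org/"


@dataclass(frozen=True)
class AnnotationValue:
    value: str
    is_iri: bool  # True if entered on the "IRI Editor" tab, False if on the "Literal" tab


def check_value(av: AnnotationValue) -> Optional[str]:
    """Return None if the value follows the summary above, else a short message."""
    lowered = av.value.lower()
    if "orcid.org/" in lowered or lowered.startswith("orcid:"):
        # ORCIDs are the one exception: always a full IRI.
        if av.is_iri and av.value.startswith(ORCID_IRI_BASE):
            return None
        return "ORCID should be a full IRI entered on the IRI Editor tab"
    if av.is_iri:
        return "non-ORCID values should be entered as literals"
    if "://" in av.value or not CURIE.match(av.value):
        return "non-CURIE literal (allowed but discouraged)"
    return None
```

For example, `check_value(AnnotationValue("doi:10.1000/abc", is_iri=False))` returns `None`, while the same DOI entered as a resolver URL on the “IRI Editor” tab is reported.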

@matentzn
Contributor

matentzn commented Jul 4, 2023

Excellent summary.

@ghost
Author

ghost commented Jul 5, 2023

Thank you for reviewing, @matentzn.
@gouttegd, are these instructions in scope for the CL style guide?

@gouttegd
Collaborator

gouttegd commented Jul 5, 2023

@bvarner-ebi I'd say they are in scope, yes. It's clearly something most if not all editors (and reviewers!) should be aware of.

@ghost ghost closed this as completed in #2058 Jul 13, 2023