Annotations missing from noctua.mgi.gpad on snapshot #335

Closed
ukemi opened this issue Jul 20, 2020 · 19 comments

Comments

@ukemi

ukemi commented Jul 20, 2020

The mgi gpad file on snapshot appears to be missing annotations:
http://snapshot.geneontology.org/products/annotations/noctua_mgi.gpad.gz

This model checks out but I can't find any of the annotations in snapshot from 7/19/2020.
http://noctua.geneontology.org/editor/graph/gomodel:5ee8120100001244?model_id=gomodel:5ee8120100001244

@kltm
Member

kltm commented Jul 20, 2020

@ukemi Can you confirm that the correct (latest) version of your model is at: https://github.com/geneontology/noctua-models/blob/master/models/5ee8120100001244.ttl ? According to GH and the model metadata, there have been no changes since 2020-07-06. (I just want to make sure we're at least starting from the assumption that the issue is in minerva and not in saving, model push, etc.)

@ukemi
Author

ukemi commented Jul 21, 2020

Hi @kltm,

Yes, this is the model.

@goodb
Contributor

goodb commented Jul 21, 2020

@ukemi when you look at the model in Noctua, are the annotations missing there as well, e.g. at http://noctua.geneontology.org/workbench/annpreview/?model_id=gomodel:5ee8120100001244 ? If they are present there, that would indicate either a pipeline error or a minerva GPAD generation error. In that case, could you give one example that is not present in the GPAD and should be?

@ukemi
Author

ukemi commented Jul 21, 2020

The annotations are there.

@ukemi
Author

ukemi commented Jul 23, 2020

Could this also be related to #328? Also now questioning whether we really want to implement #269

@hdrabkin

hdrabkin commented Jul 31, 2020

Alka-Seltzer moment
I discovered today that between 6/17 and 6/18, we lost over 50% of our Noctua annotations:
6/17 NOCTUA Annotations:
Total Number of Genes Annotated to: 1027
Total Number of Annotations: 6716

6/18 NOCTUA Annotations:
Total Number of Genes Annotated to: 551 <<<<<<<<<< 476 loss
Total Number of Annotations: 3111 <<<<<<<<<< 3605 loss
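
As a quick sanity check on the figures above, here is a minimal Python sketch that recomputes the drop between the two reports. The function name `loss` is purely illustrative; the counts are the ones quoted in this comment.

```python
from typing import Tuple


def loss(before: int, after: int) -> Tuple[int, float]:
    """Return (absolute loss, percentage loss relative to `before`)."""
    dropped = before - after
    return dropped, 100.0 * dropped / before


# Counts copied from the 6/17 and 6/18 reports above.
genes_lost, genes_pct = loss(1027, 551)
annotations_lost, annotations_pct = loss(6716, 3111)

print(f"genes: {genes_lost} lost ({genes_pct:.1f}%)")
print(f"annotations: {annotations_lost} lost ({annotations_pct:.1f}%)")
```

This reproduces the 476 and 3605 losses, which works out to roughly 46% of genes and roughly 54% of annotations.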

@kltm
Member

kltm commented Aug 5, 2020

TL;DR: So, what we seem to have here is the model-state getting dropped for some reason somewhere in the minerva steps (below); it seems to be there in GH: https://github.com/geneontology/noctua-models/blob/291a0a75bc7a890800da2e13f1953cea6a42aa21/models/5ee8120100001244.ttl#L16 but does not appear in the GPAD.

Any ideas @balhoff or @goodb ?


To spell out how to reproduce this:

From @ukemi's comment #335 (comment), we know that these have gotten at least into GH. This would seem to leave two error points: 1) pipeline mechanics (in feeding or handling) or 2) a minerva error.

Grabbing the log from the last successful snapshot, the model ID is mentioned six times:

[2020-08-03T07:34:02.861Z] 2020-08-03 00:34:02,780 INFO  (CommandLineInterface:442) Loading models/5ee8120100001244.ttl
[2020-08-03T07:55:43.643Z] 2020-08-03 00:55:43,550 INFO  (BlazegraphMolecularModelManager:594) Load model abox: http://model.geneontology.org/5ee8120100001244 from database
[2020-08-03T08:01:07.825Z] + perl ./util/collate-gpads.pl [A LOT OF STUFF] legacy/gpad/5ee8120100001244.gpad [A LOT OF STUFF]
[2020-08-03T18:58:33.198Z] 2020-08-03 18:58:32,969 INFO org.renci.blazegraph.Load$ - Loading target/noctua-models/models/5ee8120100001244.ttl
[2020-08-03T19:01:36.437Z] http://model.geneontology.org/5ee8120100001244
[2020-08-03T19:02:20.871Z] 2020-08-03 19:02:20,841 INFO org.renci.blazegraph.Reason$ - 1253 changes in Some(http://model.geneontology.org/5ee8120100001244_inferred)

Poking around the stage logs a bit, this seems mechanically what I'd expect.

Trying to simulate locally:

git clone https://github.com/geneontology/noctua-models.git
mkdir models
mv noctua-models/models/5ee8120100001244.ttl ./models/
~/local/src/git/minerva/minerva-cli/bin/minerva-cli.sh --import-owl-models -f models -j blazegraph.jnl
mkdir -p legacy/gpad
~/local/src/git/minerva/minerva-cli/bin/minerva-cli.sh --lego-to-gpad-sparql --ontology http://skyhook.berkeleybop.org/snapshot/ontology/extensions/go-lego.owl -i blazegraph.jnl --gpad-output legacy/gpad
grep -c "5ee8120100001244" legacy/gpad/5ee8120100001244.gpad 

14

Which, I believe, means that our annotations have gotten this far. The final step is:

perl noctua-models/util/collate-gpads.pl legacy/gpad/*.gpad
No production models for MGI
No production models for MGI
No production models for MGI
No production models for MGI
No production models for MGI
No production models for MGI
No production models for MGI
No production models for MGI
No production models for MGI
No production models for MGI
No production models for MGI
No production models for MGI
No production models for MGI
No production models for MGI

and there is no further output...which I think is a problem?

In the script the following seems to be triggered:

    if (!grep {$_ eq 'model-state=production'} @props) {

Ah!

grep -c "state" legacy/gpad/5ee8120100001244.gpad 

0
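
To make the failure mode concrete, here is a minimal Python sketch of the kind of check quoted from `collate-gpads.pl` above. It is not the actual script: it assumes the GPAD 1.x layout in which the twelfth tab-separated column holds pipe-separated `key=value` annotation properties, and the function names are hypothetical.

```python
from typing import Iterable, List

PRODUCTION_TAG = "model-state=production"


def annotation_properties(gpad_line: str) -> List[str]:
    """Split the properties column of one GPAD line into key=value strings.

    Assumption: GPAD 1.x, 12 tab-separated columns, properties in the last one.
    """
    columns = gpad_line.rstrip("\n").split("\t")
    if len(columns) < 12 or not columns[11]:
        return []
    return columns[11].split("|")


def production_lines(lines: Iterable[str]) -> List[str]:
    """Keep annotation lines tagged as production; skip '!' header lines."""
    return [
        line
        for line in lines
        if not line.startswith("!")
        and PRODUCTION_TAG in annotation_properties(line)
    ]
```

If the `model-state` property never makes it into the generated GPAD, as the `grep -c "state"` count of 0 suggests, a filter like this drops every line, which would be consistent with the empty collated output.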

goodb added a commit that referenced this issue Aug 5, 2020
There is a method in the GPAD sparql export that gathers model annotations such as state.  This was getting confused because, at that point in the code, the RDF for the model also contains the RDF for all the Arachne rules.  See comment in CoreMMM .
@goodb
Contributor

goodb commented Aug 5, 2020

@kltm I think I have a solution and a cause. Want to run it by @balhoff but I suspect this will do it. I seem to have introduced this in an earlier quest to fix some other problem.

@goodb
Contributor

goodb commented Aug 6, 2020

@kltm I think it would be straightforward to add a parameter to the minerva client that would apply the 'production-only' filter at the time the GPAD was generated. Do you want me to do that? Having that perl script that you discovered in the middle of the gpad assembly process for the pipeline seems maybe not so good from the standpoint of testing and stability. LMK.

@hdrabkin

Just wondering: we are still missing about 50% in the download. Any progress?

@kltm
Member

kltm commented Aug 18, 2020

There is likely an incoming fix with #341 , pending review from @balhoff .

@kltm
Member

kltm commented Aug 19, 2020

@hdrabkin @ukemi We should hopefully get some results from the new code on Friday.

@pgaudet

pgaudet commented Sep 2, 2020

In the release candidate we are missing several annotations coming from SynGO via Noctua:
for example:

  • human, missing 180 annotations
  • mouse, missing 200 annotations
  • rat, missing 1000 annotations.

I think this is blocking for the Sept 2020 release.

@hdrabkin

hdrabkin commented Sep 2, 2020

We did get ours back last week (we were missing 50%, a mix of both SynGO and MGI).

@pgaudet

pgaudet commented Sep 2, 2020

@kltm could the files we loaded be out of date ?

@hdrabkin

hdrabkin commented Sep 2, 2020

The current snapshot file appears to have 6916 lines attributed to SynGO. The file header date is 8/30/2020.
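
For anyone wanting to repeat this kind of count on both the snapshot and the release candidate, here is a rough Python sketch. It assumes the GPAD file has already been downloaded locally and that a plain substring match on `SynGO` is a fair proxy for "attributed to SynGO"; the function name is hypothetical.

```python
import gzip


def count_matching_lines(path: str, needle: str) -> int:
    """Count non-header lines in a gzipped text file that contain `needle`."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return sum(
            1
            for line in handle
            if not line.startswith("!") and needle in line
        )
```

Calling `count_matching_lines("noctua_mgi.gpad.gz", "SynGO")` on each copy of the file would give two numbers that can be compared directly.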

@pgaudet

pgaudet commented Sep 2, 2020

Thanks @hdrabkin
We did not get this data in the Sept release (the release candidate has 4980 SynGO annotations). I will stop the release process.

@kltm
Member

kltm commented Sep 2, 2020

@pgaudet I think that if this is an issue, it would be a new issue, not related to the "production" tag issue we had here. Are you looking at the output GPAD products from like noctua_mgi.gpad.gz?

@kltm
Member

kltm commented Sep 2, 2020

Talking to @pgaudet earlier, this may just be an "echo" of this issue as it passes through various external pipelines that are on different schedules.
