AGROVOC does not pull automatically and does not parse anymore #178

Closed
Tracked by #195
jonquet opened this issue Jan 11, 2022 · 8 comments

@jonquet
Contributor

jonquet commented Jan 11, 2022

Since the July 2021 release, AGROVOC no longer parses.

Screenshot 2022-01-11 at 17:58:51

In addition, a pullLocation is set for AGROVOC:
http://data.agroportal.lirmm.fr/ontologies/AGROVOC/submissions/16?display=pullLocation
But the ontology never gets updated automatically (I have to do it manually each month).

The January or February release is expected soon, which will let us run new tests.

@jonquet jonquet added the content Issues related to the content of AgroPortal label Jan 11, 2022
@syphax-bouazzouni
Contributor

AGROVOC issue diagnostic

State of last submission

See status

image

See logs

The log file of the last AGROVOC submission can be found at this path: /srv/ontoportal/data/repository/AGROVOC/16/parsing.log

Conclusion from logs

The error is a java.lang.OutOfMemoryError, as seen in the screenshot above (taken after we restarted the parsing of the last submission).
image
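
To check a submission log for this error without opening it by hand, something like the following could be used. This is only a minimal sketch: the function names are hypothetical, and the only things taken from this thread are the marker string `java.lang.OutOfMemoryError` and the log path.

```python
from pathlib import Path

# Marker taken from the error reported in this thread.
OOM_MARKER = "java.lang.OutOfMemoryError"


def find_oom_lines(log_text: str, marker: str = OOM_MARKER) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for every log line containing the marker."""
    return [
        (number, line.strip())
        for number, line in enumerate(log_text.splitlines(), start=1)
        if marker in line
    ]


def scan_log_file(path: str) -> list[tuple[int, str]]:
    """Read a parsing log from disk and report the lines mentioning the marker."""
    # errors="replace" keeps the scan going even if the log has bad bytes.
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return find_oom_lines(text)
```

On the server, it would be called as `scan_log_file("/srv/ontoportal/data/repository/AGROVOC/16/parsing.log")`; an empty list means the marker does not appear in that log.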

@jonquet
Contributor Author

jonquet commented Jan 26, 2022

Relevant post about the issue:
https://stackoverflow.com/questions/52712321/outofmemoryerror-when-joining-a-list-of-strings-in-java

It seems the OWL API tries to create a string that is too large.

@jonquet
Contributor Author

jonquet commented Jan 28, 2022

The OutOfMemoryError was reproduced by @jvendetti when parsing the .nq file outside of the AgroPortal stack.
Note: the .nt file parses.

@jonquet
Contributor Author

jonquet commented Mar 18, 2022

An update:

  • In Protégé, the .nt version opens.
  • In AgroPortal, the .nt version parses and an 877M owlapi.xrdf file is generated. However, the next parsing step fails with the error below. This situation was also reproduced outside of AgroPortal.
rapper: Serializing with serializer ntriples
rapper: Error -  - XML parser error: Char 0xFFFF out of allowed range
rapper: Error -  - XML parser error: PCDATA invalid Char value 65535
rapper: Failed to parse file /srv/ncbo/repository/AGROVOC/1/owlapi.xrdf rdfxml content
rapper: Parsing returned 8673139 triples

When generating an RDF/XML file with Protégé and re-opening that same file with Protégé, the error shows up again, but this time with a line number:
image

Which brings us to the URI: http://aims.fao.org/aos/agrovoc/xDef_8f48da66

image
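
For future releases, the offending character could be located in the .nt file before it ever reaches rapper. The sketch below is an assumption-laden illustration, not what was actually run here: it flags any character outside the XML 1.0 `Char` range (U+FFFF is one of them), both as a raw character and as an N-Triples `\uXXXX` / `\UXXXXXXXX` escape. All function names are hypothetical, and removing the character is only one possible fix.

```python
import re

# Anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
INVALID_XML_CHAR = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

# N-Triples numeric escapes. Simplification: an escaped backslash placed
# right before a "u" (a literal backslash in the data) is not special-cased.
NT_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def _is_valid_xml_codepoint(cp: int) -> bool:
    """True if the code point is allowed by the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def _escape_codepoint(match: re.Match) -> int:
    """Decode the hex digits captured by NT_ESCAPE into a code point."""
    return int(match.group(1) or match.group(2), 16)


def find_invalid_xml_chars(lines):
    """Yield (line_number, codepoint) for each character XML 1.0 forbids."""
    for number, line in enumerate(lines, start=1):
        for match in INVALID_XML_CHAR.finditer(line):
            yield number, ord(match.group())
        for match in NT_ESCAPE.finditer(line):
            cp = _escape_codepoint(match)
            if not _is_valid_xml_codepoint(cp):
                yield number, cp


def strip_invalid_xml_chars(line: str) -> str:
    """Remove raw and escaped characters that XML 1.0 forbids (one possible fix)."""
    line = INVALID_XML_CHAR.sub("", line)
    return NT_ESCAPE.sub(
        lambda m: m.group() if _is_valid_xml_codepoint(_escape_codepoint(m)) else "",
        line,
    )
```

`find_invalid_xml_chars` accepts any iterable of lines, so an open file handle on the .nt dump can be passed directly; each reported line number can then be matched to the triple (and URI) that carries the bad character.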

@jonquet
Contributor Author

jonquet commented Mar 18, 2022

Fixing the character allows parsing.
We then encounter another issue, described in the log:

image

Probably linked to the recent changes on indexing fields.

@syphax-bouazzouni
Contributor

The indexing error is fixed here: ontoportal-lirmm/goo@ba27011

AGROVOC is now parsed and indexed, but we hit this issue in the diff process: #246

So I don't think the automatic pull will work yet; to be followed up in future releases here: #251
