Issue parsing ePub files #14
childrens-literature has some incorrect formatting in its table of contents that does not conform to the EPUB navigation schema. More specifically, there are several …

GhV-oeb-page and CF General, on the other hand, are a different story. These are EPUB 3 files, and they use a completely different TOC format, which is essentially a plain HTML5 file (with a few restrictions). Back in 2015, when this project was first published, there were not many EPUB 3 files available, so parsing the TOC out of HTML5 content did not seem worth it. On top of that, all the EPUB 3 files I was able to find for testing were backwards compatible with the EPUB 2 format. That is not the case with these files. I will add TOC parsing support for EPUB 3 as well, but it might take some time.
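To make the EPUB 2 vs. EPUB 3 TOC difference concrete, here is a minimal Python sketch using only the standard library. It is not ePubReader's implementation. It assumes the EPUB 2 NCX file (`toc.ncx`) and the EPUB 3 navigation document have already been extracted from the .epub archive as strings. The function names `parse_ncx_toc` and `parse_nav_toc` are hypothetical, and error handling for malformed TOCs (like the one in childrens-literature) is left out.

```python
import xml.etree.ElementTree as ET

# Namespaces used by the two TOC formats (Clark notation for ElementTree).
XHTML = "{http://www.w3.org/1999/xhtml}"
EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"
NCX = "{http://www.daisy.org/z3986/2005/ncx/}"


def parse_ncx_toc(ncx_xml: str) -> list[tuple[int, str, str]]:
    """EPUB 2 (hypothetical helper): walk nested navPoint elements of toc.ncx."""
    root = ET.fromstring(ncx_xml)
    entries = []

    def walk(parent, depth):
        for point in parent.findall(f"{NCX}navPoint"):
            label = point.findtext(f"{NCX}navLabel/{NCX}text", default="").strip()
            content = point.find(f"{NCX}content")
            href = content.get("src", "") if content is not None else ""
            entries.append((depth, label, href))
            walk(point, depth + 1)  # child navPoints are nested one level deeper

    nav_map = root.find(f"{NCX}navMap")
    if nav_map is not None:
        walk(nav_map, 0)
    return entries


def parse_nav_toc(nav_xhtml: str) -> list[tuple[int, str, str]]:
    """EPUB 3 (hypothetical helper): find <nav epub:type="toc"> and walk its <ol>/<li> tree."""
    root = ET.fromstring(nav_xhtml)
    toc_nav = None
    for nav in root.iter(f"{XHTML}nav"):
        # epub:type may hold several space-separated values; we only want the TOC nav,
        # not e.g. landmarks or page-list.
        if "toc" in nav.get(EPUB_TYPE, "").split():
            toc_nav = nav
            break
    if toc_nav is None:
        return []
    entries = []

    def walk(ol, depth):
        for li in ol.findall(f"{XHTML}li"):
            # An entry is either a link (<a href>) or an unlinked <span> heading.
            link = li.find(f"{XHTML}a")
            label_el = link if link is not None else li.find(f"{XHTML}span")
            label = "".join(label_el.itertext()).strip() if label_el is not None else ""
            href = link.get("href", "") if link is not None else ""
            entries.append((depth, label, href))
            child = li.find(f"{XHTML}ol")
            if child is not None:
                walk(child, depth + 1)

    top = toc_nav.find(f"{XHTML}ol")
    if top is not None:
        walk(top, 0)
    return entries
```

Both functions return the same flat list of `(depth, label, href)` tuples, so a reader could try the NCX first and fall back to the navigation document (or the other way round) for files that, like these, are not backwards compatible with EPUB 2.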
Thanks a lot. If I find more EPUB 3 files with issues, I will be sure to send them to you for your testing dataset.
Sorry for the delay. Working on this.
OK. Let me know when it's finished and I will run tests against an ePub dataset I have.
I've made a preliminary version with improved support for EPUB 3 files. It can be found in the …; it is available as source code only, and there are still some things I plan to finish before making a new release.
I've released version 3.0.0 of the library with better EPUB 3 support. Please reopen this issue if you find any other problems with EPUB 3 files.
I have attached 3 ePub files that ePubReader fails to parse.
I found these files in the wild by searching Google by file type, to build up an ePub test dataset to test ePubReader against.
I have other files that fail too, but for the same reasons as the ones attached (TOC errors, etc.).
Good job so far.
Thanks.
childrens-literature.zip
GhV-oeb-page.zip
CF General.zip