
PKG-012 is invalid? #1097

Closed
slonopotamus opened this issue Jan 23, 2020 · 9 comments
Labels
status: has PR (The issue is being processed in a pull request)
type: false-negative (This issue is about invalid content being incorrectly accepted)

Comments

@slonopotamus
Contributor

The EPUB specification allows Unicode file names; however, epubcheck produces a warning when it validates an EPUB with non-ASCII characters in file names:

WARNING(PKG-012): unicode.epub/OEBPS/test-é.xhtml(-1,-1): File name contains the following non-ascii characters: é. Consider changing the filename.

Is there a reason PKG-012 warns in this case, or should PKG-012 be removed?
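
For reference, the condition the PKG-012 message describes can be sketched in a few lines of Python: flag every code point outside the US-ASCII range (above 0x7F) in a file name. This is only an illustration, not epubcheck's (Java) implementation, and the helper name `non_ascii_chars` is hypothetical.

```python
def non_ascii_chars(filename: str) -> list[str]:
    """Return the distinct non-ASCII characters in filename, in order of first appearance."""
    seen: list[str] = []
    for ch in filename:
        # Code points above 0x7F fall outside the US-ASCII range.
        if ord(ch) > 0x7F and ch not in seen:
            seen.append(ch)
    return seen


if __name__ == "__main__":
    name = "OEBPS/test-é.xhtml"
    found = non_ascii_chars(name)
    if found:
        # Mirrors the wording of the PKG-012 message quoted above.
        print(f"File name contains the following non-ascii characters: {', '.join(found)}")
```

Note that such a check says nothing about whether the name is valid under the OCF rules; it only detects characters that some ZIP tools may not preserve.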

@slonopotamus
Contributor Author

There's even a TODO in epubcheck that says that PKG-012 should not be a warning.

slonopotamus added a commit to slonopotamus/epubcheck-1 that referenced this issue Jan 23, 2020
@slonopotamus
Contributor Author

I went ahead and submitted #1099.

slonopotamus added a commit to slonopotamus/epubcheck-1 that referenced this issue Jan 24, 2020
tofi86 added the "status: has PR" label Jan 24, 2020
@rdeltour
Member

This sounds like a reasonable request, although I'm not sure whether the WARNING reflects widely adopted practices, or whether publishers rely on it.
Any idea @mattgarrish?

rdeltour added the "status: in discussion" and "type: false-negative" labels and removed the "status: has PR" label Apr 30, 2020
@slonopotamus
Contributor Author

Well, I'm not suggesting removing it completely, just downgrading its severity, as you can see in #1099.

@rdeltour
Member

Well, I'm not suggesting to remove it completely but just to downgrade severity

Sure, I understand. But the USAGE severity is much less visible in EPUBCheck, and a lot of ingestion pipelines base their acceptance/rejection criteria on the WARNING level. So we need to be extra careful when demoting a long-standing WARNING to a lower severity 😊.
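
To make the concern concrete, here is a hypothetical Python sketch of an ingestion gate that rejects a file when any reported message is at or above WARNING. The `Severity` values and the `accept` function are assumptions for illustration (the ordering is simplified and omits EPUBCheck's other levels); they are not EPUBCheck's API or any particular pipeline's logic.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical, simplified ordering; only the fact that USAGE ranks
    # below WARNING matters for this illustration.
    USAGE = 1
    WARNING = 2
    ERROR = 3
    FATAL = 4


def accept(reported: list[tuple[str, Severity]],
           threshold: Severity = Severity.WARNING) -> bool:
    """Accept only if every reported message is below the threshold."""
    return all(severity < threshold for _, severity in reported)


if __name__ == "__main__":
    as_warning = [("PKG-012", Severity.WARNING)]
    as_usage = [("PKG-012", Severity.USAGE)]
    print(accept(as_warning))  # False: the same file is rejected
    print(accept(as_usage))    # True: the same file is accepted
```

Under a gate like this, demoting PKG-012 silently changes the outcome for every EPUB whose only issue is a non-ASCII file name.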

@dauwhe
Contributor

dauwhe commented Apr 30, 2020

I tried to unzip one of the EPUB samples with non-ASCII filenames (kusamakura-japanese-vertical-writing.epub) on my Mac, and it errored out:

error:  cannot create /Users/cramerd/Downloads/kusamakura/OPS/xhtml/???.smil
        Illegal byte sequence

This seems to be a known bug in some versions of OSX, but it does illustrate the risk. Could we at least check whether something like iTunes Transporter handles this?
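
One factor in how unzip tools decode entry names is the ZIP "language encoding flag" (general-purpose bit 11, 0x800), which the ZIP APPNOTE uses to mark a name as UTF-8. When the bit is clear, the APPNOTE specifies IBM Code Page 437, although some tools fall back to the local system encoding instead. The Python sketch below reports which entries carry the flag. It is not a diagnosis of the kusamakura failure above, and the helper `name_encodings` is hypothetical; it relies only on the standard `zipfile` module, which sets the flag automatically when writing non-ASCII names.

```python
import io
import zipfile

# Bit 11 of the ZIP general-purpose flag marks the entry name as UTF-8.
UTF8_NAME_FLAG = 0x800


def name_encodings(data: bytes) -> dict[str, bool]:
    """Map each entry name in the archive to whether its UTF-8 flag is set."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return {info.filename: bool(info.flag_bits & UTF8_NAME_FLAG)
                for info in zf.infolist()}


if __name__ == "__main__":
    # Build a tiny in-memory archive with one ASCII and one non-ASCII name.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip")
        zf.writestr("OEBPS/test-é.xhtml", "<html/>")
    for name, is_utf8 in name_encodings(buf.getvalue()).items():
        print(name, "UTF-8 flag set" if is_utf8 else "no UTF-8 flag")
```

An archive whose non-ASCII names lack this flag, or a tool that ignores it, is one way a name like the `.smil` file above can come out as `???`.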

@mattgarrish
Member

We note this problem in the OCF spec (going back to 2.0), but it's left to reading systems to figure out how to patch characters if they can't handle them:

Some commercial ZIP tools do not support the full Unicode range and might support only the [US-ASCII] range for File Names. Authors who want to use ZIP tools that have these restrictions might find it is best to restrict their File Names to the [US-ASCII] range. If the names of files cannot be preserved during the unzipping process, it will be necessary to compensate for any name translation which took place when the files are referenced by URI from within the content.

https://www.w3.org/publishing/epub3/epub-ocf.html#h-note-0

It was never supposed to be a warning, to my knowledge, as that breaks our internationalization efforts.

@murata2makoto do you have any thoughts on this?

@rdeltour
Member

It was never supposed to be a warning, to my knowledge, as that breaks our internationalization efforts.

Yeah, I was actually surprised, when reviewing this, that the issue hadn't been raised before.

I'm leaning towards demoting it to USAGE (to stick closer to the spec). But I'm unsure whether to do that "silently" in the forthcoming maintenance release, or if this should go through CG approval first.

@dauwhe
Contributor

dauwhe commented Apr 30, 2020

or if this should go through CG approval first.

I don't think we need CG approval. The spec is pretty clear, and warns people about potential issues. USAGE seems correct.

rdeltour added this to the 4.2.3 milestone Apr 30, 2020
rdeltour added the "status: has PR" label and removed the "status: in discussion" label Apr 30, 2020
rdeltour self-assigned this Apr 30, 2020

5 participants