PKG-012 is invalid? #1097

slonopotamus · 2020-01-23T17:24:15Z

The EPUB specification allows Unicode filenames, yet epubcheck produces a warning when it validates an EPUB with non-ASCII characters in filenames:

WARNING(PKG-012): unicode.epub/OEBPS/test-é.xhtml(-1,-1): File name contains the following non-ascii characters: é. Consider changing the filename.

Is there a reason why PKG-012 warns in this case, or should PKG-012 be removed?
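
For anyone who wants to see which entries trigger this kind of message without running epubcheck, here is a minimal sketch in Python. It is not epubcheck's implementation; the function name `non_ascii_entries` and the in-memory sample archive are made up for illustration, and "non-ASCII" is assumed to mean any code point above 127.

```python
import io
import zipfile


def non_ascii_entries(epub_file):
    """Map each ZIP entry name to the non-ASCII characters it contains."""
    findings = {}
    with zipfile.ZipFile(epub_file) as archive:
        for name in archive.namelist():
            # Keep first-seen order and drop repeated characters.
            offending = list(dict.fromkeys(ch for ch in name if ord(ch) > 127))
            if offending:
                findings[name] = offending
    return findings


# Build a tiny in-memory archive to exercise the check (not a valid EPUB).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("OEBPS/test-é.xhtml", "<html/>")
    archive.writestr("OEBPS/plain.xhtml", "<html/>")

buffer.seek(0)
for name, chars in non_ascii_entries(buffer).items():
    print(f"{name}: non-ASCII characters: {''.join(chars)}")
```

Passing a path to a real `.epub` file instead of the buffer should work the same way, since an EPUB container is a ZIP archive.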

slonopotamus · 2020-01-23T19:56:00Z

There's even a TODO in epubcheck that says that PKG-012 should not be a warning.

slonopotamus · 2020-01-23T20:12:47Z

I went ahead and submitted #1099.

rdeltour · 2020-04-30T14:22:14Z

This sounds like a reasonable request, although I'm not sure whether the WARNING reflects widely adopted practices, or whether publishers rely on it.
Any idea @mattgarrish?

slonopotamus · 2020-04-30T14:29:47Z

Well, I'm not suggesting to remove it completely but just to downgrade severity as you can see in #1099.

rdeltour · 2020-04-30T14:34:39Z

Well, I'm not suggesting to remove it completely but just to downgrade severity

Sure, I understand. But the USAGE severity is much less visible in EPUBCheck, and a lot of ingestion pipelines base their acceptance/rejection criteria on the WARNING level. So we need to be extra careful when demoting a long-standing WARNING to a lower severity 😊.
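
To make the concern concrete, here is a rough sketch of the kind of gate an ingestion pipeline might apply. The function name `is_rejected`, the severity ordering, and the line pattern are assumptions for illustration, modelled only on the message quoted at the top of this thread; they are not taken from any real pipeline or from EPUBCheck's code.

```python
import re

# Assumed ordering from least to most severe; treat it as illustrative.
SEVERITY_ORDER = ["USAGE", "INFO", "WARNING", "ERROR", "FATAL"]

# Matches lines shaped like "WARNING(PKG-012): path(-1,-1): message".
MESSAGE_LINE = re.compile(r"^(?P<severity>[A-Z]+)\((?P<id>[A-Z]+-\d+)\): (?P<rest>.*)$")


def is_rejected(report_lines, threshold="WARNING"):
    """Return True if any message is at or above the threshold severity."""
    limit = SEVERITY_ORDER.index(threshold)
    for line in report_lines:
        match = MESSAGE_LINE.match(line)
        if match and match["severity"] in SEVERITY_ORDER:
            if SEVERITY_ORDER.index(match["severity"]) >= limit:
                return True
    return False


as_warning = [
    "WARNING(PKG-012): unicode.epub/OEBPS/test-é.xhtml(-1,-1): "
    "File name contains the following non-ascii characters: é. "
    "Consider changing the filename."
]
# The same message after a hypothetical demotion to USAGE.
as_usage = [as_warning[0].replace("WARNING", "USAGE", 1)]

print(is_rejected(as_warning))  # True
print(is_rejected(as_usage))    # False
```

With a gate like this, the same file flips from rejected to accepted once the message is demoted, which is why the change deserves some care.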

dauwhe · 2020-04-30T14:40:34Z

I tried to unzip one of the EPUB samples with non-ASCII filenames (kusamakura-japanese-vertical-writing.epub) on my Mac, and it errored out:

error:  cannot create /Users/cramerd/Downloads/kusamakura/OPS/xhtml/???.smil
        Illegal byte sequence

This seems to be a known bug on some versions of OSX, but it does illustrate the risk. I wonder if we could at least try to see if something like iTunes Transporter is OK with this?

mattgarrish · 2020-04-30T15:30:56Z

We note this problem in the OCF spec (going back to 2.0), but it's left to reading systems to figure out how to patch characters if they can't handle them:

Some commercial ZIP tools do not support the full Unicode range and might support only the [US-ASCII] range for File Names. Authors who want to use ZIP tools that have these restrictions might find it is best to restrict their File Names to the [US-ASCII] range. If the names of files cannot be preserved during the unzipping process, it will be necessary to compensate for any name translation which took place when the files are referenced by URI from within the content.

https://www.w3.org/publishing/epub3/epub-ocf.html#h-note-0

It was never supposed to be a warning, to my knowledge, as that breaks our internationalization efforts.

@murata2makoto do you have any thoughts on this?

rdeltour · 2020-04-30T15:43:01Z

It was never supposed to be a warning, to my knowledge, as that breaks our internationalization efforts.

Yeah, I was actually surprised when reviewing this that this issue hasn't been raised before.

I'm leaning towards demoting it to USAGE (to stick closer to the spec). But I'm hesitating to do that "silently" in the forthcoming maintenance release, or if this should go through CG approval first.

dauwhe · 2020-04-30T15:53:42Z

or if this should go through CG approval first.

I don't think we need CG approval. The spec is pretty clear, and warns people about potential issues. USAGE seems correct.

This was referenced Jan 23, 2020

Encoding issue in the name of the xhtml subpart when no id provided asciidoctor/asciidoctor-epub3#217

Closed

epubcheck raises warnings about preface files containing non-ascii chars asciidoctor/asciidoctor-epub3#162

Closed

slonopotamus added a commit to slonopotamus/epubcheck-1 that referenced this issue Jan 23, 2020

[fix w3c#1097] downgrade PKG-012 (filename contains non-ASCII) from WARNING to USAGE

60ae950

slonopotamus added a commit to slonopotamus/epubcheck-1 that referenced this issue Jan 23, 2020

[fix w3c#1097] downgrade PKG-012 (filename contains non-ASCII) from WARNING to USAGE

bc110bc

slonopotamus added a commit to slonopotamus/epubcheck-1 that referenced this issue Jan 23, 2020

[fix w3c#1097] downgrade PKG-012 (filename contains non-ASCII) from WARNING to USAGE

d86f03f

slonopotamus added a commit to slonopotamus/epubcheck-1 that referenced this issue Jan 23, 2020

[fix w3c#1097] downgrade PKG-012 (filename contains non-ASCII) from WARNING to USAGE

a47d29c

slonopotamus added a commit to slonopotamus/epubcheck-1 that referenced this issue Jan 23, 2020

[fix w3c#1097] downgrade PKG-012 (filename contains non-ASCII) from WARNING to USAGE

1ee2f9c

slonopotamus added a commit to slonopotamus/epubcheck-1 that referenced this issue Jan 23, 2020

[fix w3c#1097] downgrade PKG-012 (filename contains non-ASCII) from WARNING to USAGE

81a35dd

slonopotamus added a commit to slonopotamus/epubcheck-1 that referenced this issue Jan 24, 2020

[fix w3c#1097] downgrade PKG-012 (filename contains non-ASCII) from WARNING to USAGE

12ffbc1

slonopotamus added a commit to slonopotamus/epubcheck-1 that referenced this issue Jan 24, 2020

[fix w3c#1097] downgrade PKG-012 (filename contains non-ASCII) from WARNING to USAGE

4fa7d31

tofi86 added the "status: has PR" label (the issue is being processed in a pull request) Jan 24, 2020

rdeltour mentioned this issue Apr 30, 2020

[fix #1097] downgrade PKG-012 (filename contains non-ASCII) from WARNING to USAGE #1099

Closed

rdeltour added the "status: in discussion" (the issue is being discussed by the development team) and "type: false-negative" (this issue is about invalid content being incorrectly accepted) labels and removed the "status: has PR" label (the issue is being processed in a pull request) Apr 30, 2020

rdeltour added this to the 4.2.3 milestone Apr 30, 2020

rdeltour added the "status: has PR" label (the issue is being processed in a pull request) and removed the "status: in discussion" label (the issue is being discussed by the development team) Apr 30, 2020

rdeltour self-assigned this Apr 30, 2020

rdeltour closed this as completed in f368ee5 Apr 30, 2020
