Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add property for exempt resources #2507

Closed
wants to merge 6 commits into from
Closed

Conversation

mattgarrish
Copy link
Member

@mattgarrish mattgarrish commented Dec 16, 2022

With 3.3, we now allow resources in the container, and listed in the manifest, that may or may not be used in rendering the publication. These are defined at the end of the exempt resources section - to allow data sets, etc. to travel with an EPUB.

A problem raised in w3c/epubcheck#1452, however, is that epubcheck has for a long time reported unused resources in the container because it's been an issue with vendors rejecting publications with these.

There's no way of determining whether a resource is unused or was intentionally meant to travel in the container when a reading system or validator cannot find a reference to it in content documents, so if we don't do anything then exempt resources will result in warnings that they appear unused.

This PR represents one option to solve this problem, which is to add a new manifest item properties value for tagging exempt resources. This would clarify that the author intended them to be included. Since no one has yet been able to use this feature, adding this requirement is not problematic from an existing content perspective.

I'm opening this now, but I don't know we'll be able to resolve this until people are back in the new year. It's open for discussion by anyone who desperately needs an epub fix over the next couple of weeks! 😉


Preview | Diff

@mattgarrish
Copy link
Member Author

One additional comment is that I've made this only a recommendation to set so that it matches the warning that epubcheck already emits for unused resources in the container. That way the warning could be to either set the exempt property or remove the resource.

Copy link
Member

@iherman iherman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was worried about the formal consequences, but I see that the property is set as a SHOULD (in line 4941), ie, it does not invalidate our testing results. That being said, I would hope that, if accepted, epubcheck will add tests on whether it warns or not with this property being set, so this check will appear in our final report. @rdeltour ?

Comment on lines +57 to +59
<p>If [=EPUB creators=] do not set this property, acknowledging the resources are
included by design, reading systems and [=EPUB conformance checkers=] may flag the
resources as unused if they cannot find references to them.</p>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand the "acknowledging the resources are included by design" remark here.

Comment on lines +1252 to +1253
<p>Resources that fall under this latter exemption should be identified in the manifest using the
<code>exempt</code> property on their [^item^] entry to avoid their being flagged by EPUB
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This "fall under this latter exemption" is a bit vague. Isn't it better to make it more explicit and refer to exceptions that are not content-specific?

@rdeltour
Copy link
Member

A couple things:

1. how this relates to EPUBCheck

A problem raised in w3c/epubcheck#1452, however, is that epubcheck has for a long time reported unused resources in the container because it's been an issue with vendors rejecting publications with these.

To clarify: what EPUBCheck warns about is not unused resources, but resources that are present in the container and not listed in the manifest.

This is reported as OPF-003, with the following message:

Item "some-resource-path" exists in the EPUB, but is not declared in the OPF manifest.

EPUBCheck does not (currently) check if the resource is used or not, for that specific report.

So EPUBCheck basically requires authors to list everything in the manifest if they do not want OPF-003 to be reported. That includes unused resources and exempt resources travelling in the container.

If the spec requires that only publication resources must be listed in the manifest, this conflicts with what EPBUCheck has historically recommended for more than a decade (i.e. list everything in the manifest, even unused resources).
#2506 fixes this conflict by removing the requirement that only publication resources are listed.

By removing the requirement that only publication resources are listed in the manifest (as proposed in #2506), unused and exempt resources are in a gray area: EPUB does not recommend that they are listed or not, so they can be. It makes the overzealous-but-historical EPUBCheck OPF-003 report acceptable: authors will keep on listing everything in the manifest so that OPF-003 is not reported.

2. does this PR help?

If I understand correctly, the intent of this PR is to provide an explicit way out of the gray area I mentioned above. Basically, it allows author to say "do not warn about this resource, it has a stamp of approval".

EPUBCheck would:

  • keep on warning about resources in the container but not in the manifest (these resources can be listed in the manifest with an exempt property, to remove the warning)
  • start warning about resources in the manifest but not used, if they do not have the exempt property

That may be a little improvement, but in practice I am concerned about two things:

  • EPUBCheck can only check if a resource is used or not for unscripted content. As soon as scripting is involved, a resource may be used by a script, an image added to the DOM, etc, etc, so EPUBCheck cannot issue any warning.
  • If EPUBCheck starts warning about unused resources that are not explicitly flagged as exempt, then it will potentially start issuing warnings for many existing EPUBs which passed validation before. In other words, this change would introduce a backward-incompatible warning check.

@mattgarrish
Copy link
Member Author

To clarify: what EPUBCheck warns about is not unused resources, but resources that are present in the container and not listed in the manifest.

Yes, but don't you also want to check resources in the manifest to see if they're used in the publication? That's why I opened this issue, as there's no way to differentiate a publication resource you can't find a use of vs. one of these exempt resources.

I know once scripting gets added to the equation things get even murkier, but a data file used by a script is one of the cases that makes these resources "exempt". It's an explicit way of identifying the resources that can't otherwise be determined (it doesn't only have to apply to foreign resources).

then it will potentially start issuing warnings for many existing EPUBs which passed validation before

I'm not sure about this, at least where it matters. Scripting doesn't get significant use, and the legitimate resources that would be flagged would be data files for input, which is probably a niche case given the list of CMTs for which you wouldn't need a fallback.

Where it'll definitely raise warnings is for publications that have resources that are no longer used but are still floating around in the manifest/container. All that requires is removing the unused resource, so we're not forcing publishers to figure out a workaround to keep content working as intended.

But that's one of the questions that needs answering from the group: is this worth getting into? If we choose now not to address what belongs in the manifest, then it will be too late in the future to do anything about it (except list all resource that epubcheck can make no determination about as usage messages).

I don't have a strong opinion about this, so it would be helpful to get some publisher input.

@iherman iherman added the Status-Deferred The issue has been deferred to another revision label Dec 31, 2022
@mattgarrish mattgarrish deleted the fix/add-data-property branch August 9, 2023 11:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status-Deferred The issue has been deferred to another revision
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants