Validation of SVG #1323

Closed
dauwhe opened this issue Aug 4, 2020 · 25 comments · Fixed by #1693
Labels
EPUB33 Issues addressed in the EPUB 3.3 revision Spec-EPUB3 The issue affects the core EPUB 3.3 Recommendation Topic-ContentDocs The issue affects EPUB content documents

Comments

@dauwhe
Contributor

dauwhe commented Aug 4, 2020

See w3c/epubcheck#1172. The WG should probably talk about how to deal with a world with SVG 1.1 and SVG2

@Doktorchen

Currently SVG2 is not a recommendation and there is no SVG2-specific version indication, but of course authors can use metadata, for example Dublin Core, to indicate which version they have used.

EPUB 3.2 does not indicate which version to use, so it is up to authors what to use:
1.0, 1.1 first edition, 1.1 second edition (tiny, basic or full), tiny 1.2.
All of them have a version indication, which validation and presentation programs have to respect.
But if there is no version indication at all, it is up to the program how to interpret the content.

For a checking program, different approaches can help authors when the version indication is missing:
Most important: report which version was used for the validation, otherwise the result is useless (this applies to HTML5.x as well).
Alternative approaches to choose a version to check:
a) Guess according to the content, starting from 2.0 down to 1.0
b) Assume the newest recommendation (relative to the release date of the program)
c) Guess from the date the document was produced, if available.
d) Indicate the missing version information and provide a report for at least 1.1.2, tiny 1.2 and maybe 2.0 (if it becomes a recommendation at all; 1.2 full already failed to get beyond a draft, and only tiny 1.2, with lots of other features, became a recommendation).

@iherman
Member

iherman commented Aug 11, 2020

Just for the record: w3c/epubcheck#1114 is also relevant if it comes to a discussion in the (C|W)G on real-world SVG, although it touches upon a more general issue regarding the usage (or not) of external references.

@mattgarrish mattgarrish added Type-SchemaIssue Topic-ContentDocs The issue affects EPUB content documents labels Aug 26, 2020
@murata2makoto
Contributor

Let me ask a naive question. Does epubcheck rely on SVG schemas prepared by us? Or, does it now rely on validator.nu?

@mattgarrish
Member

Or, does it now rely on validator.nu?

Mostly this.

We're trying not to fundamentally differ from the results that web validation produces, but we do extend the schemas for epub-specific rules.

@bduga
Collaborator

bduga commented Oct 2, 2020

I question whether we should be bothering to check for validity of SVG documents. Per Doktorchen's comment it is not trivial, and it is hard to determine if you have even done it correctly. What do we gain from it? It seems like it only tells us that there exists or existed at some point a schema that validated a document. Unlike HTML, SVG doesn't really add much semantic meaning for reading systems from the structure of the file. What is most important is the ability to render the document and the inclusion of any a11y features, neither of which is guaranteed by validity. Additionally, SVGs are rarely edited by hand, so content creators are at the mercy of graphics tools to generate documents that are valid to some schema someone somewhere might use to check the document.

If we do plan to keep validity, we should make it very clear which schema(s) SVG must be valid to.

@mattgarrish
Member

I question whether we should be bothering to check for validity of SVG documents.

How do we square this with the restrictions in the specification?

Are we just loosening epubcheck while keeping the restrictions in theory, keeping the epub-specific stuff and dropping validity checks against any specific version of svg, or removing all the content requirements (even that the document be well-formed xml)?

@bduga
Collaborator

bduga commented Oct 2, 2020

How do we square this with the restrictions in the specification?

Excellent question! And looking at the spec ... which restrictions? Looks like we lost validity for both XHTML and SVG somewhere between 3.0 and 3.2. Specifically, in 3.0 we said:

It must be an SVG 1.1 document fragment valid to the SVG Content Document schema as defined in SVG Content Document Schema and conform to all content conformance constraints expressed in Restrictions on SVG 1.1.

But in 3.2 we say:

It MUST be an SVG document fragment [SVG], and conform to all content conformance constraints expressed in Restrictions on SVG.

Similar is true for XHTML. So is this simply an error in epubcheck? Should we not be checking for validity anymore?

@mattgarrish
Member

mattgarrish commented Oct 2, 2020

Should we not be checking for validity anymore?

It just looks like that reference is dead. We didn't move to drop validity unless you remember a resolution to that effect? There just isn't a schema defined in SVG and we can't point at epubcheck.

I expect it should (if we keep validation) be updated to cite the "Conforming XML-compatible SVG Markup Fragments" conformance class, which would let us drop the XML conformance bullet.

But, regardless, we still have some content requirements that aren't related to general validity. The identifying of XHTML fragments, for example, the restriction on title, etc. Do we check these if we don't check against a default schema?

We probably should make a normative statement for XHTML, too. It looks like we've always relied on this in the relationship section to blanket cover validity to the specification:

The XHTML profile defined by this specification inherits all definitions of semantics, structure and processing behaviors from [HTML] unless otherwise specified.

@murata2makoto
Contributor

@bduga

EPUB is intended to be in sync with the reality of the Web. Meanwhile, the longevity of EPUB publications is crucial.

I think that relying on validator.nu (including the choice of schemas for SVG) is probably the best approach. Dropping validity checking from epubcheck may well endanger the longevity of EPUB publications.

@mattgarrish
Member

Hunting around a bit, we dropped the validity statement in 3.1 but I can't find any specific mention of it in the minutes or issues.

But it still looks like an omission to have not updated the reference, as the one in embedded SVG was updated to the fragment class in 3.2.

For standalone, I put the wrong class above, though. It should be Conforming SVG Stand-Alone Files.

@bduga
Collaborator

bduga commented Oct 2, 2020

I think that relying on validator.nu (including the choice of schemas for SVG) is probably the best approach. Dropping validity checking from epubcheck may well endanger the longevity of EPUB publications.

I am not sure how this works with the spec. Specifically, this is not a discussion about epubcheck, though I expect it will have implications there, it is an EPUB question. Given my reading of the spec, there is currently no validity requirement for SVG. We may want to put that requirement back, but I am not sure how to do that and maintain references to living documents. Do you have a proposal for how to draft such a requirement?

I do think that if we can't add a requirement to the spec we shouldn't require validity in epubcheck, although we could allow for optional checks by passing a version to epubcheck (or some other mechanism).

@bduga
Collaborator

bduga commented Oct 2, 2020

And, for what it is worth, I like Matt's proposal.

@iherman
Member

iherman commented Oct 3, 2020

I believe Matt's proposal would still keep w3c/epubcheck#1114 open. To quote that issue: SVG 1.1, per spec, relies on:

<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">

Because the svg11.dtd uses XML DTD modularization, it is full of entities. However, as Matt commented, those are forbidden by EPUB. As a consequence, EPUBCheck systematically refuses such SVG files, although they are perfectly valid per SVG Spec.

The problem with this illustrates Brady's comment:

Additionally, SVGs are rarely edited by hand, so content creators are at the mercy of graphics tools to generate documents

As a typical example, Adobe Illustrator systematically puts that DOCTYPE into generated SVG files (or it did until the latest release; I did not check the last one).

This goes beyond SVG, mind you. MathML has the same problem afaik. I believe we may have to come back to the problem of how to handle XML entities.

B.t.w., the possible solution that came up in w3c/epubcheck#1114 is to explicitly put exceptions to the entity rules concerning some standard DTDs (that XML parsers are not really required to fetch anyway).
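
A minimal Python sketch of what such an exception could look like, purely as an illustration: it extracts the public identifier from a DOCTYPE and compares it with an allowlist. The allowlist content and the function name `public_doctype_is_allowed` are assumptions made for this example. They are not taken from the EPUB spec or from epubcheck, and the sketch does not inspect internal DTD subsets.

```python
import re

# Assumption: illustrative allowlist only; the identifiers actually
# accepted by EPUB or epubcheck may differ.
ALLOWED_PUBLIC_IDS = {"-//W3C//DTD SVG 1.1//EN"}

# Matches <!DOCTYPE name PUBLIC "public-id" "system-id"
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+\S+\s+PUBLIC\s+(["\'])(?P<public>.*?)\1\s+(["\'])(?P<system>.*?)\3',
    re.DOTALL,
)

def public_doctype_is_allowed(document):
    """Return True if there is no PUBLIC doctype or its public id is allowlisted."""
    match = DOCTYPE_RE.search(document)
    if match is None:
        return True  # nothing to object to in this simplified check
    return match.group("public") in ALLOWED_PUBLIC_IDS

svg11 = (
    '<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" '
    '"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">'
    '<svg xmlns="http://www.w3.org/2000/svg"/>'
)
custom = (
    '<!DOCTYPE svg PUBLIC "-//Example//DTD Custom//EN" '
    '"http://example.com/custom.dtd"><svg/>'
)
print(public_doctype_is_allowed(svg11), public_doctype_is_allowed(custom))
```

The point of the allowlist is that the DTD never has to be fetched: the check only looks at the identifier the authoring tool wrote into the file.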

@Doktorchen

SVG 1.1.2 (second edition) says about the doctype: 'It is not recommended that a DOCTYPE declaration be included in SVG documents.'
https://www.w3.org/TR/SVG11/intro.html#NamespaceAndDTDIdentifiers

Therefore a version indication can be expected on the root/topmost svg element, as a version attribute with values like '1.2', '1.1' or '1.0', if this matters to the author.
Additionally the attribute baseProfile may indicate which profile is used; because 1.2 has only 'tiny', this will typically be present in such documents or fragments.
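
As a rough illustration of reading that version indication, here is a minimal Python sketch. It assumes the input is well-formed XML whose root is `svg` in the SVG namespace; the function name `declared_svg_version` is made up for this example and is not part of epubcheck or validator.nu.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

def declared_svg_version(svg_text):
    """Return (version, baseProfile) of the root svg element.

    Either value is None when the attribute is absent, which is the
    case where a checker would have to guess or report the gap.
    """
    root = ET.fromstring(svg_text)
    if root.tag != f"{{{SVG_NS}}}svg":
        raise ValueError("root element is not svg in the SVG namespace")
    return root.get("version"), root.get("baseProfile")

tiny = '<svg xmlns="http://www.w3.org/2000/svg" version="1.2" baseProfile="tiny"/>'
bare = '<svg xmlns="http://www.w3.org/2000/svg"/>'
print(declared_svg_version(tiny))
print(declared_svg_version(bare))
```

A checker could use the returned pair to pick a schema, and mention in its report when both values are missing.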

Version 2.0 is still in progress and has no version indication of its own, so there is no way to identify it properly for validation. Several modules of '2.0' are currently still only drafts. The current CR for 2.0 is only a subset, missing even some major parts present in the tiny profiles, so it is not ready for practical use until all those additional modules become recommendations.

Note that the DTDs do not provide checks for complex attribute values, which are essential for the correct presentation of SVGs. What is additionally available as EBNF sometimes seems to contain bugs in the 2.0 CR, so it is not reliable; 1.2 tiny and 1.1.2 may contain more reliable information.

Taking into account that 2.0 as well as HTML5 are subject to change, it no longer seems helpful for authors to test only, or at all, against 'current' rules (whatever that might mean for a checking program, it will often differ from the author's point of view), especially because EPUB 3.0 and EPUB 3.2 have the same version indication but refer to different variants of SVG or HTML5.
It disqualifies a checking program, for example, if it reports errors for existing valid EPUB 3.0 books because of changes in drafts for SVG, HTML5 or EPUB 3.2.
This will encourage authors, publishers, distributors and shops even more to continue with EPUB2 and ignore EPUB3.x altogether, because it is not stable enough to be ready for use.

Moreover, it might ease the transition from EPUB2 to EPUB3.x for some people if it were possible to continue to use XHTML1.1 (+RDFa) in EPUB3.x as well.
Obviously it has even fewer relevant semantic elements than HTML5 for book content, but many tools seem unable to produce rich semantic markup anyway.
Such programs will change slowly. Can we expect many to switch to EPUB3 if using the same production tools as for EPUB2 content results in checking errors and a lot of additional work?

Finally, as long as the XML structure is wellformed, every EPUB presentation program should be able at least to provide an accessible presentation of the content with a user-agent-stylesheet as suggested in HTML5 for all variants of (X)HTML. This would be a helpful requirement: To provide an exclusive switch between (alternative) author stylesheets and a simple user-agent-stylesheet.
Obviously it takes much more than a wellformed XML structure to present SVG properly, but how can this be checked now or in the future with changing drafts/recommendations? Maybe it is hopeless.
But SVG has features for alternative text, which authors should apply if the graphic matters for the content.
Alternatively they may provide a fig(ure)caption with a sufficient description.
For a wellformed XML document, at least this will always be presentable. Why worry about validity? ;o)

@murata2makoto
Contributor

Finally, as long as the XML structure is wellformed, every EPUB presentation program should be able at least to provide an accessible presentation of the content with a user-agent-stylesheet as suggested in HTML5 for all variants of (X)HTML.

Suppose that we have an invalid XML document as part of an EPUB publication. Some existing EPUB readers do not use browser engines. Will all existing EPUB RSs provide reasonably similar results? I have no idea.

If epubcheck ensures validity, the longevity of EPUB publications is more reliable.

Why worry about validity? ;o)

Because the longevity of EPUB publications is extremely important for publishers.

@Doktorchen

Longevity would be very important for me as an author as well, and for libraries and archivists too.
But I got the impression that this no longer matters for EPUB 3.2, because for the core content formats in particular this version no longer requires a specific version.
Therefore there is effectively no time-independent validity anymore, because these formats have incompatible versions. Moreover, because the version indication is removed in newer versions, there is no longer any hint as to which recommendation authors followed: SVG 1.0, SVG 1.1.1, SVG 1.1.2, SVG tiny 1.2, SVG 2 (which modules or variants?), HTML 5.0, 5.1, 5.2, 5.3, WHATWG-HTML5? And did authors publish an EPUB 3.0 or 3.2 or 3.x (in future)?

If authors or publishers have to indicate the recommendations they used, for example with a related Dublin Core term, there is surely a low chance that a checking program will recognise it. Only archivists might do this (in theory). If W3C and other organisations continue to promote tag soup formats/versions, digital books/texts will simply be reduced to short-lived disposable products.
Digital books unfortunately already have the reputation, look and feel of faulty or second-choice books (in German: Mängelexemplare) compared to printed books.

@dauwhe dauwhe added the Agenda+ Issues that should be discussed during the next working group call. label May 4, 2021
@iherman
Member

iherman commented May 7, 2021

The issue was discussed in a meeting on 2021-05-07

  • no resolutions were taken

4. Validation of SVG

See github issue #1323.

Dave Cramer: this is the question of how to validate SVG, given that there are so many kinds of SVG
… DTDs in SVGs interact poorly with existing validation tools

Ivan Herman: first, the DTD problem has been settled
… second, currently what we do, like with HTML, is that our references in the spec are to SVG2
… which is what validator.nu validates against
… so it seems like events may have dictated the way that we go

Dave Cramer: perhaps we should postpone this issue to a later call, we are missing some members today

Ivan Herman: +1

Gregorio Pellegrino: +1

@iherman iherman removed the Agenda+ Issues that should be discussed during the next working group call. label May 7, 2021
@dauwhe dauwhe added the Agenda+ Issues that should be discussed during the next working group call. label May 12, 2021
@iherman
Member

iherman commented May 14, 2021

The issue was discussed in a meeting on 2021-05-13

  • no resolutions were taken

6. SVG Validation

See github issue #1323.

Dave Cramer: one of the complicating factors is undated references to specs, and now we have SVG1 and SVG2

Brady Duga: would prefer if we didn't validate SVG
… this came up because there was an SVG 1.1 with a rel=no refer, and this failed epubcheck
… i've found a couple other SVG validation errors that came up because of things that weren't valid in 1.1
… and even if we do validate, we can only say that it was valid to an SVG Schema at the time of validation

Matt Garrish: compounding the problem is that validator.nu has partially gone with SVG2, with no plans to fully move to SVG2

Brady Duga: i understand need for validity of XHTML, as XHTML is often edited by hand
… not true for SVG, which are often made by tools
… so you're putting authors at the mercy of their tools, if those tools are generating invalid SVG
… and note that at this point spec doesn't require validation

Dave Cramer: is there a requirement for well-formedness of XML?

Ben Schroeter: what about the XML entities in the DTDs of the SVG?

Dave Cramer: we're not touching anything about rendering or processing, just saying that spec does not require validity, therefore epubcheck does not need to check validity

Brady Duga: if we don't have requirement for validity, then Ivan's issue goes away

Matt Garrish: we've always had requirement that things conform to XHTML syntax
… but what about when SVGs are embedded in the content?

Dave Cramer: maybe our current action item is to talk to Romain about how this would affect epubcheck
… would also like to hear what Ivan says about this

@iherman
Member

iherman commented May 14, 2021

Brady Duga: if we don't have requirement for validity, then Ivan's issue goes away

I presume that referred to w3c/epubcheck#1114. But that issue is already gone in the current version of the spec, which explicitly lists SVG (and MathML) DTDs as acceptable.

@iherman
Member

iherman commented May 14, 2021

To help the discussion, here is the summary of what SVG says about conformance.

The SVG2 spec defines SVG Conformance classes:

  1. §2.4.1. Conforming SVG DOM Subtrees: it follows the SVG specification :-)
  2. §2.4.2. Conforming SVG Markup Fragments: this envisages a looser-than-XML fragment for SVG (i.e., like HTML5 vs. XHTML5); plus it is in line with CSS.
  3. §2.4.3. Conforming XML-Compatible SVG Markup Fragments: like (2), but also in proper XML syntax with some minor restrictions (eg, the id attribute is valid for xml:id). Ie, it is also in line with CSS.
  4. §2.4.4. Conforming XML-Compatible SVG DOM Subtrees: a DOM node tree that, once serialized in XML it follows the same restrictions as (3). I guess this is really important when SVG is part of, say, an HTML file.
  5. §2.4.5. Conforming SVG Stand-Alone Files: is a proper xml, has a svg root element, and, otherwise, is along the lines of (3).

It is all a bit convoluted, because (I presume) the SVG2 spec is prepared for SVG content embedded in HTML, including the looser syntax of HTML vs. XHTML.


I guess for our content documents the reference to (5) is the correct one, and that is (almost) what we have in the spec. Almost, because (I think @mattgarrish mentioned that somewhere) in §3.2.2 SVG Requirement the (correct) link refers to "SVG document Fragment"; it should rather say "Conforming SVG Stand-Alone Files".

@iherman
Member

iherman commented May 14, 2021

I believe that, ideally, we should (1) keep what we have and (2) let us rely on validator.nu for validation, unless the SVG check of validator.nu is so bad that it does more harm than good...

CC @rdeltour

@bduga
Collaborator

bduga commented May 14, 2021

I believe that, ideally, we should (1) keep what we have

I think what we have now (unless @mattgarrish has changed it) is no validity requirement, but that was likely an accidental change.

and (2) let us rely on validator.nu for validation, unless the SVG check of validator.nu is so bad that it does more harm than good...

That is precisely the issue. The validation is causing real harm today, forcing ignore list updates to process SVGs. There have been claims of potential harm caused by not requiring validity for SVG, but I have not seen any specific cases (real, identifiable cases of harm that were avoided by validation of SVG). The evidence currently is that validation is causing more harm than good, but I am happy to review evidence to counter that.

@mattgarrish
Member

Nope, we're still lacking a validity requirement for svg as far as I can tell. All I fixed was the reference to the standalone definition, but that corresponds to 5. in Ivan's list above. The SVG has to be well-formed XML and the IDs have to be unique (which is also all accessibility requires) but the markup doesn't have to be valid.
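
To make those two remaining requirements concrete, a minimal Python sketch under stated assumptions: it only tests well-formedness and `id` uniqueness, ignores the EPUB-specific rules, and uses a hypothetical function name `check_svg_minimum`. It is not how epubcheck implements the check.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def check_svg_minimum(svg_text):
    """Return a list of problem strings; an empty list means both checks passed."""
    try:
        root = ET.fromstring(svg_text)
    except ET.ParseError as err:
        # A parse failure means the document is not well-formed XML.
        return [f"not well-formed: {err}"]
    # Count every id attribute in the tree, including the root element.
    ids = Counter(el.get("id") for el in root.iter() if el.get("id") is not None)
    return [f"duplicate id: {value}" for value, count in ids.items() if count > 1]

good = '<svg xmlns="http://www.w3.org/2000/svg"><rect id="a"/><circle id="b"/></svg>'
dup = '<svg xmlns="http://www.w3.org/2000/svg"><rect id="a"/><circle id="a"/></svg>'
broken = '<svg xmlns="http://www.w3.org/2000/svg"><rect></svg>'
for sample in (good, dup, broken):
    print(check_svg_minimum(sample))
```

Note that nothing here looks at the content model: an unknown element or attribute passes, which is exactly the looseness being discussed.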

We have to separately require conforming SVG DOM subtrees (1. in the list) as far as I can tell. None of the definitions appear to refer to it.

(By comparison, for XHTML we require both that a document "MUST be an [HTML] document that conforms to the XHTML syntax" and that it "conform to the conformance criteria for all document constructs defined by [HTML] unless explicitly overridden in § 3.1.4 HTML Deviations and Constraints".)

The problem with relying on what validator.nu implements is that its implementation is not complete, so what you're asking is that vendors tolerate invalid SVG content or that epubcheck has to fill in missing validity constraints as users stumble on them.

Is either of these options really better than laxer validation that only requires well-formedness and the few other requirements in the definitions?

There are also likely options for validating SVG in epubcheck without needing the specification to be strict about validity. I believe we're using nvdl to validate xhtml, so that should make it possible to validate embedded svg separately from the containing document, but this is where we need @rdeltour's input.

If that is the case, though, perhaps SVG validity problems could be output as info messages rather than as warnings or errors?
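
A sketch of that idea in Python, for illustration only: messages produced by an SVG validity check are downgraded to info, while everything else keeps its severity. The `Message` type, the `source` tag and the value `"svg-validity"` are all hypothetical and do not correspond to epubcheck's actual message model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Message:
    severity: str  # "ERROR", "WARNING" or "INFO"
    source: str    # hypothetical tag naming the check that produced the message
    text: str

def downgrade_svg_validity(messages):
    """Report SVG validity findings as INFO; leave all other messages untouched."""
    return [
        Message("INFO", m.source, m.text) if m.source == "svg-validity" else m
        for m in messages
    ]

report = [
    Message("ERROR", "svg-validity", "attribute not allowed here"),
    Message("ERROR", "epub-svg", "title content model violated"),
]
for message in downgrade_svg_validity(report):
    print(message.severity, message.source, message.text)
```

Keeping the EPUB-specific findings at their original severity matches the split discussed above: only general SVG validity becomes informational.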

@iherman
Member

iherman commented May 15, 2021

@mattgarrish:

Nope, we're still lacking a validity requirement for svg as far as I can tell. All I fixed was the reference to the standalone definition, but that corresponds to 5. in Ivan's list above. The SVG has to be well-formed XML and the IDs have to be unique (which is also all accessibility requires) but the markup doesn't have to be valid.

We have to separately require conforming SVG DOM subtrees (1. in the list) as far as I can tell. None of the definitions appear to refer to it.

I was fighting with this yesterday, looking through the SVG spec and (re-reading it again) indeed it looks as if we would have to refer to the DOM conformance separately. Meaning I was wrong: we indeed do not have a validity requirement at this moment, beyond what, essentially, XML validity requires.

(I must admit I was a bit surprised by the way the SVG spec defines all this, the HTML conformance seems to be way clearer.)

@bduga:

The validation is causing real harm today, forcing ignore list updates to process SVGs. There have been claims of potential harm caused by not requiring validity for SVG, but I have not seen any specific cases (real, identifiable cases of harm that were avoided by validation of SVG). The evidence currently is that validation is causing more harm than good, but I am happy to review evidence to counter that.

@mattgarrish:

The problem with relying on what validator.nu implements is that its implementation is not complete, so what you're asking is that vendors tolerate invalid SVG content or that epubcheck has to fill in missing validity constraints as users stumble on them.

Taking also into account that SVG itself is still a bit of a moving target (in CR, without change, since 2 1/2 years...) maybe we can indeed say that we do not require SVG validation for now.

If that is the case, though, perhaps SVG validity problems could be output as info messages rather than as warnings or errors?

+1 to that (if it is possible).

I still wonder whether

  • the content document could say "SVG content should be valid" (not SHOULD, though, because that would have to trigger a different epubcheck behavior)
  • this could be part of a non-normative note giving some background, namely the SVG 2 is still a moving target on the fringes and, therefore, the tooling is still evolving

@dauwhe dauwhe removed the Agenda+ Issues that should be discussed during the next working group call. label May 21, 2021
@iherman
Member

iherman commented May 21, 2021

The issue was discussed in a meeting on 2021-05-21

List of resolutions:


1. Validation of SVG

See github issue #1323.

Dave Cramer: Validation of svg
… Perhaps coming to consensus
… maybe keep status quo
… don't require validity now, just well formed, etc
… duga mentions there is real harm to the validity check
… propose informative messages when validator.nu check fails
… no spec change, acknowledge reality of tools

Ivan Herman: Fundamentally agree
… Which category corresponds to the epub check behavior?

Dave Cramer: Weird case since epubcheck is currently out of sync
… with the spec.

Matt Garrish: epubcheck follows the spec, but not entirely bound to it
… epubcheck can do what it wants, only errors and warnings need to match spec
… Doesn't need a statement in the spec to justify it

Proposed resolution: our spec does not require SVG to be valid, only well-formed (Ivan Herman)

Brady Duga: +1

Ben Schroeter: +1

Gregorio Pellegrino: +1

Matt Garrish: +1

Dave Cramer: +1

Ivan Herman: +1

Dan Lazin: +1

Tzviya Siegman: +1

Bill Kasdorf: +1

Toshiaki Koike: +1

Masakazu Kitahara: +1

Resolution #1: our spec does not require SVG to be valid, only well-formed

Dave Cramer: #1456

rdeltour added a commit to w3c/epubcheck that referenced this issue Jul 8, 2022
EPUB 3.3 no longer requires that SVG content conform to SVG content model requirements,
only that it is well-formed, that IDs are unique, and that it respects some additional
EPUB-specific requirements.

This commit:
  * introduces a new permissive RelaxNG schema for SVG, checking only the EPUB-specific
    requirements on the `title` and `foreignObject` content model
    see w3c/epub-specs#1323
  * removes checks on the value of the `requiredExtensions` attribute of `foreignObject`
    see w3c/epub-specs#1087
  * adapts the main XHTML to SVG schema driver to the new permissive SVG schema
  * adds various tests for EPUB-specific requirements
rdeltour added a commit to w3c/epubcheck that referenced this issue Aug 18, 2022
@mattgarrish mattgarrish added EPUB33 Issues addressed in the EPUB 3.3 revision Spec-EPUB3 The issue affects the core EPUB 3.3 Recommendation labels Sep 14, 2022

6 participants