Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add WstxInputProperties. P_ALLOW_XML11_ESCAPED_CHARS_IN_XML10 to allow XML 1.1 escaped characters in XML 1.0 content #56

Merged
merged 2 commits into from
Aug 21, 2018

Conversation

mguessan
Copy link

DavMail EWS implementation relies on the wonderful Woodstox Stax parser, with just a small issue: Exchange sends XML 1.0 content with XML 1.1 escaped characters.

Currently DavMail source code includes a patched version of StreamScanner to disable this check, I would really like to drop this patch.

This pull request includes a new custom property com.ctc.wstx.xml10AllowAllEscapedChars to allow XML 1.1 escaped characters in XML content.

Note that this is also related to bug #28

@cowtowncoder
Copy link
Member

@mguessan Excellent, looks good by quick glance.

Now: right now my time to work on OSS is bit limited, so merging may take a while (I hope to have more time by mid-July, in 2 weeks). But I wanted to add a note that I have seen the PR.
Please ping me again by, say, July 15th if it seems there is no progress; I don't want to forget this.

Thank you for the contribution!

@mguessan
Copy link
Author

Great, the initial patch is in DavMail tree since 2012, see:
https:/mguessan/davmail/blob/master/src/java/com/ctc/wstx/sr/StreamScanner.java

It would be really great to get rid of it !

@cowtowncoder
Copy link
Member

@mguessan Ok: only one request: could you please add @since 5.2 for all public methods, fields? This makes it easier to see when a feature was added. I can merge this with that change.

@mguessan
Copy link
Author

Most methods added are for internal use, and other similar methods don't have since tags
=> added @since 5.2 on the only publicly usable flag: P_XML10_ALLOW_ALL_ESCAPED_CHARS

@eikemeier
Copy link

I like this patch, but doesn't it address the same issue as P_VALIDATE_TEXT_CHARS from #37?

@mguessan
Copy link
Author

Not exactly, sounds like the - not implemented - P_VALIDATE_TEXT_CHARS would disable checks on raw unicode characters.

P_XML10_ALLOW_ALL_ESCAPED_CHARS intends to allow perfectly valid XML 1.1 entities like � in XML 1.0 documents.

Note that this patch was implemented to address the Microsoft generated XML issue.

@eikemeier
Copy link

Note that this patch was implemented to address the Microsoft generated XML issue.

I know, that what we are all fighting with. If you check the sources for the P_VALIDATE_TEXT_CHARS issue you'll notice that exactly this problem is the motivation for this request.

So, it might be poorly formulated but I assume the concepts overlap, see “… only characters affected would be control characters (0x00 - 0x1F minus tab, cr, lf)”.

mguessan added a commit to mguessan/davmail that referenced this pull request Aug 8, 2018
@cowtowncoder cowtowncoder merged commit 753822c into FasterXML:master Aug 21, 2018
@cowtowncoder cowtowncoder added this to the 5.2.0 milestone Aug 21, 2018
@cowtowncoder cowtowncoder changed the title Implement a flag to allow XML 1.1 escaped characters in XML content Add WstxInputProperties.P_XML10_ALLOW_ALL_ESCAPED_CHARS to allow XML 1.1 escaped characters in XML content Aug 21, 2018
@cowtowncoder cowtowncoder changed the title Add WstxInputProperties.P_XML10_ALLOW_ALL_ESCAPED_CHARS to allow XML 1.1 escaped characters in XML content Add WstxInputProperties.P_XML10_ALLOW_ALL_ESCAPED_CHARS to allow XML 1.1 escaped characters in XML 1.0 content Aug 21, 2018
@cowtowncoder
Copy link
Member

Thank you for contributing this @mguessan -- hoping to get 5.2.0 released relatively soon now.

@eikemeier
Copy link

Sorry to nitpick again - shouldn't it be WstxInputProperties.P_XML10_ALLOW_XML11_ESCAPED_CHARS? That's also what the title says, and � will still be rejected.

Also, line 2405 could read !(mXml11 || mXml10AllowAllEscapedChars) &&, making the purpose much clearer, but this might just be my personal preference.

Thanks for the patch, much appreciated.

@cowtowncoder
Copy link
Member

@eikemeier Yes, name is not optimal. Perhaps

P_ALLOW_XML11_ESCAPED_CHARS_IN_XML10

would be most descriptive?

@eikemeier
Copy link

Seems good to me.

@cowtowncoder cowtowncoder changed the title Add WstxInputProperties.P_XML10_ALLOW_ALL_ESCAPED_CHARS to allow XML 1.1 escaped characters in XML 1.0 content Add WstxInputProperties. P_ALLOW_XML11_ESCAPED_CHARS_IN_XML10 to allow XML 1.1 escaped characters in XML 1.0 content Aug 28, 2018
@cowtowncoder
Copy link
Member

So it is now:

WstxInputProperties.P_ALLOW_XML11_ESCAPED_CHARS_IN_XML10

defined as

"com.ctc.wstx.allowXml11EscapedCharsInXml10"

cowtowncoder added a commit that referenced this pull request Aug 28, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants