Property WstxInputProperties.P_VALIDATE_TEXT_CHARS unrecognized #37

Open
git-volver opened this issue Oct 3, 2017 · 7 comments
Labels
pr-welcome Issue for which progress most likely if someone submits a Pull Request

Comments

@git-volver

I am trying to use the lib (v5.0.3) because it was intended to have a feature to ignore invalid characters in XML text, but in fact the feature is not there:
The code:
XMLInputFactory2 f = (XMLInputFactory2) XMLInputFactory2.newInstance();
f.setProperty(WstxInputProperties.P_VALIDATE_TEXT_CHARS, Boolean.FALSE);
Produces the following error:
java.lang.IllegalArgumentException: Unrecognized property 'com.ctc.wstx.validateTextChars'
at com.ctc.wstx.api.CommonConfig.reportUnknownProperty(CommonConfig.java:168)
at com.ctc.wstx.api.CommonConfig.setProperty(CommonConfig.java:159)
at com.ctc.wstx.api.ReaderConfig.setProperty(ReaderConfig.java:35)
at com.ctc.wstx.sr.BasicStreamReader.setProperty(BasicStreamReader.java:1306)
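
Until the property is recognized, one way to avoid the exception is to ask the factory whether it supports a property before setting it. The following is a minimal sketch using only the standard StAX API (`XMLInputFactory.isPropertySupported` and `setProperty`); the `PropertyGuard` class and `trySetProperty` helper are hypothetical names introduced for illustration, and the property id string is copied from the exception message above. It does not make the feature work, it only lets calling code degrade gracefully.

```java
import javax.xml.stream.XMLInputFactory;

// Hypothetical helper, not part of Woodstox or the StAX API.
final class PropertyGuard {
    // Property id as it appears in the exception message of this report.
    static final String VALIDATE_TEXT_CHARS = "com.ctc.wstx.validateTextChars";

    /**
     * Sets the property only if the factory reports that it supports it.
     * Returns true if the property was applied, false if it was skipped.
     */
    static boolean trySetProperty(XMLInputFactory factory, String name, Object value) {
        if (!factory.isPropertySupported(name)) {
            return false;
        }
        factory.setProperty(name, value);
        return true;
    }

    public static void main(String[] args) {
        // Uses whatever StAX implementation is found on the classpath.
        XMLInputFactory factory = XMLInputFactory.newInstance();
        boolean applied = trySetProperty(factory, VALIDATE_TEXT_CHARS, Boolean.FALSE);
        System.out.println("validateTextChars applied: " + applied);
    }
}
```

If `trySetProperty` returns false, the caller still has to handle invalid characters some other way, for example by filtering the input before parsing.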

@cowtowncoder
Member

Hmmh. That is certainly unexpected... and does not appear to be tested or used.
I will need to investigate what is the background story here.

Thank you for reporting this.

@rbertucat

rbertucat commented Feb 8, 2018

Any updates on this, please? Being able to ignore invalid characters in XML text is something we definitely need. Thanks.

@cowtowncoder
Member

No. I haven't had time to work on this so unless someone has time & interest to dig in I don't think this will be worked on any time soon.

For what it is worth, time spent trying to work around the detection of invalid XML is generally better spent on fixing the source to produce valid XML.

@cowtowncoder cowtowncoder added the active Issue being actively investigated label Mar 28, 2018
@cowtowncoder
Member

Looks like there is no underlying support at all for this, so I cannot remember how and why the property was added in the first place. I don't usually add property ids for features I don't start working on immediately.

@cowtowncoder
Member

cowtowncoder commented Mar 30, 2018

@git-volver @rbertucat one question here: what specific characters would you be hoping to allow? It seems to me that currently the only characters affected would be control characters (0x00 - 0x1F minus tab, CR, LF). All the other problems are usually related to character decoding (bad UTF-8) or XML name validity checking.
So although I could implement this feature, it mostly would just allow inclusion of control characters.
Is that what you would be looking for?

Another way to ask this would be: do you have a simple unit test to show the expected handling?
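
To make the affected range concrete, here is a small sketch of the XML 1.0 `Char` production (tab, LF, CR, 0x20-0xD7FF, 0xE000-0xFFFD, 0x10000-0x10FFFF) as a plain Java check. The `Xml10Chars` class and its method names are hypothetical and are not taken from Woodstox; the sketch only illustrates which code points a parser would reject in text content.

```java
// Hypothetical utility illustrating the XML 1.0 Char production.
final class Xml10Chars {
    /** Returns true if the code point is a legal XML 1.0 character. */
    static boolean isValid(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the char index of the first illegal code point, or -1 if none. */
    static int firstInvalidIndex(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!isValid(cp)) {
                return i;
            }
            // Advance by 1 or 2 chars depending on whether cp is supplementary.
            i += Character.charCount(cp);
        }
        return -1;
    }
}
```

Under this check, the control characters below 0x20 other than tab, LF, and CR are rejected, as are unpaired surrogates and 0xFFFE/0xFFFF.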

@cowtowncoder cowtowncoder removed the active Issue being actively investigated label Mar 30, 2018
@eikemeier

It's basically the inverse problem of this blog post: Microsoft services in particular generate illegal entities, which are hard to parse. I would expect the function to behave as it would with an XML 1.1 source: "Therefore, XML 1.1 allows the use of character references to the control characters #x1 through #x1F, most of which are forbidden in XML 1.0."

@ltuch

ltuch commented Nov 23, 2018

I created a workaround at https:/ltuch/staxtest that demonstrates the method I use to ignore invalid XML characters: it uses a FilterInputStream to strip out the bad characters.

Edit: after a bit of thinking I think this workaround has limitations in that it won't work with UTF-16 and possibly other encodings.

Edit 2: I updated the workaround to use a filtered InputStreamReader. Don't think this is the most elegant solution, so would be interested in finding a better solution than my hacks :).
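
For readers who cannot reach the linked repository, the reader-based idea can be sketched roughly as follows. This is not the code from the workaround above; it is an illustrative `java.io.FilterReader` subclass (the name `XmlCharStrippingReader` is hypothetical) that drops control characters below 0x20 other than tab, LF, and CR, plus 0xFFFE and 0xFFFF. As a simplifying assumption, surrogate chars are passed through unchanged, so unpaired surrogates are not handled.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical sketch: strips chars that XML 1.0 forbids in text content.
class XmlCharStrippingReader extends FilterReader {
    XmlCharStrippingReader(Reader in) {
        super(in);
    }

    /** Assumption: surrogates are let through as-is; pairing is not verified. */
    static boolean isAllowed(char c) {
        if (c == 0x9 || c == 0xA || c == 0xD) {
            return true;
        }
        if (c < 0x20) {
            return false;
        }
        return c != 0xFFFE && c != 0xFFFF;
    }

    @Override
    public int read() throws IOException {
        int c;
        // Skip disallowed chars until an allowed one or end of stream.
        while ((c = super.read()) != -1) {
            if (isAllowed((char) c)) {
                return c;
            }
        }
        return -1;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        while (true) {
            int n = super.read(buf, off, len);
            if (n == -1) {
                return -1;
            }
            // Compact the allowed chars to the front of the filled region.
            int kept = 0;
            for (int i = 0; i < n; i++) {
                char c = buf[off + i];
                if (isAllowed(c)) {
                    buf[off + kept++] = c;
                }
            }
            // If everything was stripped, read again rather than return 0.
            if (kept > 0) {
                return kept;
            }
        }
    }
}
```

The wrapped reader can then be handed to `XMLInputFactory.createXMLStreamReader(Reader)`. Because the filter works on decoded chars rather than bytes, it does not depend on the input encoding, provided the underlying `Reader` was created with the correct charset.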

@cowtowncoder cowtowncoder added the pr-welcome Issue for which progress most likely if someone submits a Pull Request label Jul 31, 2022