Property WstxInputProperties.P_VALIDATE_TEXT_CHARS unrecognized #37
Comments
Hmmh. That is certainly unexpected... and does not appear to be tested or used. Thank you for reporting this.
Any updates on this, please? Being able to ignore invalid characters in XML text is something we definitely need. Thanks.
No. I haven't had time to work on this, so unless someone has the time and interest to dig in, I don't think this will be worked on any time soon. For what it is worth, time spent trying to work around invalid XML is generally better spent on fixing the source so that it produces valid XML.
Looks like there is no underlying support for this at all, and I cannot remember how or why the property was added in the first place. I don't usually add property ids for features I don't start working on immediately.
@git-volver @rbertucat one question here: what specific characters would you be hoping to allow? It seems to me that currently the only characters affected would be control characters (0x00 - 0x1F minus tab, CR, LF). All the other problems are usually related to character decoding (bad UTF-8) or to XML name validity checking. Another way to ask this would be: do you have a simple unit test to show the expected handling?
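To make the character ranges in question concrete, here is a minimal sketch of the `Char` production from the XML 1.0 specification as a Java predicate. The class name `Xml10Chars` and the method `isLegal` are made up for illustration and are not part of Woodstox.

```java
// Hypothetical helper, not part of Woodstox: encodes the XML 1.0 "Char" production.
public final class Xml10Chars {
    private Xml10Chars() {}

    /** Returns true if the Unicode code point may appear in XML 1.0 content. */
    public static boolean isLegal(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD      // tab, LF, CR
            || (cp >= 0x20 && cp <= 0xD7FF)             // excludes other C0 controls
            || (cp >= 0xE000 && cp <= 0xFFFD)           // excludes surrogates, U+FFFE, U+FFFF
            || (cp >= 0x10000 && cp <= 0x10FFFF);       // supplementary planes
    }

    public static void main(String[] args) {
        System.out.println(isLegal('\t'));   // true
        System.out.println(isLegal(0x0B));   // false: vertical tab is a disallowed control
        System.out.println(isLegal(0xFFFE)); // false
    }
}
```

Within the 0x00 - 0x1F range, only tab, LF and CR pass this check, which matches the set of control characters described above.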
It's basically the inverse problem of this blog post; Microsoft services in particular generate illegal entities like
I created a workaround at https:/ltuch/staxtest which demonstrates a method that I use to ignore the invalid XML characters: it uses a FilterInputStream to strip out the bad characters. Edit: after a bit of thinking, I believe this workaround is limited in that it won't work with UTF-16 and possibly other encodings. Edit 2: I updated the workaround to use a filtered InputStreamReader. I don't think this is the most elegant solution, so I would be interested in finding something better than my hacks :).
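For readers who cannot reach the linked repository, the reader-based idea could look roughly like the sketch below. This is not the code from that workaround; the class name `XmlCharStrippingReader` is hypothetical. It works on already-decoded characters, so it does not depend on the byte encoding. It passes surrogate code units through unchanged, so a lone surrogate would still reach the parser.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical sketch: drops characters that XML 1.0 does not allow in content.
public class XmlCharStrippingReader extends FilterReader {
    public XmlCharStrippingReader(Reader in) {
        super(in);
    }

    // Surrogates (0xD800-0xDFFF) are kept so that supplementary characters survive.
    private static boolean keep(char c) {
        return c == 0x9 || c == 0xA || c == 0xD || (c >= 0x20 && c <= 0xFFFD);
    }

    @Override
    public int read() throws IOException {
        int c;
        while ((c = super.read()) != -1) {
            if (keep((char) c)) {
                return c;
            }
        }
        return -1;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        while (true) {
            int n = super.read(buf, off, len);
            if (n <= 0) {
                return n; // -1 signals end of stream
            }
            int kept = off;
            for (int i = off; i < off + n; i++) {
                if (keep(buf[i])) {
                    buf[kept++] = buf[i]; // compact the kept characters in place
                }
            }
            if (kept > off) {
                return kept - off;
            }
            // Everything in this chunk was stripped: read again instead of returning 0.
        }
    }

    public static void main(String[] args) throws IOException {
        Reader r = new XmlCharStrippingReader(new StringReader("<a>x\u0001y</a>"));
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = r.read()) != -1) {
            sb.append((char) c);
        }
        System.out.println(sb); // <a>xy</a>
    }
}
```

A reader like this can be handed to `XMLInputFactory.createXMLStreamReader(Reader)`. Note that it only removes raw characters: a numeric character reference such as `&#x1;` is plain ASCII in the stream and would pass through untouched, so the parser would still reject it.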
I am trying to use the library (v5.0.3) because it is supposed to have a feature for ignoring invalid characters in XML text, but the feature is not actually there.
The code:
XMLInputFactory2 f = (XMLInputFactory2)XMLInputFactory2.newInstance();
f.setProperty(WstxInputProperties.P_VALIDATE_TEXT_CHARS, Boolean.FALSE);
Produces the following error:
java.lang.IllegalArgumentException: Unrecognized property 'com.ctc.wstx.validateTextChars'
at com.ctc.wstx.api.CommonConfig.reportUnknownProperty(CommonConfig.java:168)
at com.ctc.wstx.api.CommonConfig.setProperty(CommonConfig.java:159)
at com.ctc.wstx.api.ReaderConfig.setProperty(ReaderConfig.java:35)
at com.ctc.wstx.sr.BasicStreamReader.setProperty(BasicStreamReader.java:1306)
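Until the property is backed by an implementation, calling code can at least avoid the exception by asking the factory first. The sketch below uses only the standard `javax.xml.stream.XMLInputFactory` API (`isPropertySupported` and `setProperty`); the helper class `OptionalStaxProperty` is hypothetical, and the property name is the one shown in the exception message above. It does not make the parser lenient. It only lets the caller find out whether the setting was applied.

```java
import javax.xml.stream.XMLInputFactory;

// Hypothetical helper: sets a StAX property only if the factory recognizes it.
public final class OptionalStaxProperty {
    private OptionalStaxProperty() {}

    /** Returns true if the property was applied, false if the factory rejected it. */
    public static boolean setIfSupported(XMLInputFactory f, String name, Object value) {
        if (!f.isPropertySupported(name)) {
            return false;
        }
        try {
            f.setProperty(name, value);
            return true;
        } catch (IllegalArgumentException e) {
            // Some implementations may list a property but still refuse to set it.
            return false;
        }
    }

    public static void main(String[] args) {
        XMLInputFactory f = XMLInputFactory.newInstance();
        boolean applied = setIfSupported(f, "com.ctc.wstx.validateTextChars", Boolean.FALSE);
        System.out.println("validateTextChars applied: " + applied);
    }
}
```

If the call returns false, the caller knows it has to fall back to cleaning the input before parsing.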