Add option to allow broken encoding in attribute values #60
Comments
When constructing https://www.fileformat.info/info/unicode/char/0fffd/index.htm which will then add garbage to the attribute value. I don't think this is something Woodstox should really be doing. Although I understand it may be inconvenient, I think handling of broken content is something the application needs to configure somehow.
I have to consume a message from a message broker with (sometimes) broken encoding in one of its attributes. (It's from legacy software that nobody wants/dares to touch.)
Currently, when trying to parse the messages, I get the following exception:
If I use the same bytes in a String directly it works perfectly fine.
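To illustrate why the plain `String` route appears to work: the JDK's `String(byte[], Charset)` constructor silently substitutes U+FFFD for malformed input, while a decoder configured with `CodingErrorAction.REPORT` fails instead. The sketch below uses only JDK classes; the class name `DecodeComparison` and the sample bytes are made up for illustration and are not taken from the actual messages.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class DecodeComparison {
    // Lenient: malformed byte sequences are silently replaced with U+FFFD.
    static String decodeLenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Strict: malformed byte sequences raise a CharacterCodingException.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample: 0xE4 is a valid ISO-8859-1 character,
        // but here it is an incomplete multi-byte sequence in UTF-8.
        byte[] broken = {'a', (byte) 0xE4, 'b'};

        String lenient = decodeLenient(broken);
        System.out.println("lenient contains U+FFFD: " + (lenient.indexOf('\uFFFD') >= 0));

        try {
            decodeStrict(broken);
        } catch (CharacterCodingException e) {
            System.out.println("strict decode failed: " + e);
        }
    }
}
```

So "works perfectly fine" here means the broken byte is replaced rather than rejected, and the resulting string no longer contains the original data at that position.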
It would be nice if I could use an option to allow broken encodings in my Strings instead of Exceptions.
(After parsing the input, I usually have enough context to know which messages I have to fix and how)
I use jackson-dataformat-xml 2.9.6 + woodstox 5.0.3/5.1 to parse the message.
Currently I use the following workaround to bypass the issue:
As an alternative I considered a plain byte[] solution, but unfortunately the parser still reads the input as a String so that it can Base64-decode it, and I didn't find a way to tell the parser to just give me the raw bytes without Base64-decoding them first.
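For readers looking for a general pattern, one possible application-side approach (a sketch only, and not necessarily the workaround used here) is to decode the bytes leniently yourself and hand the parser characters instead of bytes, so the parser never performs byte decoding. The example uses the JDK's built-in StAX API rather than Woodstox or Jackson so that it is self-contained; the class name `LenientAttributeRead`, the `msg` element, and the `text` attribute are hypothetical.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class LenientAttributeRead {
    // Decodes the payload leniently (malformed bytes become U+FFFD), then parses
    // from a Reader so the XML parser only ever sees already-decoded characters.
    static String readAttribute(byte[] xmlBytes, String attrName) throws XMLStreamException {
        String xml = new String(xmlBytes, StandardCharsets.UTF_8);
        XMLStreamReader reader = XMLInputFactory.newFactory()
                .createXMLStreamReader(new StringReader(xml));
        try {
            reader.nextTag(); // advance to the root start element
            return reader.getAttributeValue(null, attrName);
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        // Hypothetical payload: written as ISO-8859-1 by the sender, so the
        // 'ä' becomes the single byte 0xE4, which is malformed as UTF-8.
        byte[] payload = "<msg text=\"a\u00E4b\"/>".getBytes(StandardCharsets.ISO_8859_1);
        String value = readAttribute(payload, "text");
        System.out.println("contains U+FFFD: " + (value.indexOf('\uFFFD') >= 0));
    }
}
```

The trade-off is that any encoding declaration in the XML prolog is ignored when parsing from a `Reader`, and the replaced characters still have to be repaired afterwards using whatever context the application has.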
Code to reproduce
Data class:
Test method:
Output: