Encoding issue #158
Comments
Can you post a demo file?
This is not an issue of HtmlSanitizer. When you read an ISO-8859-1 encoded file using [...]

I'm not sure if AngleSharp does encoding detection if you supply a byte stream to its HTML parser. I'll try and possibly add overloads of [...]
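To illustrate the kind of mismatch described above, here is a minimal sketch that encodes a string as ISO-8859-1 and decodes the bytes as UTF-8, which is what File.ReadAllText falls back to when the file has no byte order mark. The class and method names (`EncodingMismatchDemo`, `RoundTrip`) are made up for this example and are not part of HtmlSanitizer or AngleSharp.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only.
static class EncodingMismatchDemo
{
    // Encodes `text` with one encoding and decodes the resulting bytes with another,
    // mimicking a file written in `writtenAs` but read as `readAs`.
    public static string RoundTrip(string text, Encoding writtenAs, Encoding readAs)
    {
        byte[] bytes = writtenAs.GetBytes(text);
        return readAs.GetString(bytes);
    }
}
```

Calling `EncodingMismatchDemo.RoundTrip("kopie\u00EBn", Encoding.GetEncoding("iso-8859-1"), Encoding.UTF8)` returns the string with the ë replaced by U+FFFD, because the single byte 0xEB is not a valid UTF-8 sequence on its own. Passing the ISO-8859-1 encoding for both arguments returns the original text. The damage therefore happens while the file is read, before the string reaches the sanitizer.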
In 4.0.205 there is now an overload of SanitizeDocument that accepts a stream:

    using (var stream = File.OpenRead("path/to/your/file"))
    {
        var sanitized = sanitizer.SanitizeDocument(stream);
        // ...
    }

If you're on .NET Core, you need to add the [...]
Hi,
Is it somehow possible to let HtmlSanitizer detect the encoding of the string that it is sanitizing?
I now load the file with File.ReadAllText and then feed it to the .Sanitize method, but after that words like kopieën end up like kopiee?n.
In this case there is encoding information in the header of the HTML file:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
When that encoding is used, everything is OK.
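One possible workaround on the caller's side is to read the raw bytes, look for the charset declared in the meta tag, and decode with that encoding before passing the string to the sanitizer. The sketch below assumes the declaration appears within the first 1024 bytes and is ASCII-compatible. `HtmlFileReader` and `ReadWithDeclaredCharset` are hypothetical names for this example, not HtmlSanitizer APIs, and the regular expression is a rough heuristic rather than a full HTML encoding sniffer.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
static class HtmlFileReader
{
    public static string ReadWithDeclaredCharset(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);

        // Decode only the start of the file as ASCII to search for the declaration;
        // non-ASCII bytes become '?' here, which does not affect the search.
        string head = Encoding.ASCII.GetString(bytes, 0, Math.Min(bytes.Length, 1024));
        Match match = Regex.Match(head, @"charset\s*=\s*[""']?([\w-]+)", RegexOptions.IgnoreCase);

        // Fall back to UTF-8 when no charset is declared or the name is unknown.
        // Note: a byte order mark is not handled specially in this sketch.
        Encoding encoding = Encoding.UTF8;
        if (match.Success)
        {
            try
            {
                encoding = Encoding.GetEncoding(match.Groups[1].Value);
            }
            catch (ArgumentException)
            {
                // Unknown or unsupported encoding name: keep the UTF-8 fallback.
            }
        }

        return encoding.GetString(bytes);
    }
}
```

The returned string could then be passed to the .Sanitize method in place of the result of File.ReadAllText. When the encoding is known up front, calling File.ReadAllText(path, Encoding.GetEncoding("iso-8859-1")) directly is simpler.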