Hard-code UTF-8 encoding for the input file #56
Merged
This is a first draft to fix the encoding errors described in #55 - i.e. we're on Windows and the input file is encoded in UTF-8.
The role of PYTHONIOENCODING, as described in the issue
The bug report mentions that even when the PYTHONIOENCODING env var is set, the preferred encoding determined by Python on Windows is still a Windows-specific encoding.
However, this appears to be the expected behaviour: this env var only affects stdin/stdout/stderr, not the default encoding Python uses when opening files.
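To check this, here is a minimal sketch that starts a child interpreter with and without `PYTHONIOENCODING` and compares what it reports. The helper name `probe_encodings` and the choice of `latin-1` as the test value are assumptions made for illustration; they are not part of Rich-CLI.

```python
import codecs
import os
import subprocess
import sys

# Code run in the child interpreter: report the stdout encoding and the
# locale's preferred encoding (the default that open() falls back to
# outside of UTF-8 mode).
CHILD = (
    "import locale, sys; "
    "print(sys.stdout.encoding); "
    "print(locale.getpreferredencoding(False))"
)


def probe_encodings(io_encoding=None):
    """Return (stdout encoding, preferred encoding) seen by a child interpreter.

    Hypothetical helper: both names are normalised with codecs.lookup()
    so that spellings such as 'UTF-8' and 'utf-8' compare equal.
    """
    env = dict(os.environ)
    env.pop("PYTHONIOENCODING", None)
    if io_encoding is not None:
        env["PYTHONIOENCODING"] = io_encoding
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        check=True,
    )
    stdout_enc, preferred_enc = result.stdout.decode("ascii").split()
    return codecs.lookup(stdout_enc).name, codecs.lookup(preferred_enc).name


baseline = probe_encodings()
overridden = probe_encodings("latin-1")
print("without PYTHONIOENCODING:", baseline)
print("with PYTHONIOENCODING=latin-1:", overridden)
```

The first element of the tuple changes with the env var, while the second (the preferred encoding) stays the same in both runs.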
Potential breaking changes brought by this PR
Of course, assuming that the input file is always encoded in UTF-8, as this PR does, could break some existing usages of Rich-CLI, especially on Windows, where UTF-8 is still not the default encoding, if I'm not mistaken.
Not sure what would be the safest way to handle that issue? 🤔
Potential ways to handle that better
Maybe we could try to open the file using the system's default encoding first, and then, only if that fails, fall back to UTF-8?
e.g. something like this: (pseudo-code)
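A minimal runnable sketch of that fallback strategy. The function name `read_text_with_fallback` and its parameters are hypothetical and only illustrate the idea; this is not the code changed by this PR.

```python
import locale
from pathlib import Path


def read_text_with_fallback(path, default_encoding=None, fallback_encoding="utf-8"):
    """Read a text file with the system encoding, retrying as UTF-8 on failure.

    Hypothetical helper. `default_encoding` can be passed explicitly,
    which makes the behaviour easy to test on any platform.
    """
    if default_encoding is None:
        # The encoding open() would use by default outside of UTF-8 mode.
        default_encoding = locale.getpreferredencoding(False)
    try:
        return Path(path).read_text(encoding=default_encoding)
    except UnicodeDecodeError:
        # Only reached when the bytes are invalid in the default encoding.
        return Path(path).read_text(encoding=fallback_encoding)
```

One caveat with this approach: many single-byte Windows code pages accept almost any byte sequence, so a UTF-8 file may decode without raising an error and simply produce wrong characters, in which case the fallback is never triggered.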
As pointed out by @darrenburns , there is now the possibility to use a PYTHONUTF8 env var, which seems to work:
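As a rough check of what `PYTHONUTF8=1` changes (Python 3.7+), the sketch below runs a child interpreter that calls `open()` without an `encoding` argument on a UTF-8 file. The helper name `read_in_utf8_mode` is hypothetical and used only for this illustration.

```python
import os
import subprocess
import sys

# Code run in the child interpreter: open the file with the default
# encoding and print an ASCII-safe repr of what was read.
CHILD = "import sys; print(ascii(open(sys.argv[1]).read()))"


def read_in_utf8_mode(path):
    """Return the ascii() repr of the file as read by a child in UTF-8 mode."""
    env = dict(os.environ, PYTHONUTF8="1")
    result = subprocess.run(
        [sys.executable, "-c", CHILD, path],
        env=env,
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("ascii").strip()
```

With UTF-8 mode enabled, the child decodes the file as UTF-8 regardless of the system code page, which is why setting this env var avoids the error without any change to the calling code.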
Add a flag and/or an env var specific to Rich-CLI to let the user tell Rich-CLI which encoding we should use to open the input file?
Maybe that would be the most flexible option, combined with a "before raising an exception, fall back to UTF-8 if the default encoding didn't work" strategy? What do you think @willmcgugan @darrenburns ? 🙂
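To make the flag/env var idea concrete, here is a small sketch of how the encoding could be resolved. The `--encoding` flag and the `RICH_CLI_ENCODING` env var are hypothetical names invented for this example, not existing Rich-CLI options, and `argparse` is used only to keep the sketch limited to the standard library.

```python
import argparse
import locale
import os

ENV_VAR = "RICH_CLI_ENCODING"  # hypothetical env var name


def resolve_encoding(argv, environ=None):
    """Pick the input file encoding: flag first, then env var, then system default."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--encoding", default=None)  # hypothetical flag
    # Ignore every other argument; only the encoding matters here.
    args, _unknown = parser.parse_known_args(argv)
    return (
        args.encoding
        or environ.get(ENV_VAR)
        or locale.getpreferredencoding(False)
    )
```

The explicit flag wins over the env var so that a one-off invocation can override a machine-wide setting. The resolved value could then be used as the first encoding to try, with UTF-8 as the fallback before raising an exception.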
Before / After
Before this fix:
After this fix:
fixes #55