Force UTF8 for all text reading / writing. #1353

Closed
wants to merge 1 commit into from

LeifW (Contributor) commented Jul 1, 2014

Replaced all usages of readFile, writeFile, hGetLine, and hPutStrLn from
Prelude and System.IO with the versions from System.IO.UTF8
(http://hackage.haskell.org/package/utf8-string-0.3.8/docs/System-IO-UTF8.html).

Fixes #94

Tested with export LANG=C. I was able to load a Unicode-containing .idr
file, compile and run it, use interactive vim stuff like
case-splitting and proof search, and also :addproof from the REPL.

Technically, only the change to readFile in Idris/Chaser.hs is necessary
to fix #94. The rest are mainly for consistency. I am a little leery
of forcing the output encoding to UTF-8 when the system encoding
might be something else, for the things that don't deal with Idris code,
e.g. the .c, .java, .pom, etc. output files that are then run through
gcc, javac, or whatever.
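
For illustration (not part of the original change): the same bytes decode to different strings depending on the handle's encoding, which is why a plain readFile breaks under LANG=C, where GHC's locale encoding is typically ASCII. A minimal sketch using only base; readLineAs and greek.txt are hypothetical names, and latin1 stands in for a non-UTF-8 locale so the result does not depend on the environment:

```haskell
import System.IO

-- Hypothetical helper: read the first line of a file, decoding it with an
-- explicit encoding instead of the locale encoding chosen from LANG.
readLineAs :: TextEncoding -> FilePath -> IO String
readLineAs enc path = withFile path ReadMode $ \h -> do
  hSetEncoding h enc
  hGetLine h -- hGetLine is strict, so closing the handle afterwards is safe

main :: IO ()
main = do
  -- "τι" encoded as UTF-8 is the four bytes CF 84 CE B9.
  withFile "greek.txt" WriteMode $ \h -> do
    hSetEncoding h utf8
    hPutStrLn h "τι"
  asUtf8 <- readLineAs utf8 "greek.txt"
  asLatin1 <- readLineAs latin1 "greek.txt"
  -- UTF-8 recovers 2 characters; Latin-1 turns each byte into its own Char.
  print (length asUtf8, length asLatin1) -- (2,4)
  print (map fromEnum asLatin1)          -- [207,132,206,185]
```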

Also, this might address the issue fixed by #1334 and make #1334
redundant?

Melvar (Collaborator) commented Jul 1, 2014

As far as I can see, System.IO.UTF8 always assumes that strings were erroneously read as raw bytes (one Char per byte). My system, however, already decodes them correctly as UTF-8, so in that case it would try to decode them twice, which would almost certainly cause a lot of failures. To test this, use a UTF-8 locale and put non-ASCII characters in your file or queries. Calling System.IO.UTF8.getLine in GHCi on my system and feeding it τι throws an exception: "*** Exception: Enum.toEnum{Word8}: tag (964) is outside of bounds (0,255)".
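
The failure mode described above can be reproduced with base alone. In this sketch, asBytes is a hypothetical stand-in for the byte-oriented step (it is not utf8-string's actual code): it assumes every Char is a byte below 256, which holds for raw bytes but not for text the runtime has already decoded. τ is U+03C4, i.e. code point 964, which matches the tag in the quoted error:

```haskell
import Control.Exception (ErrorCall, evaluate, try)
import Data.Word (Word8)

-- Hypothetical stand-in for a decoder that expects one byte per Char.
-- toEnum fails for any Char whose code point is above 255.
asBytes :: String -> [Word8]
asBytes = map (toEnum . fromEnum)

main :: IO ()
main = do
  -- Chars that really are bytes convert fine.
  print (asBytes "abc") -- [97,98,99]
  -- An already-decoded "τι" holds code points 964 and 953.
  -- sum forces every element, so the toEnum error surfaces here.
  r <- try (evaluate (sum (asBytes "τι")))
  case r of
    Left e  -> putStrLn ("failed: " ++ show (e :: ErrorCall))
    Right s -> print s
```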

If you want to force UTF-8 everywhere, I believe the correct approach is to set the handle encoding with hSetEncoding after opening every handle that will be used for text.
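
A sketch of the hSetEncoding approach for the standard handles, which GHC opens with the locale encoding before main runs, so no openFile call in the program ever touches them. forceUtf8Stdio is a hypothetical name; handles from openFile would need the same hSetEncoding call right after opening:

```haskell
import System.IO

-- Hypothetical helper: force UTF-8 on stdin, stdout and stderr,
-- overriding whatever encoding the locale (LANG) selected.
forceUtf8Stdio :: IO ()
forceUtf8Stdio = mapM_ (`hSetEncoding` utf8) [stdin, stdout, stderr]

main :: IO ()
main = do
  forceUtf8Stdio
  enc <- hGetEncoding stdout
  print enc     -- Just UTF-8
  putStrLn "τι" -- written as UTF-8 bytes even under LANG=C
```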

edwinb (Contributor) commented Jul 6, 2014

What's the status of this? There seems to be a merge conflict at the moment in any case...

LeifW (Contributor, Author) commented Jul 9, 2014

This worked for me in my testing, but the extra encoding/decoding done by the IO functions from the utf8-string library seems weird. I'll redo this by adding a readUtf8File utility function to replace readFile that ignores LANG and simply sets the file handle's encoding to UTF-8. See #94 (comment).
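
One possible shape for the readUtf8File utility proposed here. The name comes from the comment; the body and the writeUtf8File counterpart are hypothetical sketches using only base. Because hGetContents is lazy, the contents are forced before withFile closes the handle:

```haskell
import System.IO

-- Read a whole file as UTF-8, ignoring LANG / the locale encoding.
readUtf8File :: FilePath -> IO String
readUtf8File path = withFile path ReadMode $ \h -> do
  hSetEncoding h utf8
  s <- hGetContents h
  -- Force the whole string before withFile closes the handle.
  length s `seq` return s

-- Hypothetical counterpart for writeFile.
writeUtf8File :: FilePath -> String -> IO ()
writeUtf8File path s = withFile path WriteMode $ \h -> do
  hSetEncoding h utf8
  hPutStr h s

main :: IO ()
main = do
  writeUtf8File "example.idr" "-- τι\n"
  s <- readUtf8File "example.idr"
  print (length s) -- 6
```

Generated .c or .java output could keep using plain writeFile, which still follows the locale encoding, in line with the concern in the PR description.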

david-christiansen (Contributor)

Hi @LeifW, how's the re-do coming? Or is it done? I just want to make sure we're not waiting on each other here :-)

david-christiansen (Contributor)

@LeifW Should I close this PR now? You mention that it should be re-done, but there hasn't been any further activity for some time.

LeifW (Contributor, Author) commented Sep 24, 2014

Closing for now; not sure I'll be able to get to it soon, and haven't seen complaints about this behaviour lately (not having UTF8 in the stdlib source helps). Will re-open when I have something ready for review & merge.

Now that the rainy season's started and the days are shorter, maybe I'll be spending more time inside in front of a computer...

LeifW closed this Sep 24, 2014
Successfully merging this pull request may close these issues.

Build error with Unicode
4 participants