Force UTF8 for all text reading / writing. #1353

LeifW · 2014-07-01T08:27:32Z

Replaced all usages of readFile, writeFile, hGetLine, and hPutStrLn from
Prelude with versions from
[System.IO.UTF8](http://hackage.haskell.org/package/utf8-string-0.3.8/docs/System-IO-UTF8.html)

Fixes #94

Tested with `export LANG=C`. Was able to load a unicode-containing .idr
file, compile and run it, use interactive vim stuff like
case-splitting and proof search, and could also :addproof from the repl.

Technically, only the change to readFile in Idris/Chaser.hs is necessary
to fix #94. The rest are mainly for consistency. I am a little leery
of forcing the output encoding to unicode, when the system encoding
might be something else, for the things that don't deal with Idris code,
e.g. the .c, .java, .pom, etc. output files that are then run through
gcc, javac, or whatever.

Also, this might address the issue fixed by #1334, and make #1334
redundant?

Melvar · 2014-07-01T09:24:48Z

As far as I can see, System.IO.UTF8 seems to always assume that strings were erroneously read as raw bytes. However, my system reads them as UTF-8 perfectly correctly, so it would try to double-decode in that case, which would almost certainly cause a lot of failures. To test this, try using a UTF-8 locale and using non-ASCII in your file or queries. Calling System.IO.UTF8.getLine in GHCi on my system and feeding it τι throws an exception: "*** Exception: Enum.toEnum{Word8}: tag (964) is outside of bounds (0,255)"
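The failure described above can be reproduced with base alone. This is only a sketch of the suspected mechanism: it assumes the library's wrappers convert each already-read Char back into a Word8 with toEnum before decoding, which is a guess that happens to match the error message. The name `charToByte` is hypothetical and not part of utf8-string.

```haskell
import Control.Exception (ErrorCall, evaluate, try)
import Data.Word (Word8)

-- Hypothetical stand-in for the assumed "treat each Char as a raw byte" step.
charToByte :: Char -> Word8
charToByte = toEnum . fromEnum

main :: IO ()
main = do
  -- '\964' is τ. Under a UTF-8 locale it arrives as one already-decoded
  -- Char whose code point does not fit in a byte.
  r <- try (evaluate (charToByte '\964')) :: IO (Either ErrorCall Word8)
  case r of
    Left err -> putStrLn ("conversion failed: " ++ show err)
    Right b  -> putStrLn ("converted to byte " ++ show b)
```

With `LANG=C` every Char read is at most 255, so the conversion succeeds, which would explain why the PR worked in the original test.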

If you want to force UTF-8 everywhere, I believe the correct approach is to set the handle encoding with hSetEncoding after opening every handle that will be used for text.
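A minimal sketch of that approach, using only System.IO from base. The helper name `openUtf8` and the file name `utf8-demo.txt` are made up for illustration and are not part of the Idris code base.

```haskell
import System.IO

-- Hypothetical helper: open a file and pin its text encoding to UTF-8,
-- regardless of what the locale (LANG) says.
openUtf8 :: FilePath -> IOMode -> IO Handle
openUtf8 path mode = do
  h <- openFile path mode
  hSetEncoding h utf8
  return h

main :: IO ()
main = do
  -- The standard handles follow the locale too, so set them explicitly.
  mapM_ (`hSetEncoding` utf8) [stdin, stdout, stderr]
  out <- openUtf8 "utf8-demo.txt" WriteMode
  hPutStrLn out "\964\953"   -- the string "τι"
  hClose out
  inp <- openUtf8 "utf8-demo.txt" ReadMode
  line <- hGetLine inp
  hClose inp
  putStrLn line
```

Because the encoding is a property of the handle, the ordinary hGetLine and hPutStrLn keep working unchanged and nothing gets decoded twice.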

edwinb · 2014-07-06T15:05:51Z

What's the status of this? There seems to be a merge conflict at the moment in any case...

LeifW · 2014-07-09T05:15:37Z

This worked for me in my testing, but the extra encoding / decoding done by the IO methods from the utf8-string library seems weird. I'll redo this by adding a readUtf8File utility function to replace readFile, one that ignores LANG and simply sets the file handle to UTF8. #94 (comment)
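One way such a utility could look is sketched below. Only the name readUtf8File comes from the comment above; the body is an assumption about how it might be written, not code from the PR.

```haskell
import Control.Exception (evaluate)
import System.IO

-- Sketch of the proposed helper: like readFile, but always decodes UTF-8.
readUtf8File :: FilePath -> IO String
readUtf8File path = withFile path ReadMode $ \h -> do
  hSetEncoding h utf8
  contents <- hGetContents h
  -- hGetContents is lazy; force the whole string before withFile
  -- closes the handle.
  _ <- evaluate (length contents)
  return contents

main :: IO ()
main = do
  -- Write a small UTF-8 file first so the example is self-contained.
  withFile "utf8-demo.idr" WriteMode $ \h -> do
    hSetEncoding h utf8
    hPutStr h "-- \964\953\n"
  src <- readUtf8File "utf8-demo.idr"
  hSetEncoding stdout utf8
  putStr src
```

Unlike the lazy readFile from Prelude, this version reads the file strictly, which trades some memory for not leaving the handle open.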

david-christiansen · 2014-07-23T12:01:29Z

Hi @LeifW, how's the re-do coming? Or is it done? I just want to make sure we're not waiting on each other here :-)

david-christiansen · 2014-09-24T09:27:20Z

@LeifW Should I close this PR now? You mention that it should be re-done, but there hasn't been any further activity for some time.

LeifW · 2014-09-24T18:03:25Z

Closing for now; not sure I'll be able to get to it soon, and haven't seen complaints about this behaviour lately (not having UTF8 in the stdlib source helps). Will re-open when I have something ready for review & merge.

Now that the rainy season's started and the days are shorter, maybe I'll be spending more time inside in front of a computer...

LeifW closed this Sep 24, 2014
