Build error with Unicode #94

Closed
Warbo opened this issue Nov 22, 2012 · 12 comments
Comments

@Warbo

Warbo commented Nov 22, 2012

When compiling, I got as far as type-checking lib/Prelude/Complex.idr and then got the error "hGetContents: invalid argument (invalid byte sequence)".

This happened with "cabal install idris" (version 0.9.5.1) and with a clone of commit cea7205.

I changed the copyright header in that file from a non-ASCII character to "(c)", which made the error go away and let me compile successfully. I don't know enough about Unicode handling in Haskell/Idris to stop this recurring, but I thought I'd raise the issue along with my quick hack.

I'm running Debian unstable on an OLPC XO-1 laptop. Here are some possibly relevant numbers:

$ uname -a
Linux olpc 2.6.32-5-486 #1 Fri Dec 10 15:32:53 UTC 2010 i586 GNU/Linux

$ dpkg -l ghc | grep "ii"
ii ghc 7.4.1-4 i386 The Glasgow Haskell Compilation system
ii libghc-ansi-terminal-de 0.5.5-3+b1 i386 Simple ANSI terminal support, with Windows compatibi
ii libghc-ansi-wl-pprint-d 0.6.4-1+b1 i386 Wadler/Leijen Pretty Printer for colored ANSI termin
ii libghc-dlist-dev 0.5-3+b1 i386 Haskell library for Differences lists
ii libghc-hostname-dev 1.0-4+b1 i386 providing a cross-platform means of determining the
ii libghc-mtl-dev 2.1.1-1 i386 Haskell monad transformer library for GHC
ii libghc-quickcheck2-dev 2.4.2-1+b1 i386 Haskell automatic testing library for GHC
ii libghc-random-dev 1.0.1.1-1+b1 i386 Random number generator for Haskell
ii libghc-regex-base-dev 0.93.2-2+b2 i386 GHC library providing an API for regular expressions
ii libghc-regex-posix-dev 0.95.1-2+b1 i386 GHC library of the POSIX regex backend for regex-bas
ii libghc-smallcheck-dev 0.6-1+b1 i386 Another lightweight testing library
ii libghc-syb-dev 0.3.6.1-1 i386 Generic programming library for Haskell
ii libghc-test-framework-d 0.6-1+b1 i386 Framework for running and organising tests
ii libghc-test-framework-q 0.2.12.1-1+b1 i386 QuickCheck2 support for the test-framework package.
ii libghc-text-dev 0.11.2.0-1 i386 efficient packed Unicode text type for Haskell - GHC
ii libghc-transformers-dev 0.3.0.0-1 i386 Haskell monad transformer library
ii libghc-utf8-string-dev 0.3.7-1+b1 i386 GHC libraries for the Haskell UTF-8 library
ii libghc-x11-dev 1.5.0.1-1+b2 i386 Haskell X11 binding for GHC
ii libghc-xml-dev 1.3.12-1+b2 i386 A simple Haskell XML library - GHC libraries
ii libghc-xmonad-dev 0.10-4+b2 i386 Lightweight X11 window manager; libraries

$ ghc -v
Glasgow Haskell Compiler, Version 7.4.1, stage 2 booted by GHC version 7.4.1
Using binary package database: /usr/lib/ghc/package.conf.d/package.cache
Using binary package database: /home/chris/.ghc/i386-linux-7.4.1/package.conf.d/package.cache
hiding package text-0.11.2.0 to avoid conflict with later version text-0.11.2.3
hiding package mtl-2.1.1 to avoid conflict with later version mtl-2.1.2
wired-in package ghc-prim mapped to ghc-prim-0.2.0.0-bd29cb1ca1b712d64e00ac9207f87d0a
wired-in package integer-gmp mapped to integer-gmp-0.4.0.0-ec87c5d9609a1d46da031ef5d51c4f79
wired-in package base mapped to base-4.5.0.0-c8e7184681d410015e93df85fc49e9dd
wired-in package rts mapped to builtin_rts
wired-in package template-haskell mapped to template-haskell-2.7.0.0-fea440f2bc02cf9a412f25b6b74c4a70
wired-in package dph-seq not found.
wired-in package dph-par not found.
Hsc static flags: -static
*** Deleting temp files:
Deleting:
*** Deleting temp dirs:
Deleting:
ghc: no input files
Usage: For basic information, try the `--help' option.

$ file lib/Prelude/Complex.idr
lib/Prelude/Complex.idr: UTF-8 Unicode text

$ hexdump -C lib/Prelude/Complex.idr | head
00000000 7b 2d 0a 20 20 c2 a9 20 32 30 31 32 20 43 6f 70 |{-. .. 2012 Cop|
00000010 79 72 69 67 68 74 20 4d 65 6b 65 6f 72 20 4d 65 |yright Mekeor Me|
00000020 6c 69 72 65 0a 2d 7d 0a 0a 0a 6d 6f 64 75 6c 65 |lire.-}...module|
00000030 20 50 72 65 6c 75 64 65 2e 43 6f 6d 70 6c 65 78 | Prelude.Complex|
00000040 0a 0a 69 6d 70 6f 72 74 20 42 75 69 6c 74 69 6e |..import Builtin|
00000050 73 0a 69 6d 70 6f 72 74 20 50 72 65 6c 75 64 65 |s.import Prelude|
00000060 0a 0a 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d |..--------------|
00000070 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d |----------------|
00000080 20 52 65 63 74 61 6e 67 75 6c 61 72 20 66 6f 72 | Rectangular for|
00000090 6d 20 0a 0a 69 6e 66 69 78 20 36 20 3a 2b 0a 64 |m ..infix 6 :+.d|
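For reference, the bytes c2 a9 at offset 5 of the hexdump are the UTF-8 encoding of U+00A9 (©). The minimal Haskell sketch below (the helper and file names are hypothetical, chosen for illustration) writes the first bytes of that header to a file and decodes them with two explicit encodings: as UTF-8 they yield a single © character, while as Latin-1 they yield two characters. Under an ASCII locale such as LANG=C, GHC's default decoder presumably rejects byte 0xC2 outright, which would match the "invalid byte sequence" error above.

```haskell
import System.IO

-- First bytes of the copyright header from the hexdump:
-- 0xC2 0xA9 is the UTF-8 encoding of U+00A9 COPYRIGHT SIGN.
headerBytes :: String
headerBytes = map toEnum [0xc2, 0xa9, 0x20, 0x32, 0x30, 0x31, 0x32]

-- Write raw bytes: a binary-mode handle writes the low 8 bits of each Char.
writeBytes :: FilePath -> String -> IO ()
writeBytes path bytes = withBinaryFile path WriteMode (\h -> hPutStr h bytes)

-- Read a file back, decoding it with an explicitly chosen TextEncoding
-- instead of the locale default.
readWith :: TextEncoding -> FilePath -> IO String
readWith enc path = withFile path ReadMode $ \h -> do
  hSetEncoding h enc
  s <- hGetContents h
  length s `seq` return s -- force the lazy read before withFile closes the handle
```

In GHCi, writing headerBytes and reading it back with utf8 gives "\169 2012", whereas latin1 gives "\194\169 2012".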

@edwinb
Contributor

edwinb commented Nov 23, 2012

This is slightly surprising, since this version of GHC ought to handle Unicode Strings. I suppose what I'll do is change it to "(c)" to fix the compilation error, and leave this issue open in case anyone can explain it. Thanks for mentioning it.

@wrwills

wrwills commented Jan 30, 2013

I had a similar issue when building on a new system where I hadn't set up my locale configuration properly.

With
LANG=C
hGetContents was choking on the line
"-- and defining i+i = i and i+s = s = s+i for all s ∈ S."
in Maybe.idr

Running
export LANG=en_GB.UTF-8
and then building again fixed it.

@Warbo
Author

Warbo commented Jan 31, 2013

It makes sense that my locale wasn't set properly, as I'd installed Debian via debootstrap, which only does enough configuration to get chroot working. I'll add 'set locale' to my post-install checklist next time ;)

@LeifW
Contributor

LeifW commented Jun 1, 2013

I think this specific case can be closed now? But there was a more in-depth discussion on the mailing list about allowing Unicode in Idris sources in the future. Something about having Idris simply assume the sources are UTF-8?

@Warbo
Author

Warbo commented Jun 3, 2013

I'm happy for it to close.

@LeifW
Contributor

LeifW commented Jun 18, 2014

tjice just reported something that looks rather similar on IRC: http://codepad.org/hsRtppRm
Builds idris fine, but then idris barfs trying to compile the .idr libs.

@LeifW
Contributor

LeifW commented Jun 18, 2014

Doing some digging - I suspect hGetContents is being called from the readFile in Idris/Chaser.hs. This issue might shed some light - finnsson/template-helper#2

@LeifW
Copy link
Contributor

LeifW commented Jun 18, 2014

Perhaps we could set the encoding to UTF-8 on each file handle we open, to force all the .idr files to be read as UTF-8 rather than in the system locale's encoding: "The default encoding when a Handle is created is localeEncoding, namely the default encoding for the current locale." (https://hackage.haskell.org/package/base-4.7.0.0/docs/System-IO.html#g:23)
Or another idea - could we set the LANG var or whatever to unicode during the part of the idris build process where it builds the stdlibs - leaving the end user free to write .idr files in non-utf8 on their own?
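The documented behaviour quoted above can be checked directly. This is a small sketch (the function names are hypothetical) that reports which encoding GHC attaches to a newly opened text handle, before and after an explicit hSetEncoding h utf8, which is the per-handle override being proposed. On POSIX systems the default should track the locale (LANG, LC_ALL, LC_CTYPE), so it would show an ASCII encoding under LANG=C.

```haskell
import System.IO

-- Name of the encoding GHC attaches to a freshly opened text handle.
-- Per the System.IO docs this is the current locale encoding.
handleEncodingName :: FilePath -> IO (Maybe String)
handleEncodingName path =
  withFile path ReadMode $ \h -> fmap (fmap show) (hGetEncoding h)

-- The same, after overriding the handle's encoding to UTF-8,
-- which makes decoding independent of the user's locale.
utf8HandleEncodingName :: FilePath -> IO (Maybe String)
utf8HandleEncodingName path =
  withFile path ReadMode $ \h -> do
    hSetEncoding h utf8
    fmap (fmap show) (hGetEncoding h)
```

Comparing the first result with show localeEncoding should give a match; the second should report "UTF-8" whatever LANG is set to.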

@david-christiansen
Contributor

This sounds horribly complicated. In my opinion, the right thing to do is to just define UTF-8 as the one true encoding for Idris files, and arrange for the Haskell code to always use it.

@LeifW
Contributor

LeifW commented Jun 18, 2014

Thinking of adding a readUtf8File to, say, Util/System.hs that mimics readFile but sets the encoding of the file handle to UTF-8. We would also need to replace the file writing done by the !-suffixed REPL commands with an equivalent UTF-8 write function, I imagine.
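A minimal sketch of what such helpers might look like using only base, assuming the names readUtf8File and writeUtf8File (hypothetical; the commit below ended up using the utf8-string package instead). Reading stays lazy like Prelude.readFile, but the handle is always decoded as UTF-8 regardless of LANG.

```haskell
import System.IO

-- Hypothetical helper: like Prelude.readFile, but always decodes UTF-8.
readUtf8File :: FilePath -> IO String
readUtf8File path = do
  h <- openFile path ReadMode
  hSetEncoding h utf8
  hGetContents h -- lazy, like readFile; the handle is closed at EOF

-- Hypothetical counterpart for writing (e.g. the !-suffixed REPL commands).
writeUtf8File :: FilePath -> String -> IO ()
writeUtf8File path s = withFile path WriteMode $ \h -> do
  hSetEncoding h utf8
  hPutStr h s
```

Round-tripping the line from Maybe.idr mentioned earlier (which contains ∈, U+2208) through these helpers should preserve it even with LANG=C, and the file on disk should hold ∈ as three bytes.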

@david-christiansen
Contributor

Sounds reasonable if such a thing isn't already in the libraries.

/David (from phone)
On 18 Jun 2014 17:50, "Leif Warner" [email protected] wrote:

Thinking of adding a readUtf8File to say Util/System.hs, that mimics
readFile, only setting encoding of the file handle to utf8. Would also
need to replace file writing from !-suffixed repl commands by write
equivalent, I imagine.



@LeifW
Contributor

LeifW commented Jun 18, 2014

Oh - we already have utf8-string as a dependency in the .cabal file, and it provides a UTF-8 readFile (System.IO.UTF8).

LeifW added a commit to LeifW/Idris-dev that referenced this issue Jul 1, 2014
Replaced all usages of readFile, writeFile, hGetLine, and hPutStrLn from
Prelude with versions from
[System.IO.UTF8](http://hackage.haskell.org/package/utf8-string-0.3.8/docs/System-IO-UTF8.html)

Fixes idris-lang#94

Tested with `export LANG=C`. Was able to load a unicode-containing .idr
file, compile and run it, use interactive vim stuff like
case-splitting and proof search, and could also :addproof from the repl.

Technically, only the change to readFile in Idris/Chaser.hs is necessary
to fix idris-lang#94. The rest are mainly for consistency. I am a little leery
of forcing the output encoding to Unicode for outputs that don't contain
Idris code, since the system encoding might not be UTF-8: e.g. the .c,
.java, .pom, etc. output files that are then run through gcc, javac,
and so on.

Also, this might address the issue fixed by idris-lang#1334, and make idris-lang#1334
redundant?