Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

readFile implementation in Prelude.File partially broken with binary files #4352

Open
ghost opened this issue Feb 25, 2018 · 9 comments
Open

Comments

@ghost
Copy link

ghost commented Feb 25, 2018

The current implementation uses addToStringBuffer to concatenate a read file chunk to a buffer.
Explicit length parameters seem to be used almost everywhere(?), however not for addToStringBuffer,
which uses C strlen in the underlying implementation to deduce the length of the concatenee.
strlen uses C style strings, resulting in truncation of the read chunk at null characters.

Proposal: require an explicit length parameter for idris_addToString

addToStringBuffer contents l

addToStringBuffer (MkString ptr) str =
foreign FFI_C "idris_addToString" (Ptr -> String -> IO ())
ptr str

| Just (FFun "idris_addToString" [(_, strBuf), (_, str)] _) <- foreignFromTT arity ty fn xs
= case (strBuf, str) of
(EStringBuf ref, EConstant (Str add)) ->
do execIO $ modifyIORef ref (++add)
execApp env ctxt ioUnit (drop arity xs)
_ -> execFail . Msg $
"The arguments to idris_addToString should be a StringBuffer and a String, but were " ++
show strBuf ++ " and " ++ show str ++
". Are all cases covered?"

void idris_addToString(void* buffer, char* str);

void idris_addToString(void* buffer, char* str) {
StrBuffer* sb = (StrBuffer*)buffer;
int len = strlen(str);
memcpy(sb->string + sb->len, str, len+1);
sb->len += len;
}

A more grand proposal?: Should Idris globally switch to netstring style strings instead of C strings? I don't know how pervasive things like this are.

@ghost
Copy link
Author

ghost commented Feb 25, 2018

I think fGetChars should be modified to return a tuple of (String, Int) where the Int is the return value of fread() being the number of read primitives according to C docs. That way a length parameter can be obtained that can be passed to addToStringBuffer.

fGetChars : (h : File) -> (len : Int) -> IO (Either FileError String)
fGetChars (FHandle h) len = do str <- do_freadChars h len

private
do_freadChars : Ptr -> Int -> IO' l String
do_freadChars h len = prim_freadChars len h

prim_freadChars : Int -> Ptr -> IO' l String
prim_freadChars len h
= MkIO (\w => prim_io_pure (prim__readChars (world w) len h))

%extern prim__readChars : prim__WorldType -> Int -> Ptr -> String

The following two belong together:

prc = sUN "prim__readChars"

getOp fn (_ : EConstant (I len) : EHandle h : xs)
| fn == prc =
Just (do contents <- execIO $ hGetChars h len
return (EConstant (Str contents)), xs)

| rf == sUN "prim__readChars"
= v ++ "idris_readChars(vm, GETINT(" ++ creg len ++
"), GETPTR(" ++ creg x ++ "))"

VAL idris_readChars(VM* vm, int num, FILE* h);

Idris-dev/rts/idris_rts.c

Lines 645 to 659 in 14f10fd

VAL idris_readChars(VM* vm, int num, FILE* h) {
VAL ret;
char *buffer = malloc((num+1)*sizeof(char));
size_t len;
len = fread(buffer, sizeof(char), (size_t)num, h);
buffer[len] = '\0';
if (len <= 0) {
ret = MKSTR(vm, "");
} else {
ret = MKSTR(vm, buffer);
}
free(buffer);
return ret;
}

@ghost
Copy link
Author

ghost commented Feb 25, 2018

All of a sudden I'm having trouble replicating my original issue on an unmodified branch, so I guess I have to come up with an actual test case.

@ghost
Copy link
Author

ghost commented Feb 25, 2018

Test case for compiled:

$ cat test.idr
main : IO ()
main =
  do
    (Right l) <- readFile "main"
    (putStr . show) l

$ xxd main | head -n 1
00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............

$ idris test.idr -o test && ./test
"\DELELF\STX\SOH\SOH"

Notice the truncation at the first null character.
It seems to be fine in the REPL (although I thought I originally ran across the issue there):

Idris> :x the (IO ()) $ (readFile "main") >>= (\(Right x) => (putStr . show) (substr 0 20 x))
"\DELELF\STX\SOH\SOH\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\STX\NUL>\NUL"
MkIO (\w => prim_io_pure ()) : IO' (MkFFI C_Types String String) ()

@ghost
Copy link
Author

ghost commented Feb 26, 2018

It has been brought to my attention that MKSTR() also uses strlen and strcpy; with fGetChars using MKSTR() in it's underlying implementation.

@Melvar
Copy link
Collaborator

Melvar commented Feb 26, 2018

Also note that primitives can’t take or return values of non-primitive types, and that the C FFI also only supports a limited range of types which doesn’t include pairs, so neither can return a pair.

@ahmadsalim
Copy link

@22459 A solution would be to return a pointer to a C struct, and provide methods to access the string and number of read bytes.

@ghost
Copy link
Author

ghost commented Feb 27, 2018

@Melvar Yeah I seem to have run into that problem when I couldn't figure out how to return a tuple. :)
That seems like the most natural way to deal with this though?
So there would have to be an other layer between usage and underlying implementation to clean up the interface.

If I have time I'll try to poke at this on and off a bit the next few days...

@ghost
Copy link
Author

ghost commented Feb 27, 2018

Why cant primitives return nonprimitives?

@ahmadsalim
Copy link

Thanks for reporting the issue. @22459 Primtives can not return non-primitives, because I believe this is a bootstrapping issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants