scanf behaviour is strange #6505

Closed
yuantailing opened this issue May 4, 2018 · 7 comments · Fixed by #9476


yuantailing commented May 4, 2018

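scanf reads wrong values when stdin is redirected from a file (node a.out.js <in.txt); the same input is read correctly through a pipe.

main.cpp reads at most 200 integers and prints them, ten per line.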

#include <cstdio>

using namespace std;

int main() {
    for (int i = 0; i < 200; i++) {
        int x;
        int ret = scanf("%d", &x);
        if (ret != 1)  // stop at EOF or on a matching failure, so x is never printed uninitialized
            break;
        printf("%d ", x);
        if ((i + 1) % 10 == 0)
            printf("\n");
    }
    return 0;
}

Input (in.txt):

0 1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29
30 31 32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47 48 49
50 51 52 53 54 55 56 57 58 59
60 61 62 63 64 65 66 67 68 69
70 71 72 73 74 75 76 77 78 79
80 81 82 83 84 85 86 87 88 89
90 91 92 93 94 95 96 97 98 99
100 101 102 103 104 105 106 107 108 109
110 111 112 113 114 115 116 117 118 119
120 121 122 123 124 125 126 127 128 129
130 131 132 133 134 135 136 137 138 139
140 141 142 143 144 145 146 147 148 149

Compile with default options:

user@host:~$ em++ main.cpp

Output is incorrect when using bash redirection:

user@host:~$ node a.out.js <in.txt
0 1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29
30 31 32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47 48 49
50 51 52 53 54 55 56 57 58 59
60 61 62 63 64 65 66 67 68 69
70 71 72 73 74 75 76 77 78 79
80 81 82 83 84 85 86 87 880 1
2 3 4 5 6 7 8 9 10 11
12 13 14 15 16 17 18 19 20 21
22 23 24 25 26 27 28 29 30 31
32 33 34 35 36 37 38 39 40 41
42 43 44 45 46 47 48 49 50 51
52 53 54 55 56 57 58 59 60 61
62 63 64 65 66 67 68 69 70 71
72 73 74 75 76 77 78 79 80 81
82 83 84 85 86 87 880 1 2 3
4 5 6 7 8 9 10 11 12 13
14 15 16 17 18 19 20 21 22 23

Output is correct when using pipe:

user@host:~$ cat in.txt | node a.out.js
0 1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29
30 31 32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47 48 49
50 51 52 53 54 55 56 57 58 59
60 61 62 63 64 65 66 67 68 69
70 71 72 73 74 75 76 77 78 79
80 81 82 83 84 85 86 87 88 89
90 91 92 93 94 95 96 97 98 99
100 101 102 103 104 105 106 107 108 109
110 111 112 113 114 115 116 117 118 119
120 121 122 123 124 125 126 127 128 129
130 131 132 133 134 135 136 137 138 139
140 141 142 143 144 145 146 147 148 149

My environment:

user@host:~$ em++ --version
emcc (Emscripten gcc/clang-like replacement) 1.37.39 (commit 63ccba9f6307da3fcccaa9324252da54f4cba505)
Copyright (C) 2014 the Emscripten authors (see AUTHORS.txt)
This is free and open source software under the MIT license.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
user@host:~$ node --version
v8.9.1
user@host:~$ bash --version
GNU bash, version 4.3.11(1)-release (x86_64-pc-linux-gnu)
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Member

kripken commented May 7, 2018

I am surprised this works at all - we don't seem to have any special code to handle stdin in src/shell.js - it's something that needs to be done manually (to define Module['stdin']). So I must be missing something here.

(I'm also unsure about the bash differences between pipes and redirection, I would have guessed both work as pipes internally...)
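
On the pipe-versus-redirection question: as far as I know, the shell does not create a pipe for `<in.txt`. It opens the file and installs it as fd 0, so the process sees a regular file, whereas `cat in.txt | ...` hands it a FIFO. A minimal sketch for checking what a descriptor refers to, using Node's `fs.fstatSync`; the helper name `describeFd` is made up for illustration and is not part of Emscripten:

```javascript
var fs = require('fs');

// Hypothetical helper: classify the kind of object a file descriptor refers to.
function describeFd(fd) {
    var st = fs.fstatSync(fd);
    if (st.isFile()) return 'regular file';
    if (st.isFIFO()) return 'pipe (FIFO)';
    if (st.isCharacterDevice()) return 'character device';
    if (st.isSocket()) return 'socket';
    return 'other';
}
```

Calling `describeFd(0)` at startup should, under that assumption, report a regular file for `node script.js <in.txt` and a pipe for `cat in.txt | node script.js` on Linux.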


stale bot commented Sep 19, 2019

This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 7 days. Feel free to re-open at any time if this issue is still relevant.

@stale stale bot added the wontfix label Sep 19, 2019
@yuantailing
Author

The problem still exists in the current version:

emcc (Emscripten gcc/clang-like replacement) 1.38.45 (commit 252410a)

@stale stale bot removed the wontfix label Sep 19, 2019
Collaborator

bvibber commented Sep 21, 2019

The underlying code seems to use get_char, which on Node on Linux/Mac opens /dev/stdin for reading, reads up to 256 bytes, and buffers them. The behavior then differs between node a.out.js < in.txt and cat in.txt | node a.out.js at the lower levels of the stack. In the redirection case, the first read returns a full 256 bytes, and the second read returns those same first 256 bytes again, for a total of four 256-byte reads. We only scan through the first and second, but both return the first 256 bytes of the input! In the pipe case, it correctly fetches 256 bytes followed by the remaining 234 bytes, and scans the expected input data.
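
The byte counts above can be checked against the reported output. The sketch below rebuilds in.txt (assuming every line, including the last, ends in a single newline), cuts it at 256 bytes, and replays that first chunk four times the way the faulty reads do. The names `buildInput`, `firstChunk` and `replayed` are illustrative only.

```javascript
// Rebuild in.txt from the report: 15 lines of 10 space-separated integers,
// assuming each line (including the last) ends with a single '\n'.
function buildInput() {
    var lines = [];
    for (var row = 0; row < 15; row++) {
        var nums = [];
        for (var col = 0; col < 10; col++) nums.push(row * 10 + col);
        lines.push(nums.join(' '));
    }
    return lines.join('\n') + '\n';
}

var BUFSIZE = 256;
var input = buildInput();
var firstChunk = input.slice(0, BUFSIZE);

console.log(input.length);                          // 490
console.log(input.length - BUFSIZE);                // 234
console.log(JSON.stringify(firstChunk.slice(-5)));  // "87 88"

// Simulate the faulty case: every read hands back the first chunk again.
var replayed = firstChunk.repeat(4);
var tokens = replayed.split(/\s+/).filter(Boolean).slice(0, 200);
console.log(tokens.slice(86, 91).join(' '));        // 86 87 880 1 2
console.log(tokens[199]);                           // 23
```

The first chunk ends exactly after "88", so gluing the replayed chunk onto it yields the token "880" seen in the redirected output, and the 200th token is 23, matching the last number printed there.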

Collaborator

bvibber commented Sep 21, 2019

I can repro with this extracted bit on node:

var fs = require('fs');

function get_chars() {
    // we will read data by chunks of BUFSIZE
    var BUFSIZE = 256;
    var buf = Buffer.alloc ? Buffer.alloc(BUFSIZE) : new Buffer(BUFSIZE);
    var bytesRead = 0;

    var isPosixPlatform = (process.platform != 'win32'); // Node doesn't offer a direct check, so test by exclusion

    var fd = process.stdin.fd;
    if (isPosixPlatform) {
        // Linux and Mac cannot use process.stdin.fd (which isn't set up as sync)
        var usingDevice = false;
        try {
            fd = fs.openSync('/dev/stdin', 'r');
            usingDevice = true;
        } catch (e) {}
    }

    try {
        bytesRead = fs.readSync(fd, buf, 0, BUFSIZE, null);
    } catch (e) {
        // Cross-platform differences: on Windows, reading EOF throws an exception, but on other OSes,
        // reading EOF returns 0. Uniformize behavior by treating the EOF exception as a 0-byte read.
        if (e.toString().indexOf('EOF') != -1) bytesRead = 0;
        else throw e;
    }

    // The descriptor is closed after every chunk, so the next call reopens /dev/stdin.
    if (usingDevice) { fs.closeSync(fd); }
    var result;
    if (bytesRead > 0) {
        result = buf.slice(0, bytesRead).toString('utf-8');
    } else {
        result = null;
    }
    return result;
}

for (var i = 0; i < 2; i++) {
    let s = get_chars();
    console.log({s});
}

The second get_chars() call buffers the beginning of the input again instead of the expected second chunk, but only with file redirection, not with a pipe.
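
One plausible explanation, which is an assumption and not confirmed in this thread: on Linux, /dev/stdin resolves to /proc/self/fd/0, and opening that path when fd 0 is a regular file yields a new open file description whose offset starts at 0. A pipe has no offset, and bytes already consumed are gone, so reopening does not rewind it. The sketch below shows the same effect on an ordinary temporary file rather than /dev/stdin; the helper names and the file name are made up for illustration.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Read up to `size` bytes from the current position of `fd`.
function readChunk(fd, size) {
    var buf = Buffer.alloc(size);
    var n = fs.readSync(fd, buf, 0, size, null);
    return buf.slice(0, n).toString('utf-8');
}

// Pattern used by get_chars() above: open, read one chunk, close, every time.
function readByReopening(file, size, count) {
    var chunks = [];
    for (var i = 0; i < count; i++) {
        var fd = fs.openSync(file, 'r'); // fresh descriptor, offset starts at 0
        chunks.push(readChunk(fd, size));
        fs.closeSync(fd);
    }
    return chunks;
}

// Alternative: keep one descriptor open so its offset advances between reads.
function readFromOneDescriptor(file, size, count) {
    var fd = fs.openSync(file, 'r');
    var chunks = [];
    for (var i = 0; i < count; i++) chunks.push(readChunk(fd, size));
    fs.closeSync(fd);
    return chunks;
}

var dir = fs.mkdtempSync(path.join(os.tmpdir(), 'reopen-demo-'));
var file = path.join(dir, 'in.txt');
fs.writeFileSync(file, 'abcdefgh');

console.log(readByReopening(file, 4, 2));       // [ 'abcd', 'abcd' ]
console.log(readFromOneDescriptor(file, 4, 2)); // [ 'abcd', 'efgh' ]
```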

Collaborator

bvibber commented Sep 21, 2019

I get the same 'restarting' behavior with a C program on Linux, so I think the code is just not using /dev/stdin in the way it's intended to be used?

Collaborator

sbc100 commented Sep 21, 2019

If I remove the isPosixPlatform branch, this seems to work as expected, so perhaps we can simply remove that workaround?
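
A rough sketch of what the chunk reader might look like with the /dev/stdin reopen dropped, reading straight from a descriptor that stays open between calls. This is not the change that actually landed. `readChunkFromFd` is a hypothetical name, and whether synchronous reads from `process.stdin.fd` behave well in every case (for example an interactive terminal, which the removed workaround was concerned with) is not verified here.

```javascript
var fs = require('fs');

// Hypothetical variant of get_chars(): read one chunk from an already-open
// descriptor without reopening it, so the file offset keeps advancing.
function readChunkFromFd(fd, size) {
    var buf = Buffer.alloc(size);
    var bytesRead = 0;
    try {
        bytesRead = fs.readSync(fd, buf, 0, size, null);
    } catch (e) {
        // Same EOF normalization as the extracted code: treat an EOF
        // exception as a 0-byte read, rethrow anything else.
        if (e.toString().indexOf('EOF') != -1) bytesRead = 0;
        else throw e;
    }
    // null signals end of input, matching get_chars().
    return bytesRead > 0 ? buf.slice(0, bytesRead).toString('utf-8') : null;
}
```

For stdin this would be called as `readChunkFromFd(process.stdin.fd, 256)` in a loop until it returns null.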

sbc100 added a commit that referenced this issue Sep 21, 2019
sbc100 added a commit that referenced this issue Sep 21, 2019
sbc100 added a commit that referenced this issue Sep 22, 2019
belraquib pushed a commit to belraquib/emscripten that referenced this issue Dec 23, 2020