Inconsistent errors when reading NDJSON with bad last line #4537

Open
philrz opened this issue Apr 24, 2023 · 1 comment · Fixed by #5055

philrz commented Apr 24, 2023

Repro is with Zed commit d599839.

The attached NDJSON test data files lines-9.ndjson.gz and lines-10.ndjson.gz both consist of several lines of valid NDJSON and a closing incomplete line:

{"syntaxerror

They otherwise only differ in that lines-10 contains one more valid NDJSON record than lines-9 before that bad last line.
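For readers without the attachments, files of the same general shape can be approximated with a short script. This is only a sketch: the record fields, the helper name `build_ndjson`, and the `trailing_newline` option are hypothetical, since the issue does not show the contents of the real files or say whether the bad last line ends with a newline. Files built this way are not guaranteed to trigger the same two error messages.

```python
import json


def build_ndjson(valid_records: int,
                 bad_tail: str = '{"syntaxerror',
                 trailing_newline: bool = True) -> str:
    """Return NDJSON text: `valid_records` good lines, then one incomplete line."""
    # Hypothetical record shape; the real attachments' fields are not shown in the issue.
    good = [json.dumps({"id": i, "msg": f"record {i}"}) for i in range(valid_records)]
    text = "".join(line + "\n" for line in good)
    # The incomplete last line quoted in the issue.
    text += bad_tail
    # Assumption: whether the original files end with a newline is unknown.
    if trailing_newline:
        text += "\n"
    return text


def write_sample(path: str, valid_records: int) -> None:
    """Write one sample file to `path`."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(build_ndjson(valid_records))
```

Calling `write_sample("sample-a.ndjson", 9)` and `write_sample("sample-b.ndjson", 10)` gives two files that differ by exactly one valid record before the bad last line.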

Reading both with zq, the reported errors differ.

$ zq -version
Version: v1.7.0-35-gd5998393

$ zq -z lines-9.ndjson
lines-9.ndjson: parse error: string literal: unescaped line break

$ zq -z lines-10.ndjson
lines-10.ndjson: EOF

The difference becomes more significant when loading to a pool, since no error at all is reported for lines-10.ndjson: the load commits and exits with status 0.

$ zed -use foo load lines-9.ndjson
(1/1) 3724B/3724B 3724B/s 100.00%
Post "http://localhost:9867/pool/2OtB1htK17ZmmjRhXGdwCbXpoPp/branch/main": parse error: string literal: unescaped line break

$ zed -use foo load lines-10.ndjson
(1/1) 4134B/4134B 4134B/s 100.00%
2OtBn63BBXfmUMaoGYmM1nQSCrp committed

$ echo $?
0

philrz linked a pull request Mar 13, 2024 that will close this issue

philrz commented Mar 13, 2024

The fixes in #5055 have significantly improved the errors shown here. Repeating the original repro steps with Zed commit 38763f8, we now see:

$ zq -version
Version: v1.14.0-16-g38763f82

$ zq -z lines-9.ndjson
lines-9.ndjson: parse error: string literal: unescaped line break

$ zq -z lines-10.ndjson
lines-10.ndjson: unexpected end of JSON input

@mattnibs explains in #5055 (comment) why we saw the improvement here for lines-10.ndjson but not lines-9.ndjson.

this is a separate issue since in the example of lines-9.ndjson zq is choosing the zsonio reader which is where the error is coming from. [...] if I run it with the json reader specified [...] I get the expected error message.

Indeed this is the case.

$ zq -i json lines-9.ndjson
lines-9.ndjson: unexpected end of JSON input

The improvements are similar for zed load.

$ zed load -use foo lines-9.ndjson 
(1/1) 3724B/3724B 3724B/s 100.00%
Post "http://localhost:9867/pool/2deaE3OuVrzHswcL5ja9MfZV6s1/branch/main": parse error: string literal: unescaped line break

$ zed load -use foo -i json lines-9.ndjson 
(1/1) 3724B/3724B 3724B/s 100.00%
Post "http://localhost:9867/pool/2deaE3OuVrzHswcL5ja9MfZV6s1/branch/main": unexpected end of JSON input

$ zed load -use foo lines-10.ndjson 
(1/1) 4134B/4134B 4134B/s 100.00%
Post "http://localhost:9867/pool/2deaE3OuVrzHswcL5ja9MfZV6s1/branch/main": unexpected end of JSON input

Since auto-detect is likely to be where most users start from, I'll hold this issue open in hopes we can one day do something about the part of this that @mattnibs attributes to the zsonio reader.
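In the meantime, a bad line can be located outside of Zed before running zq or zed load. The following is a minimal sketch that uses Python's standard `json` module, not any Zed reader, so what it accepts may differ from what zq accepts. The function name `first_bad_line` is hypothetical.

```python
import json
import sys
from typing import Iterable, Optional, Tuple


def first_bad_line(lines: Iterable[str]) -> Optional[Tuple[int, str]]:
    """Return (line_number, message) for the first line that is not valid JSON, else None."""
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines rather than flagging them
        try:
            json.loads(line)
        except json.JSONDecodeError as err:
            return number, err.msg
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_ndjson.py lines-10.ndjson
    with open(sys.argv[1], encoding="utf-8") as f:
        result = first_bad_line(f)
    if result is None:
        print("all lines parsed")
    else:
        print(f"line {result[0]}: {result[1]}")
```

Because each line is parsed on its own, the reported line number points at the incomplete record regardless of how many valid records come before it.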
