Parquet is missing rows #81

Closed
mariussoutier opened this issue Jun 28, 2022 · 6 comments

Comments

@mariussoutier

I have a Parquet file that should have 30,000+ rows, but SELECT COUNT(*) FROM {} returns 7000. Another one with more than 40,000 rows returns exactly 8000. Converting the same data to JSON works fine.

@eatonphil
Member

Thanks for the report! Can you share a parquet file that has this issue?

@mariussoutier
Author

Unfortunately no, it's business-related. But nothing special, 30 or so columns with mostly UTF8 and two INT32 types.

@mariussoutier
Author

One of the columns does contain very large values, but other than that, normal stuff.

@eatonphil
Member

Sounds like @Sajuno reproduced the issue and is thinking of a fix. Thanks @Sajuno!

@Sajuno
Contributor

Sajuno commented Jun 28, 2022

@mariussoutier multiprocessio/datastation#278 should fix it!

@eatonphil
Member

Closed in #82; the fix is now available in dsq 0.21.0.
