fix(python): skip empty row groups during stats gathering #2172

Merged
merged 3 commits into from
Feb 6, 2024

ion-elgreco
Collaborator

Description

For some odd reason the pyarrow Parquet writer leaves empty row groups in the Parquet file when it hits the max_row limit that is passed to it. While gathering the stats we checked that every row group had stats, but these empty row groups have none, so the whole file's add action ended up with no stats recorded.

We now skip empty row groups while gathering the stats to prevent this.
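The idea can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `RowGroup`, `ColumnStats`, and `gather_file_stats` are hypothetical stand-ins for the pyarrow row group metadata and the stats-gathering helper, and the output keys are assumed to follow the Delta per-file statistics layout (`numRecords`, `minValues`, `maxValues`, `nullCount`).

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ColumnStats:
    """Hypothetical stand-in for one column chunk's statistics."""
    min: Any
    max: Any
    null_count: int


@dataclass
class RowGroup:
    """Hypothetical stand-in for a Parquet row group's metadata."""
    num_rows: int
    # None models a row group for which the writer recorded no statistics.
    stats: Optional[Dict[str, ColumnStats]]


def gather_file_stats(row_groups: List[RowGroup]) -> Optional[Dict[str, Any]]:
    """Aggregate per-file stats, ignoring row groups that contain no rows."""
    # Empty row groups carry no statistics, so drop them before checking
    # that every remaining row group has stats.
    non_empty = [rg for rg in row_groups if rg.num_rows > 0]
    if not non_empty or any(rg.stats is None for rg in non_empty):
        return None  # no usable statistics for this file

    file_stats: Dict[str, Any] = {
        "numRecords": sum(rg.num_rows for rg in non_empty),
        "minValues": {},
        "maxValues": {},
        "nullCount": {},
    }
    # Assumes every non-empty row group reports the same set of columns.
    for name in non_empty[0].stats:
        chunks = [rg.stats[name] for rg in non_empty]
        file_stats["minValues"][name] = min(c.min for c in chunks)
        file_stats["maxValues"][name] = max(c.max for c in chunks)
        file_stats["nullCount"][name] = sum(c.null_count for c in chunks)
    return file_stats


# Two populated row groups with an empty one in between.
row_groups = [
    RowGroup(3, {"id": ColumnStats(min=1, max=3, null_count=0)}),
    RowGroup(0, None),
    RowGroup(2, {"id": ColumnStats(min=4, max=5, null_count=1)}),
]
stats = gather_file_stats(row_groups)
```

Without the `num_rows > 0` filter, the empty row group in the middle would fail the "all row groups have stats" check and the function would return `None` for the whole file, which is the behavior described above.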

In v0.15.2 we now also evaluate files with no stats as null. @roeap @rtyler, I'm not sure whether this is entirely correct either.

Related Issue(s)

@github-actions github-actions bot added the binding/python Issues for the Python package label Feb 6, 2024
wjones127
wjones127 previously approved these changes Feb 6, 2024
python/tests/test_writer.py Outdated
@ion-elgreco ion-elgreco merged commit 7b668aa into delta-io:main Feb 6, 2024
23 checks passed
Labels
binding/python Issues for the Python package
Development

Successfully merging this pull request may close these issues.

Broken filter for newly created delta table
2 participants