
fix: properly decode percent-encoded file paths coming from parquet checkpoints #1970

Merged 2 commits into delta-io:main on Dec 18, 2023

Conversation

Contributor

@sigorbor sigorbor commented Dec 14, 2023

Description

When read from parquet checkpoints, Add and Remove file paths are not percent-decoded, unlike paths read from JSON transaction logs. This causes file path mismatches (e.g. a percent-encoded Add file path is read from a checkpoint while the tombstone path for the same file is read, already decoded, from JSON) and can result in tombstoned files being counted as active.
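To illustrate the mismatch described above, here is a minimal, std-only sketch. It is not the code from this PR: `percent_decode` and `active_files` are hypothetical helpers written for this example, and the path `x=a%253Ab/part-0.parquet` is made up. The sketch assumes active files are computed by dropping every Add whose path exactly matches a tombstone path.

```rust
use std::collections::HashSet;

/// Decode `%XX` escapes in `input` (hypothetical helper, for illustration only).
/// Malformed or truncated escapes are copied through unchanged.
fn percent_decode(input: &str) -> String {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            let hi = (bytes[i + 1] as char).to_digit(16);
            let lo = (bytes[i + 2] as char).to_digit(16);
            if let (Some(hi), Some(lo)) = (hi, lo) {
                out.push((hi * 16 + lo) as u8);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    String::from_utf8_lossy(&out).into_owned()
}

/// Return the Add paths that have no tombstone with an identical path
/// (hypothetical helper; assumes exact string matching on paths).
fn active_files<'a>(adds: &'a [String], removes: &[String]) -> Vec<&'a str> {
    let tombstones: HashSet<&str> = removes.iter().map(String::as_str).collect();
    adds.iter()
        .map(String::as_str)
        .filter(|p| !tombstones.contains(p))
        .collect()
}

fn main() {
    // Path as stored in the log: the partition value "a%3Ab" is percent-encoded.
    let stored = "x=a%253Ab/part-0.parquet";

    // The JSON reader decodes the Remove path.
    let removes = vec![percent_decode(stored)];

    // Before the fix: the checkpoint reader keeps the Add path encoded,
    // so the tombstone does not match and the file still looks active.
    let adds_raw = vec![stored.to_string()];
    assert_eq!(active_files(&adds_raw, &removes).len(), 1);

    // After the fix: both sides are decoded and the tombstone matches.
    let adds_decoded = vec![percent_decode(stored)];
    assert!(active_files(&adds_decoded, &removes).is_empty());
}
```

The point of the sketch is only that both readers must normalize paths the same way before comparing them; how the real readers are wired up is not shown here.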

…oming from parquet checkpoints, to prevent tombstone and file paths mismatch (e.g. file path is read from checkpoint while tombstone path is read from JSON)
@github-actions github-actions bot added binding/rust Issues for the Rust crate crate/core labels Dec 14, 2023
Comment on lines -65 to -80
#[test]
fn test_encode_path() {
    let cases = [
        (
            "string=$%25&%2F()%3D%5E%22%5B%5D%23%2A%3F.%3A/part-00023-4b06bc90-0678-4a63-94a2-f09af1adb945.c000.snappy.parquet",
            "string=$%2525&%252F()%253D%255E%2522%255B%255D%2523%252A%253F.%253A/part-00023-4b06bc90-0678-4a63-94a2-f09af1adb945.c000.snappy.parquet",
        ),
        (
            "string=$%25&%2F()%3D%5E%22<>~%5B%5D%7B}`%23|%2A%3F%2F%5Cr%5Cn.%3A/part-00023-e0a68495-8098-40a6-be5f-b502b111b789.c000.snappy.parquet",
            "string=$%2525&%252F()%253D%255E%2522%3C%3E~%255B%255D%257B%7D%60%2523%7C%252A%253F%252F%255Cr%255Cn.%253A/part-00023-e0a68495-8098-40a6-be5f-b502b111b789.c000.snappy.parquet"
        ),
        (
            "string=$%25&%2F()%3D%5E%22<>~%5B%5D%7B}`%23|%2A%3F%2F%5Cr%5Cn.%3A_-/part-00023-346b6795-dafa-4948-bda5-ecdf4baa4445.c000.snappy.parquet",
            "string=$%2525&%252F()%253D%255E%2522%3C%3E~%255B%255D%257B%7D%60%2523%7C%252A%253F%252F%255Cr%255Cn.%253A_-/part-00023-346b6795-dafa-4948-bda5-ecdf4baa4445.c000.snappy.parquet"
        )
    ];
Collaborator

It seems we are losing test coverage here. Why are we deleting this file?

Contributor Author

This file is a duplicate of src\kernel\actions\serde_path.rs, which includes the same test. It is likely a leftover.

@rtyler rtyler enabled auto-merge (rebase) December 18, 2023 00:07
@rtyler rtyler merged commit 763d39e into delta-io:main Dec 18, 2023
21 checks passed