[HUDI-7104] Fixing cleaner savepoint interplay to fix edge case with incremental cleaning #10651

Merged 2 commits on Feb 15, 2024

Conversation

@nsivabalan nsivabalan commented Feb 11, 2024

Change Logs

The incremental cleaner can miss partitions that were part of a previously savepointed commit. This patch closes that gap.

  • Each clean commit's metadata now tracks the list of savepoints that existed when the clean was planned. A subsequent clean planner finds any savepoints that have since been deleted and includes the partitions that were part of them in its plan.
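
The planning step described above can be sketched roughly as follows. This is a minimal illustration, not the code in this patch: the class `SavepointAwareCleanSketch`, its method names, and the `partitionsBySavepoint` map are hypothetical stand-ins for however the planner actually reads savepoint and clean metadata.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; none of these names are Hudi APIs.
final class SavepointAwareCleanSketch {

    // Savepoints that the previous clean recorded but that no longer exist.
    static Set<String> removedSavepoints(Set<String> trackedByLastClean,
                                         Set<String> currentSavepoints) {
        Set<String> removed = new TreeSet<>(trackedByLastClean);
        removed.removeAll(currentSavepoints);
        return removed;
    }

    // Partitions the next clean should consider: the usual incremental set,
    // plus every partition that belonged to a savepoint deleted since the
    // previous clean was planned.
    static Set<String> partitionsToClean(Set<String> incrementalPartitions,
                                         Set<String> trackedByLastClean,
                                         Set<String> currentSavepoints,
                                         Map<String, List<String>> partitionsBySavepoint) {
        Set<String> result = new TreeSet<>(incrementalPartitions);
        for (String savepoint : removedSavepoints(trackedByLastClean, currentSavepoints)) {
            // Assumption: the partitions of a deleted savepoint can still be looked up.
            result.addAll(partitionsBySavepoint.getOrDefault(savepoint, Collections.emptyList()));
        }
        return result;
    }
}
```

In this sketch, a savepoint that was tracked by the previous clean and is still present contributes nothing extra; only removed savepoints widen the set of partitions handed to the planner.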

Impact

The incremental cleaner no longer misses older partitions that were touched by a recently removed savepoint but have not been updated by regular writes.

Risk level (write none, low, medium or high below)

low

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change

  • The config description must be updated if new configs are added or the default values of configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instructions to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@nsivabalan nsivabalan changed the title [DNM] Draft for savepoint cleaner fix [HUDI-7104] Fixing cleaner savepoint interplay to fix edge case with incremental cleaning Feb 12, 2024
@nsivabalan nsivabalan force-pushed the savepointCleanerFix branch 2 times, most recently from d13ebd4 to a62585a Compare February 12, 2024 00:14
@nsivabalan nsivabalan marked this pull request as ready for review February 12, 2024 00:14
@nsivabalan
Contributor Author

@hudi-bot run azure


@codope codope merged commit f29811b into apache:master Feb 15, 2024
31 checks passed
yihua pushed a commit that referenced this pull request Feb 27, 2024
…incremental cleaning (#10651)

* Fixing incremental cleaning with savepoint

* Addressing feedback