feat(filemanager): ensure events are ingested in the correct order #93

Merged
merged 30 commits into from
Feb 16, 2024

Conversation

mmalenic
Member

@mmalenic mmalenic commented Feb 5, 2024

Closes #73

There's still a bit of repetitive code in the application logic and queries, but I think that's okay for now. It will be interesting to benchmark these queries eventually, especially the update_reordered_for_*.sql queries, because their select statement locks all rows with the same bucket, key and version_id. These queries are also a bit long, but I think that's acceptable because it reduces the number of requests that the Lambda function needs to make to the database. The update and insert queries could potentially be merged to reduce code repetition, with a flag variable indicating whether the event is a created or deleted event.

Changes

  • Make sure that events are ingested in the correct order by using the created_sequencer and deleted_sequencer values in the database table. The process is:
    • For each event, check whether the sequencer condition is met, that is, whether the event's sequencer value is a better match for an already ingested event with the same bucket, key and version id.
    • If it is, replace the already ingested event with the new one, and return the old event to be re-ingested, as it could belong to another object.
    • If there is no out-of-order event, just insert the event normally.
    • This process allows the database to correct any ordering issues as it receives events, so that the current state of the database represents the best known order of all events received.
  • Move the size and checksum from object to s3_object because this simplifies the reordering logic.
  • Add various tests:
    • More ordering and duplication tests.
    • Tests for the queries themselves (rather than inside the ingest function).
    • Long permutation test to make sure event reordering works correctly. This could also be rewritten as a benchmark or performance test.
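
The reordering process described above can be sketched as a small in-memory model. This is not the SQL from this PR: the `Event` and `Row` types, the `ingest_one`, `ingest` and `final_states` helpers, the use of integers in place of S3 sequencer strings, and the exact "better match" rule (a created event pairs with the nearest later deleted event, and a deleted event with the nearest earlier created event) are all assumptions made for illustration only.

```python
from dataclasses import dataclass, replace
from itertools import permutations
from typing import Optional


@dataclass(frozen=True)
class Event:
    kind: str        # "Created" or "Deleted"
    bucket: str
    key: str
    version_id: str
    sequencer: int   # assumption: an int stands in for the S3 sequencer string


@dataclass
class Row:
    bucket: str
    key: str
    version_id: str
    created_sequencer: Optional[int] = None
    deleted_sequencer: Optional[int] = None


def ingest_one(table: list, event: Event) -> Optional[Event]:
    """Apply one event; return a displaced event that must be re-ingested, if any."""
    ident = (event.bucket, event.key, event.version_id)
    same = [r for r in table if (r.bucket, r.key, r.version_id) == ident]
    # Every sequencer already stored for this object, tagged with its kind and row.
    seen = [(r.created_sequencer, "Created", r) for r in same
            if r.created_sequencer is not None]
    seen += [(r.deleted_sequencer, "Deleted", r) for r in same
             if r.deleted_sequencer is not None]
    if any(seq == event.sequencer and kind == event.kind for seq, kind, _ in seen):
        return None  # duplicate delivery: nothing to do

    if event.kind == "Created":
        # Hypothetical condition: the nearest later stored event is a Deleted.
        later = [s for s in seen if s[0] > event.sequencer]
        nearest = min(later, key=lambda s: s[0], default=None)
        field, partner = "created_sequencer", "Deleted"
    else:
        # Hypothetical condition: the nearest earlier stored event is a Created.
        earlier = [s for s in seen if s[0] < event.sequencer]
        nearest = max(earlier, key=lambda s: s[0], default=None)
        field, partner = "deleted_sequencer", "Created"

    if nearest is None or nearest[1] != partner:
        # No out-of-order event: insert the event normally.
        table.append(Row(*ident, **{field: event.sequencer}))
        return None

    # Replace the stored value and hand back the old one, if there was one.
    row = nearest[2]
    old = getattr(row, field)
    setattr(row, field, event.sequencer)
    return None if old is None else replace(event, sequencer=old)


def ingest(table: list, event: Event) -> None:
    """Ingest an event, then re-ingest whatever it displaced."""
    pending = [event]
    while pending:
        displaced = ingest_one(table, pending.pop())
        if displaced is not None:
            pending.append(displaced)


def final_states(events) -> set:
    """Ingest every permutation of events and collect the distinct end states."""
    states = set()
    for order in permutations(events):
        table = []
        for e in order:
            ingest(table, e)
        states.add(frozenset((r.created_sequencer, r.deleted_sequencer) for r in table))
    return states


kinds = ["Created", "Deleted", "Created", "Deleted"]
events = [Event(k, "bucket", "key", "v1", s) for s, k in enumerate(kinds, start=1)]
print(final_states(events))
```

Under these assumptions, every arrival order of the four events ends in the same two rows, which is the property the long permutation test is meant to exercise. A real implementation would run the check and the replacement inside one database transaction, as the update_reordered_for_*.sql queries do.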

@mmalenic mmalenic self-assigned this Feb 5, 2024
@mmalenic mmalenic added the filemanager (an issue relating to the filemanager) and feature (New feature) labels Feb 5, 2024
@victorskl
Member

I will review in a tick, soon.

Member

@victorskl victorskl left a comment

Very detailed impl, Marko. Kudos! Can't wait to see this in action.

@mmalenic mmalenic merged commit 62c3616 into main Feb 16, 2024
2 checks passed
@mmalenic mmalenic deleted the feat/reorder-events branch February 16, 2024 01:30
Labels
feature (New feature), filemanager (an issue relating to the filemanager)
Projects
No open projects
Status: In progress
Development

Successfully merging this pull request may close these issues.

filemanager: ensure S3 events are in order
3 participants