[CLN] Reorganize delta module into seperate module and split out impls and [PERF] Refactor bf get_size to avoid nested loops #2674

Merged
merged 12 commits on Aug 22, 2024

Conversation

HammadB
Collaborator

@HammadB HammadB commented Aug 16, 2024

Description of changes

Summarize the changes made by this PR.

  • This PR is just cleanup and reorganization
  • It moves all ArrowWriteableKey implementations into /key, value implementations into /value, and delta storage and related code into /delta.
  • It cleans up many warnings
  • It fixes various visibility issues

stacked with #2684

Test plan

How are these changes tested?
Existing tests cover this change

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Documentation Changes

None

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (readability, modularity, intuitiveness)?

Collaborator Author

HammadB commented Aug 16, 2024

@HammadB HammadB marked this pull request as ready for review August 21, 2024 20:29
Contributor

@sanketkedia sanketkedia left a comment

Thank you for cleaning this up!

@@ -52,14 +52,6 @@ impl BlockDelta {
V::delete(prefix, key.into(), self)
}

/// Gets the minimum key in the block delta.
pub fn get_min_key(&self) -> Option<CompositeKey> {
Contributor

was this not being used anywhere?

Collaborator Author

nope, but i added a flavor back in stacked pr

rust/blockstore/src/arrow/block/delta/int32.rs (outdated, resolved)
@@ -377,11 +377,6 @@ impl SparseIndexManager {
}
}

pub fn create(&self, id: &Uuid) -> SparseIndex {
Contributor

not being used anywhere?

Collaborator Author

will clean up separately, as it was not added in this PR and I don't want to retrigger tests

## Description of changes

*Summarize the changes made by this PR.*
- This PR's main intent is to make get_size() and sizing-related
operations on delta O(1) instead of O(n). The O(n) behavior was leading
to slow insert times, since every call resulted in a get_size().
- Refactored uint32, string, int32 deltas into a single_column_storage
delta that is generic over value types and uses a size tracker helper
struct.
- Refactored DataRecord storage to use a single map instead of one per
column, and also one lock
- Refactored both deltas to track their sizes, handling overwrites and
deletes
- Moved the add logic OUT of ArrowWriteableImplementations, and made
them dumb proxies for type inversion
- Changed several value types to be written with ownership; they are
copied under the hood anyway, and our general pattern is to put the
onus of cloning on the caller. Int32Arrays are Arc’ed, RoaringBitmap
needed to be copied, etc.
- Handled a stray TODO on DataRecord metadata sizing
- Pushed splitting into the delta storage, simplifying the delta layer
and getting better use of generics
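
For readers who want a picture of the size-tracking idea described above, here is a minimal sketch. It is not the code from this PR: `SizeTracker`, `ByteSize`, `SingleColumnStorage`, and `get_size_slow` are hypothetical names chosen for illustration, and the byte accounting (key length plus value length) is a simplifying assumption that ignores Arrow offsets, padding, and prefixes. It only shows how a running total kept in sync on add, overwrite, and delete lets `get_size()` return in O(1) instead of walking every entry.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper that keeps running byte totals (illustration only).
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct SizeTracker {
    key_bytes: usize,
    value_bytes: usize,
    num_items: usize,
}

impl SizeTracker {
    fn add(&mut self, key_len: usize, value_len: usize) {
        self.key_bytes += key_len;
        self.value_bytes += value_len;
        self.num_items += 1;
    }

    fn remove(&mut self, key_len: usize, value_len: usize) {
        self.key_bytes -= key_len;
        self.value_bytes -= value_len;
        self.num_items -= 1;
    }

    pub fn total_bytes(&self) -> usize {
        self.key_bytes + self.value_bytes
    }

    pub fn num_items(&self) -> usize {
        self.num_items
    }
}

/// Hypothetical trait: how many bytes a value contributes to the total.
pub trait ByteSize {
    fn byte_size(&self) -> usize;
}

impl ByteSize for String {
    fn byte_size(&self) -> usize {
        self.len()
    }
}

impl ByteSize for u32 {
    fn byte_size(&self) -> usize {
        4
    }
}

impl ByteSize for i32 {
    fn byte_size(&self) -> usize {
        4
    }
}

/// One storage type, generic over the value type (assumed shape, not the PR's).
pub struct SingleColumnStorage<V: ByteSize> {
    entries: BTreeMap<String, V>,
    tracker: SizeTracker,
}

impl<V: ByteSize> SingleColumnStorage<V> {
    pub fn new() -> Self {
        Self { entries: BTreeMap::new(), tracker: SizeTracker::default() }
    }

    /// Takes the value by ownership; the caller clones if it needs a copy.
    pub fn add(&mut self, key: String, value: V) {
        let key_len = key.len();
        let value_len = value.byte_size();
        // On overwrite, subtract the old entry before counting the new one.
        if let Some(old) = self.entries.insert(key, value) {
            self.tracker.remove(key_len, old.byte_size());
        }
        self.tracker.add(key_len, value_len);
    }

    /// Deleting a missing key leaves the totals untouched.
    pub fn delete(&mut self, key: &str) {
        if let Some(old) = self.entries.remove(key) {
            self.tracker.remove(key.len(), old.byte_size());
        }
    }

    /// O(1): reads the running total.
    pub fn get_size(&self) -> usize {
        self.tracker.total_bytes()
    }

    /// O(n) reference computation, useful for checking the tracker in tests.
    pub fn get_size_slow(&self) -> usize {
        self.entries.iter().map(|(k, v)| k.len() + v.byte_size()).sum()
    }
}
```

Keeping an O(n) reference method next to the tracked total is one way to test the post-delete sizing path: after any sequence of adds, overwrites, and deletes, the two values should agree.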


## Test plan
*How are these changes tested?*
This change needs more tests; I plan to stack some more tests on this
PR. Existing tests cover the add path but not post-delete sizing.
- [x] Tests pass locally with `pytest` for python, `yarn test` for js,
`cargo test` for rust

## Documentation Changes
None
@HammadB HammadB changed the title [CLN] Reorganize delta module into seperate module and split out impls [CLN] Reorganize delta module into seperate module and split out impls and [PERF] Refactor bf get_size to avoid nested loops Aug 22, 2024
@HammadB HammadB enabled auto-merge (squash) August 22, 2024 06:36
@HammadB HammadB merged commit 003bf67 into main Aug 22, 2024
68 checks passed