Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Min compaction size #2346

Merged
merged 9 commits into from
Jun 20, 2024
Merged

[ENH] Min compaction size #2346

merged 9 commits into from
Jun 20, 2024

Conversation

HammadB
Copy link
Collaborator

@HammadB HammadB commented Jun 15, 2024

Description of changes

Summarize the changes made by this PR.

  • Improvements & Bug fixes
    • I overhauled the property tests for the log service. There were bugs and also it was underspecc'ed
    • The main bug was the the len() being used was over collectionData not of the collectionData[C] in question.
    • I added invariants using rapids "" action, which is the way to add invariants.
    • The invariants check the log is as we expect and also the the collections we expect get returned
    • I actually use the enumerationoffset in the model now
    • The model will actually purge the log it has.
    • The prop test will check the fields of its records for equality
  • New functionality
    • This PR introduces the "Min compaction size" argument on GetCollectionsToCompact, making it so that the compactors can skip logs with only a handful of entries.
    • This min compaction size is config for the compactor
    • I updated the rust in memory log to respect the min size.
    • The property test is extended to check that min_compaction_size works.

Other notes:

  • The log service types all use int where they should be uint. I added a cleanup task to go convert the incorrect types. The new type I introduce is correct.

Test plan

How are these changes tested?

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Documentation Changes

None

Copy link

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

Copy link
Collaborator Author

HammadB commented Jun 15, 2024

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @HammadB and the rest of your teammates on Graphite Graphite

@HammadB HammadB force-pushed the hammad/min_compaction_size branch from 43e1841 to a65c626 Compare June 17, 2024 20:42
@HammadB HammadB marked this pull request as ready for review June 17, 2024 21:21
@@ -1,7 +1,7 @@
// Code generated by protoc-gen-go. DO NOT EDIT.
// versions:
// protoc-gen-go v1.34.2
// protoc v5.26.1
// protoc-gen-go v1.33.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to use the same version of protoc and proto-gen-go?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the version in our go.mod - https:/chroma-core/chroma/blob/main/go/go.mod#L24. Would you prefer I update the version? These files were previously incorrect according to this relationship.

go/database/log/db/copyfrom.go Show resolved Hide resolved
chromadb/proto/chroma_pb2_grpc.py Show resolved Hide resolved
go/pkg/log/repository/log.go Show resolved Hide resolved
go/pkg/log/server/property_test.go Outdated Show resolved Hide resolved
go/pkg/log/server/property_test.go Show resolved Hide resolved
go/pkg/log/server/property_test.go Show resolved Hide resolved
go/pkg/log/server/property_test.go Show resolved Hide resolved
go/pkg/log/server/property_test.go Outdated Show resolved Hide resolved
@HammadB HammadB enabled auto-merge (squash) June 20, 2024 21:52
@HammadB HammadB merged commit add3443 into main Jun 20, 2024
65 checks passed
Anush008 pushed a commit to Anush008/chroma that referenced this pull request Jun 27, 2024
## Description of changes

*Summarize the changes made by this PR.*
 - Improvements & Bug fixes
- I overhauled the property tests for the log service. There were bugs
and also it was underspecc'ed
- The main bug was the the len() being used was over collectionData not
of the collectionData[C] in question.
- I added invariants using rapids "" action, which is the way to add
invariants.
- The invariants check the log is as we expect and also the the
collections we expect get returned
	 - I actually use the enumerationoffset in the model now
	 - The model will _actually_ purge the log it has.
	 - The prop test will check the fields of its records for equality
 - New functionality
- This PR introduces the "Min compaction size" argument on
GetCollectionsToCompact, making it so that the compactors can skip logs
with only a handful of entries.
	 - This min compaction size is config for the compactor
	 - I updated the rust in memory log to respect the min size.
- The property test is extended to check that min_compaction_size works.

Other notes:
- The log service types all use int where they should be uint. I added a
cleanup task to go convert the incorrect types. The new type I introduce
is correct.

## Test plan
*How are these changes tested?*
- [x] Tests pass locally with `pytest` for python, `yarn test` for js,
`cargo test` for rust

## Documentation Changes
None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants