[ENH] Min compaction size (chroma-core#2346)
## Description of changes

*Summarize the changes made by this PR.*
 - Improvements & Bug fixes
	 - I overhauled the property tests for the log service. They had bugs and were also underspecified.
	 - The main bug was that the len() being used was taken over collectionData, not over the collectionData[C] in question.
	 - I added invariants using rapid's "" (empty-name) action, which is the way to add invariants.
	 - The invariants check that the log is as we expect and that the collections we expect get returned.
	 - I actually use the enumeration offset in the model now.
	 - The model will _actually_ purge the log it has.
	 - The prop test will check the fields of its records for equality.
 - New functionality
	 - This PR introduces the "Min compaction size" argument on GetCollectionsToCompact, so that the compactors can skip collections whose logs hold only a handful of uncompacted entries.
	 - The min compaction size is a config setting for the compactor.
	 - I updated the Rust in-memory log to respect the min size.
	 - The property test is extended to check that min_compaction_size works.
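
The model-and-invariant approach described above can be sketched with only the Go standard library. This is not the test from this PR: it swaps rapid's generated actions for a seeded `math/rand` loop, and every name in it (`modelLog`, `collectionState`, `firstPending`, `runRandomActions`) is hypothetical. It illustrates two points from the list: the pending count is taken per collection rather than over the whole map, and an invariant check runs after every action.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// collectionState is a hypothetical model of one collection's log.
type collectionState struct {
	records           []int64 // offsets still held in the log, ascending
	enumerationOffset int64   // offset of the most recently pushed record
	compactionOffset  int64   // highest offset already compacted
}

// modelLog maps a collection ID to its modelled state.
type modelLog map[string]*collectionState

// push appends one record to the collection, creating it if needed.
func (m modelLog) push(id string) {
	c, ok := m[id]
	if !ok {
		c = &collectionState{}
		m[id] = c
	}
	c.enumerationOffset++
	c.records = append(c.records, c.enumerationOffset)
}

// compact advances the compaction offset by n, clamped to the enumeration offset.
func (m modelLog) compact(id string, n int64) {
	if c, ok := m[id]; ok {
		c.compactionOffset += n
		if c.compactionOffset > c.enumerationOffset {
			c.compactionOffset = c.enumerationOffset
		}
	}
}

// firstPending returns the index of the first record past the compaction offset.
// Offsets are appended in increasing order, so a binary search is valid.
func firstPending(c *collectionState) int {
	return sort.Search(len(c.records), func(i int) bool { return c.records[i] > c.compactionOffset })
}

// purge drops every record at or below its collection's compaction offset.
func (m modelLog) purge() {
	for _, c := range m {
		c.records = c.records[firstPending(c):]
	}
}

// collectionsToCompact returns, sorted, the collections with at least
// minSize pending records, counted per collection.
func (m modelLog) collectionsToCompact(minSize uint64) []string {
	ids := []string{}
	for id, c := range m {
		if n := len(c.records) - firstPending(c); n > 0 && uint64(n) >= minSize {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids)
	return ids
}

// checkInvariants cross-checks the record-based view against the offset-based view.
func (m modelLog) checkInvariants(minSize uint64) error {
	returned := map[string]bool{}
	for _, id := range m.collectionsToCompact(minSize) {
		returned[id] = true
	}
	for id, c := range m {
		byOffsets := c.enumerationOffset - c.compactionOffset
		if got := int64(len(c.records) - firstPending(c)); got != byOffsets {
			return fmt.Errorf("collection %s: %d pending records, offsets imply %d", id, got, byOffsets)
		}
		if want := byOffsets > 0 && uint64(byOffsets) >= minSize; returned[id] != want {
			return fmt.Errorf("collection %s: returned=%t, want %t", id, returned[id], want)
		}
	}
	return nil
}

// runRandomActions applies random actions and checks the invariants after each one.
func runRandomActions(seed int64, steps int, minSize uint64) error {
	rng := rand.New(rand.NewSource(seed))
	m := modelLog{}
	ids := []string{"a", "b", "c"}
	for i := 0; i < steps; i++ {
		id := ids[rng.Intn(len(ids))]
		switch rng.Intn(3) {
		case 0:
			m.push(id)
		case 1:
			m.compact(id, 1+int64(rng.Intn(3)))
		case 2:
			m.purge()
		}
		if err := m.checkInvariants(minSize); err != nil {
			return fmt.Errorf("step %d: %w", i, err)
		}
	}
	return nil
}

func main() {
	if err := runRandomActions(42, 1000, 3); err != nil {
		fmt.Println("invariant violated:", err)
		return
	}
	fmt.Println("all invariants held")
}
```

A property-testing library such as rapid adds shrinking and generated inputs on top of this loop; the sketch only shows the shape of the check.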

Other notes:
- The log service types all use int where they should use uint. I added a
cleanup task to convert the incorrect types. The new type I introduce
is correct.

## Test plan
*How are these changes tested?*
- [x] Tests pass locally with `pytest` for python, `yarn test` for js,
`cargo test` for rust

## Documentation Changes
None
HammadB authored and Anush008 committed Jun 27, 2024
1 parent b1da2d2 commit b50b1c1
Showing 26 changed files with 574 additions and 258 deletions.
The following generated files are not rendered by default:
4 changes: 3 additions & 1 deletion chromadb/proto/chroma_pb2_grpc.py
3 changes: 2 additions & 1 deletion chromadb/proto/coordinator_pb2_grpc.py
20 changes: 10 additions & 10 deletions chromadb/proto/logservice_pb2.py
6 changes: 4 additions & 2 deletions chromadb/proto/logservice_pb2.pyi
3 changes: 2 additions & 1 deletion chromadb/proto/logservice_pb2_grpc.py
2 changes: 1 addition & 1 deletion go/database/log/db/copyfrom.go
2 changes: 1 addition & 1 deletion go/database/log/db/db.go
4 changes: 1 addition & 3 deletions go/database/log/db/models.go
7 changes: 4 additions & 3 deletions go/database/log/db/queries.sql.go

1 change: 1 addition & 0 deletions go/database/log/queries/queries.sql
@@ -15,6 +15,7 @@ with summary as (
     select r.collection_id, r.offset, r.timestamp, row_number() over(partition by r.collection_id order by r.offset) as rank
     from record_log r, collection c
     where r.collection_id = c.id
+    and (c.record_enumeration_offset_position - c.record_compaction_offset_position) >= sqlc.arg(min_compaction_size)
     and r.offset > c.record_compaction_offset_position
 )
 select * from summary
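
The added predicate keeps a collection only when the gap between its enumeration offset and its compaction offset reaches `min_compaction_size`. A minimal Go sketch of that comparison follows. The struct and function names are hypothetical and not taken from the repository, and the sketch leaves out the query's other condition, `r.offset > c.record_compaction_offset_position`.

```go
package main

import "fmt"

// collectionOffsets is a hypothetical stand-in for the two collection columns
// used by the query.
type collectionOffsets struct {
	ID                string
	EnumerationOffset int64 // stands in for record_enumeration_offset_position
	CompactionOffset  int64 // stands in for record_compaction_offset_position
}

// meetsMinCompactionSize mirrors the added SQL predicate:
// (enumeration offset - compaction offset) >= min_compaction_size.
func meetsMinCompactionSize(c collectionOffsets, minCompactionSize int64) bool {
	return c.EnumerationOffset-c.CompactionOffset >= minCompactionSize
}

func main() {
	collections := []collectionOffsets{
		{ID: "small", EnumerationOffset: 12, CompactionOffset: 10},
		{ID: "large", EnumerationOffset: 150, CompactionOffset: 10},
	}
	for _, c := range collections {
		pending := c.EnumerationOffset - c.CompactionOffset
		fmt.Printf("%s: pending=%d eligible=%t\n", c.ID, pending, meetsMinCompactionSize(c, 100))
	}
	// Prints:
	// small: pending=2 eligible=false
	// large: pending=140 eligible=true
}
```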
7 changes: 4 additions & 3 deletions go/pkg/log/repository/log.go
@@ -3,10 +3,11 @@ package repository
 import (
 	"context"
 	"errors"
+	"time"
+
 	log "github.com/chroma-core/chroma/go/database/log/db"
 	"github.com/jackc/pgx/v5"
 	"github.com/jackc/pgx/v5/pgxpool"
-	"time"
 )
 
 type LogRepository struct {
@@ -76,8 +77,8 @@ func (r *LogRepository) PullRecords(ctx context.Context, collectionId string, of
 	return
 }
 
-func (r *LogRepository) GetAllCollectionInfoToCompact(ctx context.Context) (collectionToCompact []log.GetAllCollectionsToCompactRow, err error) {
-	collectionToCompact, err = r.queries.GetAllCollectionsToCompact(ctx)
+func (r *LogRepository) GetAllCollectionInfoToCompact(ctx context.Context, minCompactionSize uint64) (collectionToCompact []log.GetAllCollectionsToCompactRow, err error) {
+	collectionToCompact, err = r.queries.GetAllCollectionsToCompact(ctx, int64(minCompactionSize))
 	if collectionToCompact == nil {
 		collectionToCompact = []log.GetAllCollectionsToCompactRow{}
 	}
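
The new signature takes a `uint64`, and the generated query takes an `int64`, so the diff converts with a plain `int64(...)` cast. In Go, a `uint64` above `math.MaxInt64` wraps to a negative number under that cast. The checked conversion below is one possible guard. It is an assumption for illustration, not part of this commit, and `toInt64` and `errTooLarge` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// errTooLarge is a hypothetical sentinel error for values that do not fit in int64.
var errTooLarge = errors.New("value exceeds int64 range")

// toInt64 converts v to int64, returning an error instead of silently wrapping
// to a negative number when v is larger than math.MaxInt64.
func toInt64(v uint64) (int64, error) {
	if v > math.MaxInt64 {
		return 0, errTooLarge
	}
	return int64(v), nil
}

func main() {
	for _, v := range []uint64{100, math.MaxUint64} {
		n, err := toInt64(v)
		fmt.Printf("input=%d converted=%d err=%v\n", v, n, err)
	}
}
```

Realistic compaction sizes are far below that limit, so the check mostly documents intent at the boundary between the unsigned API type and the signed database type.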