improve performance of aggregations over a key range using range annotations #3793
anish-shanbhag added a commit to anish-shanbhag/pebble that referenced this issue on Jul 26, 2024

This change adds a "range annotation" feature to Annotators. A range annotation is a computation that aggregates some value over a specific key range within a level. Level-wide annotations are now computed internally as a range annotation with a key range spanning the whole level. Range annotations use the same B-tree caching behavior as regular annotations, so queries remain fast even with thousands of tables because they avoid a sequential iteration over a level's files. This PR only sets up range annotations without changing any existing behavior.

See cockroachdb#3793 for some potential use cases.

`BenchmarkNumFilesRangeAnnotation` shows that range annotations are significantly faster than using `version.Overlaps` to aggregate over a key range:

```
pkg: github.com/cockroachdb/pebble/internal/manifest
BenchmarkNumFilesRangeAnnotation/annotator-10     232282      4716 ns/op     112 B/op      7 allocs/op
BenchmarkNumFilesRangeAnnotation/overlaps-10        2110    545482 ns/op     400 B/op      9 allocs/op
```
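A minimal sketch of the caching idea described above, under stated assumptions: the files in a level are sorted and non-overlapping (as in levels L1 and deeper), keys are plain strings, and a simple binary tree over the file slice stands in for Pebble's B-tree. Each node lazily caches its subtree's annotation (here, a file count), and a range query reuses that cached value whenever a subtree lies entirely inside the query range, so only nodes along the two range boundaries are visited. The names `fileMeta`, `level`, `numFilesInRange` and `numFilesLinear` are hypothetical and are not Pebble's API; `numFilesLinear` is only a stand-in for the sequential iteration over a level's files that range annotations avoid.

```go
package main

import "fmt"

// fileMeta is a hypothetical, simplified stand-in for a table's metadata:
// only its user-key bounds, as strings.
type fileMeta struct {
	Smallest, Largest string
}

// node covers files[lo:hi] of a level whose files are sorted and
// non-overlapping. agg caches the annotation (a file count) for the subtree.
type node struct {
	lo, hi      int
	left, right *node
	agg         int
	cached      bool
}

// level pairs the file slice with a tree built over it. The sketch assumes
// the level is immutable, so cached annotations never go stale.
type level struct {
	files []fileMeta
	root  *node
}

func build(lo, hi int) *node {
	n := &node{lo: lo, hi: hi}
	if hi-lo > 1 {
		mid := (lo + hi) / 2
		n.left, n.right = build(lo, mid), build(mid, hi)
	}
	return n
}

func newLevel(files []fileMeta) *level {
	l := &level{files: files}
	if len(files) > 0 {
		l.root = build(0, len(files))
	}
	return l
}

// annotation returns the subtree's file count, computing and caching it once.
func (l *level) annotation(n *node) int {
	if !n.cached {
		if n.left == nil {
			n.agg = 1
		} else {
			n.agg = l.annotation(n.left) + l.annotation(n.right)
		}
		n.cached = true
	}
	return n.agg
}

// numFilesInRange counts files overlapping [start, end]. A subtree that lies
// entirely inside the range is answered from its cached annotation.
func (l *level) numFilesInRange(n *node, start, end string) int {
	if n == nil {
		return 0
	}
	smallest, largest := l.files[n.lo].Smallest, l.files[n.hi-1].Largest
	switch {
	case largest < start || smallest > end:
		return 0 // subtree is disjoint from the range
	case start <= smallest && largest <= end:
		return l.annotation(n) // fully contained: reuse the cached value
	case n.left == nil:
		return 1 // a single file that partially overlaps the range
	}
	return l.numFilesInRange(n.left, start, end) + l.numFilesInRange(n.right, start, end)
}

// numFilesLinear is the sequential baseline: check every file in the level.
func (l *level) numFilesLinear(start, end string) int {
	count := 0
	for _, f := range l.files {
		if f.Smallest <= end && f.Largest >= start {
			count++
		}
	}
	return count
}

func main() {
	l := newLevel([]fileMeta{{"a", "c"}, {"d", "f"}, {"g", "i"}, {"j", "l"}, {"m", "o"}})
	fmt.Println(l.numFilesInRange(l.root, "b", "h")) // 3
	fmt.Println(l.numFilesLinear("b", "h"))          // 3
}
```

In this sketch, a query whose range spans the whole level returns the root's cached value directly, which matches the description of level-wide annotations being computed as a range annotation over the entire level.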
anish-shanbhag added a commit to anish-shanbhag/pebble that referenced this issue on Aug 13, 2024

This change adds a "range annotation" feature to Annotators. A range annotation is a computation that aggregates some value over a specific key range within a level. Range annotations use the same B-tree caching behavior as regular annotations, so queries remain fast even with thousands of tables because they avoid a sequential iteration over a level's files. This PR only sets up range annotations without changing any existing behavior.

See cockroachdb#3793 for some potential use cases.

`BenchmarkNumFilesRangeAnnotation` shows that range annotations are significantly faster than using `version.Overlaps` to aggregate over a key range:

```
pkg: github.com/cockroachdb/pebble/internal/manifest
BenchmarkNumFilesRangeAnnotation/annotator-10     306010      4015 ns/op      48 B/op      6 allocs/op
BenchmarkNumFilesRangeAnnotation/overlaps-10        2223    513519 ns/op     336 B/op      8 allocs/op
```
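To put "significantly faster" in numbers, the ratio of the two ns/op figures in the benchmark output above can be computed directly. This tiny program only performs that division on the figures quoted in the commit message; it does not rerun the benchmark. Applied to the earlier run's figures (4716 and 545482 ns/op), the same function gives roughly 115.7x.

```go
package main

import "fmt"

// speedup returns how many times faster a benchmark with fastNsPerOp is than
// one with slowNsPerOp, using the ns/op figures reported by `go test -bench`.
func speedup(fastNsPerOp, slowNsPerOp float64) float64 {
	return slowNsPerOp / fastNsPerOp
}

func main() {
	// ns/op figures copied from the benchmark output in this commit message.
	fmt.Printf("annotator vs overlaps: %.1fx\n", speedup(4015, 513519))
	// prints: annotator vs overlaps: 127.9x
}
```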
anish-shanbhag added a commit to anish-shanbhag/pebble that referenced this issue on Aug 13, 2024

This change updates `db.EstimateDiskUsage` to use range annotations to estimate the disk usage of a key range. This should improve the performance of repeated disk usage estimates for similar or identical key ranges. At the Cockroach layer we use `db.EstimateDiskUsage` in a few places, most notably when [computing MVCC span stats](https:/cockroachdb/cockroach/blob/master/pkg/server/span_stats_server.go#L217).

Informs: cockroachdb#3793
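A sketch of how a disk-usage estimate over [start, end] can be split into two parts, assuming the files in a level are sorted and non-overlapping: files fully inside the range contribute their whole size, which is the part a range annotation can return from cache, while the at most two files straddling the range boundaries need a per-file estimate. Here a prefix-sum array is a simplified stand-in for the B-tree's cached annotations (valid only because the level is immutable in the sketch), and `estimatePartial` is a hypothetical callback for boundary files. The types and names are illustrative and are not Pebble's implementation of `db.EstimateDiskUsage`.

```go
package main

import (
	"fmt"
	"sort"
)

// file is a hypothetical, simplified table: user-key bounds and size in bytes.
type file struct {
	Smallest, Largest string
	Size              uint64
}

// sizedLevel holds sorted, non-overlapping files plus prefix sums of their
// sizes, which stand in for cached annotations in this sketch.
type sizedLevel struct {
	files  []file
	prefix []uint64 // prefix[i] = sum of files[:i].Size
}

func newSizedLevel(files []file) *sizedLevel {
	prefix := make([]uint64, len(files)+1)
	for i, f := range files {
		prefix[i+1] = prefix[i] + f.Size
	}
	return &sizedLevel{files: files, prefix: prefix}
}

// estimateDiskUsage sums the sizes of files overlapping [start, end]. Fully
// contained files come from the prefix sums; partially overlapping boundary
// files are delegated to the hypothetical estimatePartial callback.
func (l *sizedLevel) estimateDiskUsage(start, end string, estimatePartial func(f file, start, end string) uint64) uint64 {
	// First file whose Largest >= start: the first that can overlap.
	lo := sort.Search(len(l.files), func(i int) bool { return l.files[i].Largest >= start })
	// First file whose Smallest > end: it and all later files are past the range.
	hi := sort.Search(len(l.files), func(i int) bool { return l.files[i].Smallest > end })
	if lo >= hi {
		return 0
	}
	var total uint64
	// Peel off a file that starts before the range.
	if l.files[lo].Smallest < start {
		total += estimatePartial(l.files[lo], start, end)
		lo++
	}
	// Peel off a file that ends after the range (if not already counted).
	if lo < hi && l.files[hi-1].Largest > end {
		total += estimatePartial(l.files[hi-1], start, end)
		hi--
	}
	// Everything left in files[lo:hi] lies fully inside the range.
	return total + l.prefix[hi] - l.prefix[lo]
}

func main() {
	l := newSizedLevel([]file{{"a", "c", 100}, {"d", "f", 200}, {"g", "i", 300}, {"j", "l", 400}})
	// Hypothetical estimator: assume half of a boundary file lies in range.
	half := func(f file, _, _ string) uint64 { return f.Size / 2 }
	fmt.Println(l.estimateDiskUsage("b", "h", half)) // 400
}
```

With the half-size estimator, the query over [b, h] returns 50 (boundary file a-c) + 200 (fully contained d-f) + 150 (boundary file g-i) = 400 bytes.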
anish-shanbhag added a commit that referenced this issue on Aug 14, 2024

This change adds a "range annotation" feature to Annotators. A range annotation is a computation that aggregates some value over a specific key range within a level. Range annotations use the same B-tree caching behavior as regular annotations, so queries remain fast even with thousands of tables because they avoid a sequential iteration over a level's files. This PR only sets up range annotations without changing any existing behavior.

See #3793 for some potential use cases.

`BenchmarkNumFilesRangeAnnotation` shows that range annotations are significantly faster than using `version.Overlaps` to aggregate over a key range:

```
pkg: github.com/cockroachdb/pebble/internal/manifest
BenchmarkNumFilesRangeAnnotation/annotator-10     306010      4015 ns/op      48 B/op      6 allocs/op
BenchmarkNumFilesRangeAnnotation/overlaps-10        2223    513519 ns/op     336 B/op      8 allocs/op
```
anish-shanbhag added a commit that referenced this issue on Aug 27, 2024

This change updates `db.EstimateDiskUsage` to use range annotations to estimate the disk usage of a key range. This should improve the performance of repeated disk usage estimates for similar or identical key ranges. At the Cockroach layer we use `db.EstimateDiskUsage` in a few places, most notably when [computing MVCC span stats](https:/cockroachdb/cockroach/blob/master/pkg/server/span_stats_server.go#L217).

Informs: #3793
#3759 introduces range annotations, which aggregate a value across a key range within a level. Here are some potential use cases for this feature:

- Calculating the number of keys shadowed by a tombstone-dense key range, for use in the heuristic proposed in #3719 (docs/rfc: add RFC for point tombstone density compaction heuristic).
- `db.ScanStatistics`: computing it with range annotations could be significantly faster than the current implementation, which scans every key in the DB.

Jira issue: PEBBLE-227
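For the tombstone-density use case, the aggregated value has to be something that cached subtree results can be merged into. Counts work because addition is associative and the zero value is an identity; a density (a ratio) does not merge correctly, so a sketch would aggregate raw counts and divide only at the end. The `tableStats` type and its fields below are hypothetical and only illustrate that design choice; they are not Pebble's table statistics.

```go
package main

import "fmt"

// tableStats is a hypothetical per-table summary; the field names are
// illustrative only.
type tableStats struct {
	NumEntries   uint64
	NumDeletions uint64
}

// merge combines two summaries. Because addition is associative and the zero
// value is an identity, cached partial sums for subtrees can be merged in any
// grouping.
func (s tableStats) merge(o tableStats) tableStats {
	return tableStats{s.NumEntries + o.NumEntries, s.NumDeletions + o.NumDeletions}
}

// density is computed only after merging; averaging per-subtree densities
// would weight small and large subtrees equally, which is wrong.
func (s tableStats) density() float64 {
	if s.NumEntries == 0 {
		return 0
	}
	return float64(s.NumDeletions) / float64(s.NumEntries)
}

func main() {
	left := tableStats{NumEntries: 1000, NumDeletions: 900}  // e.g. one cached subtree
	right := tableStats{NumEntries: 9000, NumDeletions: 100} // e.g. another
	total := left.merge(right)
	fmt.Printf("%.2f\n", total.density())
	// prints 0.10, not the mean of the two subtree densities (about 0.46)
}
```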