improve performance of aggregations over a key range using range annotations #3793

Open
anish-shanbhag opened this issue Jul 26, 2024 · 0 comments

anish-shanbhag commented Jul 26, 2024

#3759 introduces range annotations, which compute a query that aggregates a value across a key range within a level. Here are some potential use cases for this feature:

Jira issue: PEBBLE-227

anish-shanbhag added a commit to anish-shanbhag/pebble that referenced this issue Jul 26, 2024
This change adds a "range annotation" feature to Annotators, which
are computations that aggregate some value over a specific key range
within a level. Level-wide annotations are now computed internally as a
range annotation with a key range spanning the whole level. Range annotations
use the same B-tree caching behavior as regular annotations, so queries
remain fast even with thousands of tables because they avoid a sequential
iteration over a level's files.

This PR only sets up range annotations without changing any existing
behavior. See cockroachdb#3793 for some potential use cases.

`BenchmarkNumFilesRangeAnnotation` shows that range annotations are
significantly faster than using `version.Overlaps` to aggregate over
a key range:
```
pkg: github.com/cockroachdb/pebble/internal/manifest
BenchmarkNumFilesRangeAnnotation/annotator-10         	  232282	      4716 ns/op	     112 B/op	       7 allocs/op
BenchmarkNumFilesRangeAnnotation/overlaps-10          	    2110	    545482 ns/op	     400 B/op	       9 allocs/op
```
anish-shanbhag added a commit to anish-shanbhag/pebble that referenced this issue Aug 13, 2024
This change adds a "range annotation" feature to Annotators, which
are computations that aggregate some value over a specific key range within a level. Range annotations use the same B-tree caching behavior as regular annotations, so queries remain fast even with thousands of tables because they avoid a sequential iteration over a level's files.

This PR only sets up range annotations without changing any existing
behavior. See cockroachdb#3793 for some potential use cases.

`BenchmarkNumFilesRangeAnnotation` shows that range annotations are
significantly faster than using `version.Overlaps` to aggregate over
a key range:
```
pkg: github.com/cockroachdb/pebble/internal/manifest
BenchmarkNumFilesRangeAnnotation/annotator-10         	  306010	      4015 ns/op	      48 B/op	       6 allocs/op
BenchmarkNumFilesRangeAnnotation/overlaps-10          	    2223	    513519 ns/op	     336 B/op	       8 allocs/op
```
anish-shanbhag added a commit to anish-shanbhag/pebble that referenced this issue Aug 13, 2024
This change updates `db.EstimateDiskUsage` to use range annotations to
estimate the disk usage of a key range. This should improve the
performance of repeated disk usage estimates for similar or identical
key ranges.

At the Cockroach layer we use `db.EstimateDiskUsage` in a few places,
most notably when [computing MVCC span stats](https://github.com/cockroachdb/cockroach/blob/master/pkg/server/span_stats_server.go#L217).

Informs: cockroachdb#3793
anish-shanbhag added a commit that referenced this issue Aug 14, 2024
This change adds a "range annotation" feature to Annotators, which
are computations that aggregate some value over a specific key range within a level. Range annotations use the same B-tree caching behavior as regular annotations, so queries remain fast even with thousands of tables because they avoid a sequential iteration over a level's files.

This PR only sets up range annotations without changing any existing
behavior. See #3793 for some potential use cases.

`BenchmarkNumFilesRangeAnnotation` shows that range annotations are
significantly faster than using `version.Overlaps` to aggregate over
a key range:
```
pkg: github.com/cockroachdb/pebble/internal/manifest
BenchmarkNumFilesRangeAnnotation/annotator-10         	  306010	      4015 ns/op	      48 B/op	       6 allocs/op
BenchmarkNumFilesRangeAnnotation/overlaps-10          	    2223	    513519 ns/op	     336 B/op	       8 allocs/op
```
anish-shanbhag added a commit that referenced this issue Aug 27, 2024
This change updates `db.EstimateDiskUsage` to use range annotations to
estimate the disk usage of a key range. This should improve the
performance of repeated disk usage estimates for similar or identical
key ranges.

At the Cockroach layer we use `db.EstimateDiskUsage` in a few places,
most notably when [computing MVCC span stats](https://github.com/cockroachdb/cockroach/blob/master/pkg/server/span_stats_server.go#L217).

Informs: #3793