-
Notifications
You must be signed in to change notification settings - Fork 451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
manifest: improve Annotator interface with generics #3760
manifest: improve Annotator interface with generics #3760
Conversation
Annotator
interface with genericsThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great! Much cleaner!
Reviewed 5 of 11 files at r1.
Reviewable status: 5 of 11 files reviewed, 4 unresolved discussions (waiting on @anish-shanbhag and @itsbilal)
-- commits
line 21 at r1:
do you know where this additional allocation is coming from? A heap profile go test -run XXX -bench BenchmarkNumFilesAnnotator -memprofile mem.prof
should help
compaction_picker.go
line 1390 at r1 (raw file):
// TODO(travers): Consider an alternative heuristic for elision of range-keys. deletionCriteria := f.Stats.RangeDeletionsBytesEstimate*10 >= f.Size || f.Stats.NumDeletions*10 > f.Stats.NumEntries return !f.IsCompacting() && f.StatsValid() && deletionCriteria, f.StatsValid()
nit: I think the previous structure with an early exit if !f.StatsValid()
was a little clearer—the stats we use in computing deletionCriteria
might not be initialized at all, and it's less obvious that we're not improperly using them
internal/manifest/annotator.go
line 75 at r1 (raw file):
n.annot = append(n.annot, annotation{ annotator: a, vptr: new(T),
The annotations where T = *fileMetadata
end up double boxing (storing a **fileMetadata
in vptr
). Maybe the responsibility of boxing the value should be the responsibility of whoever defines T
(eg, T
should always be a pointer type to avoid an allocation). Then the PickFileAnnotator
continues to use T = *fileMetadata
, boxing once.
internal/manifest/annotator.go
line 234 at r1 (raw file):
return f1 } return f2
nit: worth using a switch:
switch {
case f1 == nil:
return f2
case f2 == nil:
return f1
case fa.Compare(f1, f2):
return f1
default:
return f2
}
b0d03eb
to
40de423
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated benchmark results:
goos: darwin
goarch: arm64
pkg: github.com/cockroachdb/pebble/internal/manifest
│ old │ new │
│ sec/op │ sec/op vs base │
NumFilesAnnotator-10 1.635µ ± 1% 1.572µ ± 7% ~ (p=0.065 n=6)
│ old │ new │
│ B/op │ B/op vs base │
NumFilesAnnotator-10 536.0 ± 0% 536.0 ± 0% ~ (p=1.000 n=6) ¹
¹ all samples are equal
│ old │ new │
│ allocs/op │ allocs/op vs base │
NumFilesAnnotator-10 7.000 ± 0% 7.000 ± 0% ~ (p=1.000 n=6) ¹
¹ all samples are equal
Reviewable status: 3 of 11 files reviewed, 4 unresolved discussions (waiting on @itsbilal and @jbowens)
Previously, jbowens (Jackson Owens) wrote…
do you know where this additional allocation is coming from? A heap profile
go test -run XXX -bench BenchmarkNumFilesAnnotator -memprofile mem.prof
should help
Spent a while trying to find the allocation in the Annotator, but turns out this was actually because require.EqualValues
allocated a new uint64
when two different types were passed for comparison (in this case int
and uint64
. This is fixed now; I also added the updated benchmark results above.
compaction_picker.go
line 1390 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
nit: I think the previous structure with an early exit if
!f.StatsValid()
was a little clearer—the stats we use in computingdeletionCriteria
might not be initialized at all, and it's less obvious that we're not improperly using them
Makes sense, I've reverted this to more closely match with the structure we had before.
internal/manifest/annotator.go
line 75 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
The annotations where
T = *fileMetadata
end up double boxing (storing a**fileMetadata
invptr
). Maybe the responsibility of boxing the value should be the responsibility of whoever definesT
(eg,T
should always be a pointer type to avoid an allocation). Then thePickFileAnnotator
continues to useT = *fileMetadata
, boxing once.
Agreed that this is not ideal. That said, I'm curious what the potential downsides are (performance, edge cases, etc.) for the double boxing here. One of my goals was to abstract away the need for pointer manipulation entirely, making annotators like SumAnnotator
more straightforward to implement. If we place the responsibility of boxing on whoever defines T
, then they also have implement the initial allocation in Zero
and merging into an existing value in Accumulate
(i.e. very similar to the previous implementation, minus the typecasting).
Ideally there would be a way to distinguish when T
is a pointer type and when it isn't, and avoid the double boxing based on that. But unfortunately, I haven't seen a way to do this with generics yet.
internal/manifest/annotator.go
line 234 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
nit: worth using a switch:
switch { case f1 == nil: return f2 case f2 == nil: return f1 case fa.Compare(f1, f2): return f1 default: return f2 }
Good call - fixed this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 3 of 11 files reviewed, 2 unresolved discussions (waiting on @anish-shanbhag and @itsbilal)
Previously, anish-shanbhag (Anish Shanbhag) wrote…
Spent a while trying to find the allocation in the Annotator, but turns out this was actually because
require.EqualValues
allocated a newuint64
when two different types were passed for comparison (in this caseint
anduint64
. This is fixed now; I also added the updated benchmark results above.
Ah nice
internal/manifest/annotator.go
line 75 at r1 (raw file):
I think the most significant downside is that double boxing forces a new allocation. Without the additional boxing, we would be storing *filetMetadata
directly as the annotation value without suffering any additional allocations.
If we place the responsibility of boxing on whoever defines
T
, then they also have implement the initial allocation inZero
and merging into an existing value inAccumulate
(i.e. very similar to the previous implementation, minus the typecasting).
IMO, this is worth it. In the case of the *fileMetadata
annotations, there is no initial allocation. The zero value is just a nil pointer. In all the non-nil instances, we'll just be passing around pointers to the already-allocated fileMetadata
.
At least the pointer handling will be encapsulated within the sumAggregator
for those cases.
3cb8d41
to
7777df4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 11 files reviewed, 2 unresolved discussions (waiting on @itsbilal and @jbowens)
internal/manifest/annotator.go
line 75 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
I think the most significant downside is that double boxing forces a new allocation. Without the additional boxing, we would be storing
*filetMetadata
directly as the annotation value without suffering any additional allocations.If we place the responsibility of boxing on whoever defines
T
, then they also have implement the initial allocation inZero
and merging into an existing value inAccumulate
(i.e. very similar to the previous implementation, minus the typecasting).IMO, this is worth it. In the case of the
*fileMetadata
annotations, there is no initial allocation. The zero value is just a nil pointer. In all the non-nil instances, we'll just be passing around pointers to the already-allocatedfileMetadata
.At least the pointer handling will be encapsulated within the
sumAggregator
for those cases.
Makes sense, I hadn't considered the fact that we avoid the extra allocation here. I've updated to avoid the double boxing.
Refactors `manifest.Annotator` to use generics and a simplified API. This eliminates the need to perform pointer manipulation and unsafe typecasting when defining a new Annotator. The goal of this change is to improve the `Annotator` interface while not changing any existing behavior. `BenchmarkNumFilesAnnotator` shows roughly the same performance as master when compared to the equivalent implementation of `orderStatistic`: ``` pkg: github.com/cockroachdb/pebble/internal/manifest │ old │ new │ │ sec/op │ sec/op vs base │ NumFilesAnnotator-10 1.635µ ± 1% 1.572µ ± 7% ~ (p=0.065 n=6) │ old │ new │ │ B/op │ B/op vs base │ NumFilesAnnotator-10 536.0 ± 0% 536.0 ± 0% ~ (p=1.000 n=6) ¹ ¹ all samples are equal │ old │ new │ │ allocs/op │ allocs/op vs base │ NumFilesAnnotator-10 7.000 ± 0% 7.000 ± 0% ~ (p=1.000 n=6) ¹ ¹ all samples are equal ```
7777df4
to
a5e666e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, looks great!
Reviewed 2 of 5 files at r3, 2 of 2 files at r4.
Reviewable status: 6 of 11 files reviewed, 2 unresolved discussions (waiting on @anish-shanbhag and @itsbilal)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status: 6 of 11 files reviewed, 2 unresolved discussions (waiting on @itsbilal and @jbowens)
cockroachdb#3760 contained a bug which causes Annotator values to be incorrectly aggregated when pointer values should be overwritten. This is because the `vtyped` variable was not being modified due to being on the stack. This change fixes this and adds a unit test for `PickFileAggregator` to catch this issue in the future. cockroachdb#3759 should already not be affected by this due to the different way it handles aggregation.
cockroachdb#3760 contained a bug which causes Annotator values to be incorrectly aggregated when pointer values should be overwritten. This is because the `vtyped` variable was not being modified due to being on the stack. This change fixes this and adds a unit test for `PickFileAggregator` to catch this issue in the future. cockroachdb#3759 should already not be affected by this due to the different way it handles aggregation.
#3760 contained a bug which causes Annotator values to be incorrectly aggregated when pointer values should be overwritten. This is because the `vtyped` variable was not being modified due to being on the stack. This change fixes this and adds a unit test for `PickFileAggregator` to catch this issue in the future. #3759 should already not be affected by this due to the different way it handles aggregation.
Refactors
manifest.Annotator
to use generics and a simplified API. This eliminates the need to perform pointer manipulation and unsafe typecasting when defining a new Annotator. The goal of this change is to improve theAnnotator
interface while not changing any existing behavior.BenchmarkNumFilesAnnotator
shows roughly the same performance as master when comparedto the equivalent implementation of
orderStatistic
: