-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(new transform): Aggregate transform to reduce metric volume maintaining data #7846
Conversation
Signed-off-by: Ross McFarland <[email protected]>
Signed-off-by: Ross McFarland <[email protected]>
Signed-off-by: Ross McFarland <[email protected]>
Signed-off-by: Ross McFarland <[email protected]>
9bdc6e7
to
f826e3a
Compare
Signed-off-by: Ross McFarland <[email protected]>
Signed-off-by: Ross McFarland <[email protected]>
Signed-off-by: Ross McFarland <[email protected]>
Signed-off-by: Ross McFarland <[email protected]>
Signed-off-by: Ross McFarland <[email protected]>
Signed-off-by: Ross McFarland <[email protected]>
Signed-off-by: Ross McFarland <[email protected]>
Seems reasonable to me, but I'll leave the TODO's/questions to others |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this is looking pretty good, but a few things are jumping out at the moment.
Signed-off-by: Ross McFarland <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting closer!
Signed-off-by: Ross McFarland <[email protected]>
Signed-off-by: Ross McFarland <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this looks good. There's still the open question about event metadata which I'll get an answer to, but other than that, we just need to nail down the performance/polish.
Signed-off-by: Ross McFarland <[email protected]>
Signed-off-by: Ross McFarland <[email protected]>
Signed-off-by: Ross McFarland <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the changes done. There's a couple where there's still outstanding questions before I could do anything with them. I think this is pretty shippable as-is.
src/transforms/aggregate.rs
Outdated
where | ||
Self: 'static, | ||
{ | ||
let mut me = self; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/transforms/aggregate.rs
Outdated
metric::MetricKind::Incremental => { | ||
match self.map.get_mut(&series) { | ||
// We already have something, add to it, will update timestamp as well. | ||
Some(existing) => existing.update(data), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now emit!
s an internal metric when this happens.
Yeah it makes sense when there's a requirement/limitation downstream as described here, but still don't think I'd want it to be happening implicitly when not forced to.
🆒 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks much better. Down to a couple of minor tweaks from me.
src/transforms/aggregate.rs
Outdated
} | ||
|
||
fn flush_into(&mut self, output: &mut Vec<Event>) { | ||
if self.map.len() > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this condition add any value? If the len
is zero, the map drain will be a no-op too. I would also expect clippy to gripe about this should use is_empty
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd assume it's super minor if at all. It was more meaningful back when the code was swapping out maps. Will remove.
Signed-off-by: Ross McFarland <[email protected]>
Signed-off-by: Ross McFarland <[email protected]>
Also rustfmt Signed-off-by: Jesse Szwedko <[email protected]>
Signed-off-by: Ross McFarland <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I am good with this except for the question about handling update failures.
src/transforms/aggregate.rs
Outdated
if !existing.0.update(&data) { | ||
emit!(AggregateUpdateFailed); | ||
} | ||
existing.1.merge(metadata); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other sinks, when the update fails, we drop the older metric instead of the newer one as is being done here. I think that would be the more sensible behavior, but I am open to your thoughts. Suggestion:
if !existing.0.update(&data) { | |
emit!(AggregateUpdateFailed); | |
} | |
existing.1.merge(metadata); | |
if !existing.0.update(&data) { | |
emit!(AggregateUpdateFailed); | |
*existing = (data, metadata); | |
} else { | |
existing.1.merge(metadata); | |
} |
Note that the assignment causes an implicit drop of the metadata, which is reasonable given that we are also dropping the data as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Didn't know the precedent and in a vacuum either option seemed equally bad so I went with the simpler one. Will update now and add in a test for the behavior. Also removed the !
and flipped the branches to avoid the non-essential not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually @bruceg, it looks like MetricData.update
succeeds when the kind changes (glad i tried to write tests for that case) so that code is "add"ing Incremental
and Absolute
values which doesn't really make sense. I looked into the metrics buffer and I think it's checking for type matches before doing the add. Kind of seems like update
should do the add or replace logic based on kind
so that all the users of that method don't have to check and handle that case, but for now I'll rework this to check kind
first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Change made in the spirit of your request and the edge cases are now tested and behaving as expected/intended.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is a new pattern so it wouldn't have been obvious.
Good catch, yes update
doesn't check the metric kind. It looks like the logic this transform wants and the logic for the metrics buffer is different enough that I'm not clear what the proper semantics should be. I'll have to give that more thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If nothing else it could fail the update, do nothing and return false. Assuming something elsewhere isn't expecting/wanting it to add things with non-matching kind
. That would let aggregate
use it as is. Would have to dig deeper to see what metrics buffer is doing, but I don't think failing the update would hurt it since it would check before sending it and never get there in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was thinking more along the lines of update
possibly taking the new data by value instead of reference and returning it back in the case of an error as Result<(), MetricData>
, to avoid needing to clone data inside update. On the other hand, the metrics buffer wants to be able to do add incrementals to both kinds, while this transform wants to replace for the case of different kinds, so that's hard to shoehorn into one method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guess it's not clear to me how adding Absolute
s makes sense given they're intended to be the value as-is, e.g. if you're going 55mph at t0 and 57mpg at t1, 112mph isn't a meaningful number. It seems like kind
and Counter
/Guage
kind of overlap semantic wise and can result in internally conflicting states.
Anyway, not really related to this PR so I can let it go 😁
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, you can't add absolutes, but you can update an absolute to a new absolute value. You can add an incremental to either kind as well. Agreed it's not related to this PR.
…kind and value types Signed-off-by: Ross McFarland <[email protected]>
Signed-off-by: Ross McFarland <[email protected]>
Signed-off-by: Ross McFarland <[email protected]>
95 comments, 36 commits... hooo boy. :) Appreciate you sticking with us on this one. 👍🏻 |
|
||
examples: [ | ||
{ | ||
title: "Aggregate over 15 seconds" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noting that this says 15 seconds but the interval_ms
below is 5000.
The transform combines (
add
's)Incremental
metric events and uses "last one wins" semantics forAggregate
s. This is similar to how statsd works, but it does not (yet) support things like continuing to emit 0's for metrics it's previously seen./cc #676 which requested statsd functionality
/cc #3668 which is functionality that could be built on top of this PR if desired.