Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add GetSampleRateMulti #53

Merged
merged 7 commits into from
Mar 19, 2023
Merged

feat: add GetSampleRateMulti #53

merged 7 commits into from
Mar 19, 2023

Conversation

kentquirk
Copy link
Contributor

@kentquirk kentquirk commented Mar 13, 2023

Which problem is this PR solving?

This library was designed to count spans and to sample based on the assumption that every span processed would get an independent sample rate. But that's not how it's being used by Refinery -- Refinery does its calculations based on traces, so each call to GetSampleRate is likely representative of multiple spans.

Short description of the changes

  • Add GetSampleRateMulti, which takes a count of the number of events that it references, to the Sampler interface
  • Implement GetSampleRate for all samplers in terms of GetSampleRateMulti()
  • Add Stop to the Sampler interface for halting samplers safely
  • Add an EMA test
  • Add a generic sampler test that only uses the public interface including Stop

kentquirk and others added 2 commits March 14, 2023 18:14
## Which problem is this PR solving?

Include a Stop() function so that the samplers can be shut down properly. This is
currently only useful in tests.

## Short description of the changes

- Add stop function.
@kentquirk kentquirk marked this pull request as ready for review March 15, 2023 13:44
@kentquirk kentquirk requested a review from a team as a code owner March 15, 2023 13:44
@kentquirk kentquirk added type: enhancement status: review needed Changes need review. version: bump minor A PR that adds behavior, but is backwards-compatible. labels Mar 15, 2023
Copy link
Contributor

@TylerHelmuth TylerHelmuth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of these changes look like they were in #54, did something strange happen with a rebase?

dynsampler.go Show resolved Hide resolved
go.mod Outdated Show resolved Hide resolved
windowedthroughput.go Show resolved Hide resolved
@cartermp
Copy link
Member

Refinery does its calculations based on traces, so each call to GetSampleRate is likely representative of multiple spans.

Could you describe an example of how this can help people using Refinery? I...think I get it, but I'm honestly not sure! Would this mean that sample rates are incorrectly reported or inflated today?

@kentquirk
Copy link
Contributor Author

Sample rates in refinery are calculated by asking the samplers to make their calculations once per trace. But there are a variable number of spans per trace, and Honeycomb bills by the span. So when we're sampling to limit throughput, we should be calculating these numbers based on the number of spans in the trace.

When we're sampling based on keeping a distribution of keys, we could do it either based on spans or traces. It's not likely to make a difference unless span length is correlated to key value (possible, but probably not common).

For the EMAThroughput sampler I'm currently working on, controlling throughput while still seeing representative data is the primary goal, so it will definitely be looking at span count.

@kentquirk kentquirk merged commit cb1cc4b into main Mar 19, 2023
@kentquirk kentquirk deleted the kent.event_counts branch March 19, 2023 19:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: review needed Changes need review. type: enhancement version: bump minor A PR that adds behavior, but is backwards-compatible.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants