Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New metrics generator processor to store a local copy of traces #2368

Merged
merged 6 commits into from
May 8, 2023

Conversation

mdisibio
Copy link
Contributor

What this PR does:
This PR is part of a wider initiative to perform on-demand metrics calculations. It adds a new processor local-blocks that stores a copy of all received traces on disk. It works almost identically to an ingester with a live traces map, wal, local blocks, retention period, and with similar configuration. Currently this serves no purpose but more functionality will be added in later PRs.

The metrics generator config looks like:

metrics-generator:
  processor:
    local_blocks:
       block: (tempodb block config, optional)
       flush_check_period: (duration, optional)
       trace_idle_period: (duration, optional)
       max_block_duration: (duration, optional)
       max_block_bytes: (optional)
       complete_block_timeout: (optional)

  traces_storage:
    path: /tmp/tempo/generator/traces (required) # the only required field
    version: (optional)

overrides:
  metrics_generator_processors: [local-blocks]  # Enable the processor

Notes:

  1. This forks a lot of code from the ingester for live traces, flushing, deleting, etc. I originally hoped to call the ingester module but it's not prepared for that and would need a combination of updates (exporting symbols, making parts like flushing to object storage disable-able). So forking was not only more straightforward but also a chance to simplify the code.
  2. Flushing/completing/deleting is done via goroutines per instance. The real ingester has a shared set of workers and work queue across all instances. That approach should be adopted here but unsure yet how to make it work with the processors, since they can be dynamically configured, enabled, and disabled.
  3. This tries to reuse config structs and defaults where possible, but counter-intuitively it doesn't reuse actual same settings as the rest of Tempo. There are a few reasons: they are separate modules and probably should have separate config. This processor's storage has to be distinct from the main storage area since the data is a little different. Real storage will be RF3 (if configured), but the metrics-generator traffic is best-effort and only RF1.

Additional changes:

  1. I think this should fix the "duplication registration" bug that sometimes happens in metrics generator.
  2. Relocates some config defaults to be callable by the generator module.

Which issue(s) this PR fixes:
Fixes n/a

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@knylander-grafana
Copy link
Contributor

Will we need to update the metrics-generator docs for this?

@mdisibio mdisibio marked this pull request as ready for review May 3, 2023 13:21
Copy link
Contributor

@zalegrala zalegrala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved with a couple comments. Nice work.

modules/generator/processor/localblocks/config.go Outdated Show resolved Hide resolved
walBlocks: map[uuid.UUID]common.WALBlock{},
completeBlocks: map[uuid.UUID]common.BackendBlock{},
liveTraces: newLiveTraces(),
closeCh: make(chan struct{}),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thinking outloud, not blocking; do we prefer a close chan over a context cancellation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the chose channel for simplicity, but you make a good point that they have different behaviors. Right now the processor would finish any in-progress flush or block conversion before shutting down, whereas context cancellation would immediately break out because it would be passed along to the i/o operations.

@mdisibio mdisibio requested a review from ie-pham as a code owner May 5, 2023 12:17
@mdisibio
Copy link
Contributor Author

mdisibio commented May 8, 2023

Will we need to update the metrics-generator docs for this?

@knylander-grafana Yes we will need new docs for this processor and the API coming in later PRs.

@mdisibio mdisibio merged commit 05ea80d into grafana:main May 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants