New metrics generator processor to store a local copy of traces #2368

mdisibio · 2023-04-25T18:45:04Z

What this PR does:
This PR is part of a wider initiative to perform on-demand metrics calculations. It adds a new processor local-blocks that stores a copy of all received traces on disk. It works almost identically to an ingester with a live traces map, wal, local blocks, retention period, and with similar configuration. Currently this serves no purpose but more functionality will be added in later PRs.

The metrics generator config looks like:

metrics_generator:
  processor:
    local_blocks:
      block: (tempodb block config, optional)
      flush_check_period: (duration, optional)
      trace_idle_period: (duration, optional)
      max_block_duration: (duration, optional)
      max_block_bytes: (optional)
      complete_block_timeout: (optional)

  traces_storage:
    path: /tmp/tempo/generator/traces (required) # the only required field
    version: (optional)

overrides:
  metrics_generator_processors: [local-blocks]  # Enable the processor

Notes:

This forks a lot of code from the ingester for live traces, flushing, deleting, etc. I originally hoped to call the ingester module but it's not prepared for that and would need a combination of updates (exporting symbols, making parts like flushing to object storage disable-able). So forking was not only more straightforward but also a chance to simplify the code.
Flushing, completing, and deleting are done via goroutines per instance. The real ingester has a shared set of workers and a work queue across all instances. That approach should be adopted here, but I'm not yet sure how to make it work with the processors, since they can be dynamically configured, enabled, and disabled.
This tries to reuse config structs and defaults where possible, but counter-intuitively it doesn't reuse the same actual settings as the rest of Tempo. There are a few reasons. They are separate modules and probably should have separate config. This processor's storage has to be distinct from the main storage area since the data is a little different. Real storage will be RF3 (if configured), but the metrics-generator traffic is best-effort and only RF1.
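
For readers unfamiliar with the shared-worker approach mentioned in the notes, the general pattern can be sketched as below. This is a generic Go worker pool, not the ingester's code: the `op` type, `runWorkers`, and `processAll` are hypothetical names, and the sketch does not address the open question of processors being enabled and disabled dynamically.

```go
package main

import (
	"fmt"
	"sync"
)

// op is a hypothetical unit of work (flush, complete, or delete) for one instance.
type op struct {
	instance string
	kind     string
}

// runWorkers starts n goroutines that drain one shared queue.
// The returned WaitGroup is done once the queue is closed and drained.
func runWorkers(n int, queue <-chan op, handle func(op)) *sync.WaitGroup {
	wg := &sync.WaitGroup{}
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for o := range queue {
				handle(o)
			}
		}()
	}
	return wg
}

// processAll pushes every op through the shared pool and counts ops per instance.
func processAll(workers int, ops []op) map[string]int {
	queue := make(chan op)
	var mu sync.Mutex
	counts := map[string]int{}

	wg := runWorkers(workers, queue, func(o op) {
		mu.Lock()
		counts[o.instance]++
		mu.Unlock()
	})

	for _, o := range ops {
		queue <- o
	}
	close(queue) // no more work: workers exit after draining
	wg.Wait()
	return counts
}

func main() {
	ops := []op{
		{"tenant-a", "flush"},
		{"tenant-b", "flush"},
		{"tenant-a", "complete"},
		{"tenant-b", "delete"},
		{"tenant-a", "delete"},
	}
	counts := processAll(2, ops)
	fmt.Println(counts["tenant-a"], counts["tenant-b"]) // prints "3 2"
}
```

The point of the pattern is that the number of goroutines stays fixed no matter how many instances exist, whereas the per-instance approach in this PR grows with the instance count.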

Additional changes:

I think this should fix the "duplication registration" bug that sometimes happens in metrics generator.
Relocates some config defaults to be callable by the generator module.

Which issue(s) this PR fixes:
Fixes n/a

Checklist

Tests updated
Documentation added
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

knylander-grafana · 2023-04-27T01:12:09Z

Will we need to update the metrics-generator docs for this?

zalegrala

Approved with a couple comments. Nice work.

modules/generator/processor/localblocks/config.go

zalegrala · 2023-05-04T15:16:15Z

modules/generator/processor/localblocks/processor.go

+ walBlocks: map[uuid.UUID]common.WALBlock{},
+ completeBlocks: map[uuid.UUID]common.BackendBlock{},
+ liveTraces: newLiveTraces(),
+ closeCh: make(chan struct{}),


Just thinking out loud, not blocking: do we prefer a close chan over a context cancellation?

mdisibio · 2023-05-04

I like the close channel for simplicity, but you make a good point that they have different behaviors. Right now the processor would finish any in-progress flush or block conversion before shutting down, whereas context cancellation would immediately break out because it would be passed along to the I/O operations.

mdisibio · 2023-05-08T12:19:09Z

Will we need to update the metrics-generator docs for this?

@knylander-grafana Yes we will need new docs for this processor and the API coming in later PRs.

mdisibio added 3 commits April 21, 2023 15:51

First draft to flush traces to wal

4971525

Add local block completion and retention

8ff9376

Move traces wal to central configuration, create sharded wal per instance instead of inside the processor

f7e05f8

mdisibio mentioned this pull request May 1, 2023

Arbitrary span duration metrics from any TraceQL-compatible block #2418

Merged

mdisibio added 2 commits May 2, 2023 07:50

lint

11978da

filter only kind=server, fully flush to disk on shutdown and wait for goroutines to exit

9e1456a

mdisibio marked this pull request as ready for review May 3, 2023 13:21

mdisibio requested review from joe-elliott, annanay25, mapno, kvrhdn, zalegrala and electron0zero as code owners May 3, 2023 13:21

zalegrala approved these changes May 4, 2023

rename field

5504958

mdisibio requested a review from ie-pham as a code owner May 5, 2023 12:17

mdisibio merged commit 05ea80d into grafana:main May 8, 2023
