
Search: drop use of TagCache, extract tags and tag values on-demand #1068

Merged (30 commits) on Dec 2, 2021

Conversation

@kvrhdn (Member) commented Oct 21, 2021

What this PR does:
Instead of storing tags and tag values in the TagCache, we can extract them directly from live traces, the WAL and local blocks when requested. This should reduce the amount of work spent updating the TagCache during ingest.

Since tag values are extracted on-demand, we are no longer limited by the size of the tag cache, which makes it possible to return thousands of values when needed.
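
Roughly, the shape of the change looks like this. This is only an illustrative sketch, not the code in this PR: the `tagSource` interface and `memSource` type are made up, standing in for live traces, WAL blocks and local blocks.

```go
package main

import (
	"context"
	"fmt"
)

// tagSource stands in for anything the ingester can search at request time:
// live traces, WAL blocks, or completed local blocks. Illustrative only.
type tagSource interface {
	TagValues(tagName string) []string
}

// memSource is a trivial in-memory source used for the example.
type memSource struct{ tags map[string][]string }

func (m memSource) TagValues(tag string) []string { return m.tags[tag] }

// getTagValues walks every source when a request arrives and deduplicates the
// values, instead of consulting a cache that has to be kept up to date at ingest.
func getTagValues(ctx context.Context, sources []tagSource, tagName string) ([]string, error) {
	distinct := map[string]struct{}{}
	for _, src := range sources {
		// bail out if the caller has already given up on the request
		if err := ctx.Err(); err != nil {
			return nil, err
		}
		for _, v := range src.TagValues(tagName) {
			distinct[v] = struct{}{}
		}
	}
	values := make([]string, 0, len(distinct))
	for v := range distinct {
		values = append(values, v)
	}
	return values, nil
}

func main() {
	srcs := []tagSource{
		memSource{tags: map[string][]string{"service.name": {"api", "frontend"}}},
		memSource{tags: map[string][]string{"service.name": {"api", "worker"}}},
	}
	values, _ := getTagValues(context.Background(), srcs, "service.name")
	fmt.Println(values) // api, frontend, worker (deduplicated, order not guaranteed)
}
```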

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@kvrhdn (Member, Author) commented Oct 21, 2021

I've been testing this on my local setup by fetching tag values for load_generator.seq_num, a unique ID added to every trace by the synthetic-load-generator. Before these changes the number of values was always truncated at roughly 200. With this PR the number of values returned increases up to about 80k and drops back to about 20k when a block is flushed.

Avg latency sits around 200-400 ms.

@kvrhdn marked this pull request as ready for review on October 21, 2021 17:17
@kvrhdn (Member, Author) commented Oct 22, 2021

TODO

  • Check if the context has been cancelled (see the sketch after this list)
  • Experiment with filtering tag values on service.name (next PR)
  • Most time is spent searching live traces; can we speed this up?
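
For the cancellation item, the idea is to check the context periodically while scanning, so a request that has timed out or been abandoned stops consuming ingester CPU. A standalone sketch, not this PR's code; `checkEvery` and the item type are made up.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// scanWithCancellation visits a (potentially large) number of items and checks
// the context every checkEvery iterations, returning early once the request is
// cancelled or its deadline has passed instead of scanning to the end.
func scanWithCancellation(ctx context.Context, items []string, checkEvery int) (int, error) {
	visited := 0
	for i := range items {
		if i%checkEvery == 0 {
			if err := ctx.Err(); err != nil {
				return visited, err
			}
		}
		visited++
	}
	return visited, nil
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), time.Millisecond)
	defer cancel()
	time.Sleep(2 * time.Millisecond) // let the deadline pass, as a slow request would

	_, err := scanWithCancellation(ctx, make([]string, 100000), 1000)
	fmt.Println(errors.Is(err, context.DeadlineExceeded)) // true
}
```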

@kvrhdn requested a review from mdisibio on October 26, 2021 23:07
@kvrhdn added this to the v1.3 milestone on Oct 29, 2021
@kvrhdn (Member, Author) commented Nov 8, 2021

Still to address: when fetching tag values with very high cardinality (for instance http.url), requests fail or time out when sent between components. That is, the ingester is fine digging up all the values, but the gRPC connection between the ingester, querier and query-frontend hits the maximum message size.
We should add some (optionally configurable) limits to protect against this. Ideally, we return early once we have 'a lot of' values.

Future improvements (not for this PR): we can add filter capabilities to the tag values endpoint, so if someone starts typing http.url=/myService/ we only return values containing /myService/. This would make high-cardinality tags more usable.
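
As a rough illustration of what such a limit could look like: a collector that tracks the approximate response size and tells callers to stop scanning early. The type, its name and the limit below are made up for the example and are not the exact change that landed.

```go
package main

import "fmt"

// distinctValueCollector gathers distinct tag values and reports when the
// approximate response size crosses a limit, so the response stays well under
// the gRPC message-size limit between ingester, querier and query-frontend.
type distinctValueCollector struct {
	maxBytes int
	bytes    int
	values   map[string]struct{}
}

func newCollector(maxBytes int) *distinctValueCollector {
	return &distinctValueCollector{maxBytes: maxBytes, values: map[string]struct{}{}}
}

// Collect adds a value and returns true once the collector is full; callers
// should stop scanning further sources as soon as that happens.
func (c *distinctValueCollector) Collect(v string) (exceeded bool) {
	if _, ok := c.values[v]; !ok {
		c.values[v] = struct{}{}
		c.bytes += len(v)
	}
	return c.bytes >= c.maxBytes
}

func main() {
	c := newCollector(16) // tiny limit, just for the demo
	for _, v := range []string{"/api/users", "/api/orders", "/api/items"} {
		if c.Collect(v) {
			fmt.Println("limit reached, returning partial results")
			break
		}
	}
	fmt.Println(len(c.values), "values collected") // 2 values collected
}
```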

@joe-elliott (Member) left a comment

Remove the TagCache struct?

@kvrhdn (Member, Author) commented Nov 8, 2021

Remove the TagCache struct?

Yep 👍, I've removed tempodb/search/tag_cache.go; it's all the way at the bottom of the diff:
https://github.com/grafana/tempo/pull/1068/files#diff-b29b479cf004a8bf08fb55114702b94f6d818911fca0c83c985e771db2fbe4bb

@mdisibio (Contributor) left a comment

Thanks for getting the limit check in. Good to merge.
