Add throughput and SLO metrics in the tags and tag values endpoints (#4148)

propagate the bytes read from the storage layer to the frontend, use them to compute throughput, and use that throughput in the SLO computation for metadata endpoints.

metadata SLO thresholds can be configured via `metadata_slo` config.
```
search:
   metadata_slo:
      duration_slo: 5s
      throughput_bytes_slo: 1.073741824e+09
```

we will also return the `metrics` in the response of all the metadata endpoints:
- `/search/tags`
- `/v2/search/tags`
- `/search/tag/<tagName>/values`
- `/v2/search/tag/<tagName>/values`

here is what it looks like in the response:
```json
{
   "<existing keys>": "<existing response>",
  "metrics": {
    "inspectedBytes": "630188"
  }
}
```
we return `"metrics": {}` when the response is empty or contains only `intrinsics`.
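
A client can read the new field with a small decoder. The sketch below is illustrative only: the `metadataResponse` type and the `inspectedBytes` helper are hypothetical names, and treating a missing `inspectedBytes` as zero is an assumption. Because the field is a JSON string in the example above, it is parsed with `strconv.ParseUint` instead of being decoded as a number.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// metadataResponse picks out only the "metrics" object; all other keys in
// the response are ignored. The type name is hypothetical.
type metadataResponse struct {
	Metrics struct {
		InspectedBytes string `json:"inspectedBytes"`
	} `json:"metrics"`
}

// inspectedBytes extracts metrics.inspectedBytes from a response body.
// Assumption: an empty or missing value (for example "metrics": {}) means 0.
func inspectedBytes(body []byte) (uint64, error) {
	var resp metadataResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return 0, err
	}
	if resp.Metrics.InspectedBytes == "" {
		return 0, nil
	}
	return strconv.ParseUint(resp.Metrics.InspectedBytes, 10, 64)
}

func main() {
	body := []byte(`{"tagValues": ["article-service"], "metrics": {"inspectedBytes": "431380"}}`)
	n, err := inspectedBytes(body)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(n)
}
```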


it also exposes the following existing metrics with a new `op="metadata"` label:
- total metadata queries counter
- metadata queries within SLO counter
- metadata queries throughput histogram
electron0zero authored Oct 15, 2024
1 parent e37f481 commit 327c964
Showing 54 changed files with 2,171 additions and 701 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -20,5 +20,6 @@
  /tempodb/encoding/benchmark_block
  private-key.key
  integration/e2e/e2e_integration_test[0-9]*
+ integration/e2e/deployments/e2e_integration_test[0-9]*
  .tempo.yaml
  /tmp
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -4,6 +4,7 @@
  * [ENHANCEMENT] Changed log level from INFO to DEBUG for the TempoDB Find operation using traceId to reduce excessive/unwanted logs in log search. [#4179](https://github.com/grafana/tempo/pull/4179) (@Aki0x137)
  * [ENHANCEMENT] tempo-query: separate tls settings for server and client [#4177](https://github.com/grafana/tempo/pull/4177) (@frzifus)
  * [ENHANCEMENT] Pushdown collection of results from generators in the querier [#4119](https://github.com/grafana/tempo/pull/4119) (@electron0zero)
+ * [CHANGE] Add throughput and SLO metrics in the tags and tag values endpoints [#4148](https://github.com/grafana/tempo/pull/4148) (@electron0zero)
  * [ENHANCEMENT] Send semver version in api/stattus/buildinfo for cloud deployments [#4110](https://github.com/grafana/tempo/pull/4110) [@Aki0x137]
  * [ENHANCEMENT] Speedup tempo-query trace search by allowing parallel queries [#4159](https://github.com/grafana/tempo/pull/4159) (@pavolloffay)
  * [ENHANCEMENT] Speedup DistinctString and ScopedDistinctString collectors [#4109](https://github.com/grafana/tempo/pull/4109) (@electron0zero)
105 changes: 85 additions & 20 deletions docs/sources/tempo/api_docs/_index.md
@@ -343,6 +343,9 @@ $ curl -G -s http://localhost:3200/api/search/tags?scope=span | jq
  "starter",
  "version"
  ]
+ "metrics": {
+ "inspectedBytes": "630188"
+ }
  }
  ```
@@ -391,11 +394,9 @@ $ curl -G -s http://localhost:3200/api/v2/search/tags | jq
  {
  "scopes": [
  {
- "name": "span",
+ "name": "link",
  "tags": [
- "article.count",
- "http.flavor",
- "http.method",
+ "link-type"
  ]
  },
  {
@@ -405,16 +406,70 @@ $ curl -G -s http://localhost:3200/api/v2/search/tags | jq
  "service.name"
  ]
  },
+ {
+ "name": "span",
+ "tags": [
+ "article.count",
+ "http.flavor",
+ "http.method",
+ "http.request.header.accept",
+ "http.request_content_length",
+ "http.response.header.content-type",
+ "http.response_content_length",
+ "http.scheme",
+ "http.status_code",
+ "http.target",
+ "http.url",
+ "net.host.name",
+ "net.host.port",
+ "net.peer.name",
+ "net.peer.port",
+ "net.sock.family",
+ "net.sock.host.addr",
+ "net.sock.peer.addr",
+ "net.transport",
+ "numbers",
+ "one"
+ ]
+ },
  {
  "name": "intrinsic",
  "tags": [
  "duration",
+ "event:name",
+ "event:timeSinceStart",
+ "instrumentation:name",
+ "instrumentation:version",
  "kind",
  "name",
- "status"
+ "rootName",
+ "rootServiceName",
+ "span:duration",
+ "span:kind",
+ "span:name",
+ "span:status",
+ "span:statusMessage",
+ "status",
+ "statusMessage",
+ "trace:duration",
+ "trace:rootName",
+ "trace:rootService",
+ "traceDuration"
  ]
+ },
+ {
+ "name": "event",
+ "tags": [
+ "exception.escape",
+ "exception.message",
+ "exception.stacktrace",
+ "exception.type",
+ ]
  }
- ]
+ ],
+ "metrics": {
+ "inspectedBytes": "377046"
+ }
  }
  ```
@@ -440,13 +495,16 @@ This query returns all discovered values for the tag `service.name`.
  $ curl -G -s http://localhost:3200/api/search/tag/service.name/values | jq
  {
  "tagValues": [
- "adservice",
- "cartservice",
- "checkoutservice",
- "frontend",
- "productcatalogservice",
- "recommendationservice"
- ]
+ "article-service",
+ "auth-service",
+ "billing-service",
+ "cart-service",
+ "postgres",
+ "shop-backend"
+ ],
+ "metrics": {
+ "inspectedBytes": "431380"
+ }
  }
  ```
@@ -468,30 +526,37 @@ See [TraceQL]({{< relref "../traceql" >}}) documentation for more information.
  This example queries Tempo using curl and returns all discovered values for the tag `service.name`.
  ```bash
- $ curl http://localhost:3200/api/v2/search/tag/.service.name/values | jq .
+ $ curl -G -s http://localhost:3200/api/v2/search/tag/.service.name/values | jq
  {
  "tagValues": [
  {
  "type": "string",
- "value": "customer"
+ "value": "article-service"
  },
  {
  "type": "string",
+ "value": "postgres"
+ },
+ {
+ "type": "string",
- "value": "mysql"
+ "value": "cart-service"
  },
  {
  "type": "string",
- "value": "driver"
+ "value": "billing-service"
  },
  {
  "type": "string",
- "value": "frontend"
+ "value": "shop-backend"
  },
  {
  "type": "string",
- "value": "redis"
+ "value": "auth-service"
  }
- ]
+ ],
+ "metrics": {
+ "inspectedBytes": "502756"
+ }
  }
  ```
  This endpoint can also receive `start` and `end` optional parameters. These parameters define the time range from which the tags are fetched
13 changes: 12 additions & 1 deletion docs/sources/tempo/configuration/_index.md
@@ -614,7 +614,7 @@ query_frontend:

  # If set to a non-zero value, it's value will be used to decide if query is within SLO or not.
  # Query is within SLO if it returned 200 within duration_slo seconds OR processed throughput_slo bytes/s data.
- # NOTE: `duration_slo` and `throughput_bytes_slo` both must be configured for it to work
+ # NOTE: Requires `duration_slo` AND `throughput_bytes_slo` to be configured.
  [duration_slo: <duration> | default = 0s ]

  # If set to a non-zero value, it's value will be used to decide if query is within SLO or not.
@@ -623,6 +623,17 @@ query_frontend:

  # The number of shards to break ingester queries into.
  [ingester_shards]: <int> | default = 1]
+
+ # SLO configuration for Metadata (tags and tag values) endpoints.
+ metadata_slo:
+ # If set to a non-zero value, it's value will be used to decide if metadata query is within SLO or not.
+ # Query is within SLO if it returned 200 within duration_slo seconds OR processed throughput_slo bytes/s data.
+ # NOTE: Requires `duration_slo` AND `throughput_bytes_slo` to be configured.
+ [duration_slo: <duration> | default = 0s ]
+
+ # If set to a non-zero value, it's value will be used to decide if metadata query is within SLO or not.
+ # Query is within SLO if it returned 200 within duration_slo seconds OR processed throughput_slo bytes/s data.
+ [throughput_bytes_slo: <float> | default = 0 ]

  # Trace by ID lookup configuration
  trace_by_id:
18 changes: 16 additions & 2 deletions docs/sources/tempo/configuration/manifest.md
@@ -18,7 +18,7 @@ go run ./cmd/tempo --storage.trace.backend=local --storage.trace.local.path=/var
  ## Complete configuration

  {{< admonition type="note" >}}
- This manifest was generated on 2023-11-13.
+ This manifest was generated on 2024-10-11.
  {{% /admonition %}}

  ```yaml
@@ -33,6 +33,7 @@ server:
  grpc_listen_address: ""
  grpc_listen_port: 9095
  grpc_listen_conn_limit: 0
+ proxy_protocol_enabled: false
  tls_cipher_suites: ""
  tls_min_version: ""
  http_tls_config:
@@ -70,6 +71,8 @@ server:
  grpc_server_min_time_between_pings: 10s
  grpc_server_ping_without_stream_allowed: true
  grpc_server_num_workers: 0
+ grpc_server_stats_tracking_enabled: true
+ grpc_server_recv_buffer_pools_enabled: false
  log_format: logfmt
  log_level: info
  log_source_ips_enabled: false
@@ -89,6 +92,7 @@ internal_server:
  grpc_listen_address: ""
  grpc_listen_port: 0
  grpc_listen_conn_limit: 0
+ proxy_protocol_enabled: false
  tls_cipher_suites: ""
  tls_min_version: ""
  http_tls_config:
@@ -126,6 +130,8 @@ internal_server:
  grpc_server_min_time_between_pings: 0s
  grpc_server_ping_without_stream_allowed: false
  grpc_server_num_workers: 0
+ grpc_server_stats_tracking_enabled: false
+ grpc_server_recv_buffer_pools_enabled: false
  log_format: logfmt
  log_level: info
  log_source_ips_enabled: false
@@ -314,7 +320,9 @@ query_frontend:
  max_duration: 3h0m0s
  query_backend_after: 30m0s
  interval: 5m0s
+ max_exemplars: 100
  multi_tenant_queries_enabled: true
+ response_consumers: 10
  compactor:
  ring:
  kvstore:
@@ -582,7 +590,7 @@ metrics_generator:
  path: ""
  v2_encoding: none
  search_encoding: none
- ingestion_time_range_slack: 0s
+ ingestion_time_range_slack: 2m0s
  version: vParquet4
  metrics_ingestion_time_range_slack: 30s
  query_timeout: 30s
@@ -620,11 +628,13 @@ storage:
  offset_index: false
  blocklist_poll: 5m0s
  blocklist_poll_concurrency: 50
+ blocklist_poll_tenant_concurrency: 0
  blocklist_poll_fallback: true
  blocklist_poll_tenant_index_builders: 2
  blocklist_poll_stale_tenant_index: 0s
  blocklist_poll_jitter_ms: 0
  blocklist_poll_tolerate_consecutive_errors: 1
+ blocklist_poll_tolerate_tenant_failures: 1
  empty_tenant_deletion_enabled: false
  empty_tenant_deletion_age: 0s
  backend: local
@@ -699,6 +709,9 @@ overrides:
  max_traces_per_user: 10000
  read:
  max_bytes_per_tag_values_query: 5000000
+ metrics_generator:
+ generate_native_histograms: classic
+ ingestion_time_range_slack: 0s
  global:
  max_bytes_per_trace: 5000000
  per_tenant_override_config: ""
@@ -788,6 +801,7 @@ memberlist:
  rejoin_interval: 0s
  left_ingesters_timeout: 5m0s
  leave_timeout: 20s
+ broadcast_timeout_for_local_updates_on_shutdown: 10s
  message_history_buffer_bytes: 0
  bind_addr: []
  bind_port: 7946
3 changes: 3 additions & 0 deletions example/docker-compose/local/tempo.yaml
@@ -17,6 +17,9 @@ query_frontend:
  search:
  duration_slo: 5s
  throughput_bytes_slo: 1.073741824e+09
+ metadata_slo:
+ duration_slo: 5s
+ throughput_bytes_slo: 1.073741824e+09
  trace_by_id:
  duration_slo: 5s

3 changes: 3 additions & 0 deletions example/docker-compose/multi-tenant/tempo.yaml
@@ -10,6 +10,9 @@ query_frontend:
  search:
  duration_slo: 5s
  throughput_bytes_slo: 1.073741824e+09
+ metadata_slo:
+ duration_slo: 5s
+ throughput_bytes_slo: 1.073741824e+09
  trace_by_id:
  duration_slo: 5s

3 changes: 3 additions & 0 deletions example/docker-compose/shared/tempo.yaml
@@ -7,6 +7,9 @@ query_frontend:
  search:
  duration_slo: 5s
  throughput_bytes_slo: 1.073741824e+09
+ metadata_slo:
+ duration_slo: 5s
+ throughput_bytes_slo: 1.073741824e+09
  trace_by_id:
  duration_slo: 5s
