Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Be able to correlate traces even when Cloudflare times out #13685

Closed
MadLittleMods opened this issue Sep 1, 2022 · 1 comment · Fixed by #13801
Closed

Be able to correlate traces even when Cloudflare times out #13685

MadLittleMods opened this issue Sep 1, 2022 · 1 comment · Fixed by #13801
Assignees
Labels
A-Metrics metrics, measures, stuff we put in Prometheus O-Uncommon Most users are unlikely to come across this or unexpected workflow S-Minor Blocks non-critical functionality, workarounds exist. T-Enhancement New features, changes in functionality, improvements in performance, or user-facing enhancements.

Comments

@MadLittleMods
Copy link
Contributor

When Synapse responds successfully, we add a synapse-trace-id header which makes it nice and easy to jump to the trace in Jaeger.

But when a Cloudflare timeout gets in the way, we just see a 524: A timeout occurred. The Cloudflare response does have a Cloudflare specific cf-ray header which we could include in the tags of top-level servlet tracing span. This is even what Cloudflare essentially suggests:

Add the CF-Ray header to your origin web server logs to match requests proxied to Cloudflare to requests in your server logs.

This is a bit specific to matrix.org which uses Cloudflare so perhaps we want to add some Synapse config to specify which request header field to pull from and add as a tag to the servlet tracing span.

@squahtx squahtx added S-Minor Blocks non-critical functionality, workarounds exist. T-Enhancement New features, changes in functionality, improvements in performance, or user-facing enhancements. A-Metrics metrics, measures, stuff we put in Prometheus O-Uncommon Most users are unlikely to come across this or unexpected workflow labels Sep 1, 2022
@MadLittleMods
Copy link
Contributor Author

Discussed in the backend team sync this morning and the leaning was to add some config and an aversion to add something Cloudflare specific in the codebase itself.

@MadLittleMods MadLittleMods self-assigned this Sep 12, 2022
MadLittleMods added a commit that referenced this issue Sep 12, 2022
MadLittleMods added a commit that referenced this issue Sep 13, 2022
MadLittleMods added a commit that referenced this issue Sep 15, 2022
…pse (pull request ID from header) (#13801)

Fix #13685

New config:

```diff
  listeners:
    - port: 8008
      tls: false
      type: http
      x_forwarded: true
+     request_id_header: "cf-ray"
      bind_addresses: ['::1', '127.0.0.1', '0.0.0.0']
```
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Metrics metrics, measures, stuff we put in Prometheus O-Uncommon Most users are unlikely to come across this or unexpected workflow S-Minor Blocks non-critical functionality, workarounds exist. T-Enhancement New features, changes in functionality, improvements in performance, or user-facing enhancements.
Projects
None yet
2 participants