Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Capture all request and response headers #1061

Closed
pavolloffay opened this issue Oct 6, 2020 · 13 comments · Fixed by #1898
Closed

Capture all request and response headers #1061

pavolloffay opened this issue Oct 6, 2020 · 13 comments · Fixed by #1898
Labels
area:semantic-conventions Related to semantic conventions release:after-ga Not required before GA release, and not going to work on before GA spec:trace Related to the specification/trace directory

Comments

@pavolloffay
Copy link
Member

pavolloffay commented Oct 6, 2020

What are you trying to achieve?

I want to capture selected/all HTTP request and response headers.

The instrumentation libraries should be configurable to capture selected or all HTTP request/response headers.

To move this forward we probably need to define:

  • data semantics e.g. http.request.header.<header-name>/http.response.header.<header-name>
  • configuration for instrumentations e.g. OTEL_INSTRUMENTATION_CAPTURE_REQUEST_HEADERS=<name1,name2> OTEL_INSTRUMENTATION_CAPTURE_REQUEST_HEADERS_ALL=bool

Additional context.

Related issues:

@pavolloffay pavolloffay added the spec:trace Related to the specification/trace directory label Oct 6, 2020
@Oberon00 Oberon00 added area:semantic-conventions Related to semantic conventions release:after-ga Not required before GA release, and not going to work on before GA labels Oct 6, 2020
@Oberon00
Copy link
Member

Oberon00 commented Oct 6, 2020

Actually what you suggest sounds for semantic conventions sounds great already. Just have to specify details, e.g. "header-name" should be used as-is with any dashes and casing preserved (e.g. http.response.header.Content-Type) and values should be string arrays (because headers with the same name can appear multiple times).

For the instrumentation conventions, that sounds also reasonable generally, but I don't think we have anything in the spec already that only affects instrumentations and not even the SDK, so I'm not sure where we would put them. It may be best to split these into a separate issue.

@Oberon00
Copy link
Member

I created open-telemetry/build-tools#29 to add support in the semantic convention generator for this, but we can go ahead and write the markdown by hand for now.

@mateuszrzeszutek
Copy link
Member

Hey,
We (Splunk) are very interested in introducing this to HTTP semantic conventions. Before I file a PR I wanted to ask one thing: why "header-name" should be used as-is with [...] casing preserved? HTTP headers have case-insensitive names, wouldn't it make it a bit easier on the user to have them all lowercased?

@Oberon00
Copy link
Member

I wanted to ask one thing: why "header-name" should be used as-is with [...] casing preserved? HTTP headers have case-insensitive names, wouldn't it make it a bit easier on the user to have them all lowercased?

I simply suggested what is most information-preserving. If you lowercase everything, it may be simpler but it loses information.
I would rather do such normalization (lowercasing) on the backend/collector if desired.

While HTTP headers ought to be case-insensitive, there might be issues with hashing/signing or bad server implementations that only work if the headers use canonical casing. Since one use case of telemetry data is to debug issues like "why did this HTTP request fail but that other one didn't", such subtle differences in input may be relevant. Differences in casing may also hint at different client implementations, even if these clients send no/same User-Agent headers.

@mateuszrzeszutek
Copy link
Member

Got it -- that makes sense, thanks for the explanation!

@pavolloffay
Copy link
Member Author

Lowercasing works as well, we have actually added lowercasing to all our agents.

I agree that agents can capture data in the original form and lowercasing can be done in the ingestion pipeline (e.g. collector). I am not sure if OTEL collector can easily change attribute keys - with no perf impact.

@yurishkuro
Copy link
Member

and values should be string arrays

I'd say they should be string arrays if the instrumentation has access to them in such form. If it only has access to comma-separated string, perhaps just output as a single string?

@yurishkuro
Copy link
Member

http.request.header.<header-name>

another option would be to use nested structures

@mateuszrzeszutek
Copy link
Member

another option would be to use nested structures

What do you mean by "nested structures"?

I'd say they should be string arrays if the instrumentation has access to them in such form. If it only has access to comma-separated string, perhaps just output as a single string?

The type would still be string[], but if instrumentation does not have access to the list/array of header values it just sets a single-item array -- did I understood that correctly?

@Oberon00
Copy link
Member

Oberon00 commented Aug 30, 2021

I'd say they should be string arrays if the instrumentation has access to them in such form. If it only has access to comma-separated string, perhaps just output as a single string?

Theoretically allowing both strings and string arrays shouldn't be a problem but in practice it is, because some SIGs, e.g. Java, have types encoded in their generated semantic conventions, and "union types" pose a problem. That's also the reason we have both process.command_line (string) and process.command_args (string array), see #831.

Putting a comma-separated string as a single array item should be allowed though (or mixing split and concatenated headers in the same array).

@Oberon00
Copy link
Member

another option would be to use nested structures

These are only allowed for log attributes, not tracing or metrics.

@yurishkuro
Copy link
Member

yurishkuro commented Aug 30, 2021

another option would be to use nested structures

These are only allowed for log attributes, not tracing or metrics.

I understand why for metrics, but for traces this does not sound like a reasonable or necessary restriction.

@Oberon00
Copy link
Member

Since the choice between nesting vs encoding in the name is more of a detail for the issue at hand, let's keep further discussion on allowing nested structures to #376

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:semantic-conventions Related to semantic conventions release:after-ga Not required before GA release, and not going to work on before GA spec:trace Related to the specification/trace directory
Projects
None yet
4 participants