
ingest: translate known OpenTelemetry JVM metrics #4986

Merged: 12 commits, Apr 1, 2021

Conversation

@axw (Member) commented Mar 24, 2021

Motivation/summary

Copy well-known JVM metrics from their OpenTelemetry
names and associated labels to their counterparts
produced by the Elastic APM Java agent. This enables
users to visualise the metrics in the Elastic APM
app in Kibana.

Metrics are duplicated to avoid surprising users by
silently dropping the OpenTelemetry names. Users can
drop either one of the duplicates with ingest pipeline
customisation.
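The copying described above can be pictured as a Painless script processor in an ingest pipeline. The sketch below is illustrative only, using runtime.jvm.gc.time as an example; the actual pipeline in this PR covers more metrics and their associated labels:

```json
{
  "script": {
    "if": "ctx.runtime?.jvm?.gc?.time != null",
    "source": "ctx.jvm = ctx.jvm ?: [:]; ctx.jvm.gc = ctx.jvm.gc ?: [:]; ctx.jvm.gc.time = ctx.runtime.jvm.gc.time;"
  }
}
```

Because the processor copies rather than renames, both field names remain on the document; a user who wants only one could append a `remove` processor (with `ignore_missing: true`) for the unwanted field in a customised pipeline.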

Checklist

How to test these changes

  1. Run apm-server
  2. Instrument a Java application with https://github.com/open-telemetry/opentelemetry-java-instrumentation, pointed at apm-server
  3. Wait for metrics to be collected and sent to APM Server (not sure how long this is)
  4. Navigate to the APM app in Kibana, and check the GC and memory usage graphs are populated

See also #4919

Related issues

Closes #4919

@apmmachine (Contributor) commented Mar 24, 2021

💚 Build Succeeded


Build stats

  • Build Cause: Pull request #4986 updated

  • Start Time: 2021-04-01T00:27:45.130+0000

  • Duration: 60 min 3 sec

  • Commit: 5ddf0a5

Test stats 🧪

Test Results
Failed 0
Passed 6278
Skipped 120
Total 6398


@axw axw marked this pull request as ready for review March 24, 2021 08:52
@axw axw requested review from felixbarny and a team March 24, 2021 08:52
@axw (Member, Author) commented Mar 24, 2021

@bmorelli25 I haven't added any docs here, not sure where would be appropriate?

@codecov-io commented Mar 24, 2021

Codecov Report

Merging #4986 (fdce970) into master (efb4c16) will increase coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #4986      +/-   ##
==========================================
+ Coverage   77.26%   77.28%   +0.01%     
==========================================
  Files         179      179              
  Lines       10486    10486              
==========================================
+ Hits         8102     8104       +2     
+ Misses       2384     2382       -2     
Impacted Files Coverage Δ
beater/jaeger/common.go 91.66% <0.00%> (+8.33%) ⬆️

@felixbarny (Member) left a comment

Why does the ingest pipeline get applied to all data streams? I'd expect it to only be in /data_stream/app_metrics/elasticsearch/ingest_pipeline/apm_opentelemetry_metrics.json

@bmorelli25 (Member) commented Mar 24, 2021

@bmorelli25 I haven't added any docs here, not sure where would be appropriate?

As this is a new default ingest pipeline, I'd like to see docs added here: https://github.com/elastic/apm-server/blob/master/docs/configuring-ingest.asciidoc. Could you add a new entry to the table that includes an added badge: added::[7.13]?

We'll likely want to at least note this in our OTel docs, so I've added a note to elastic/observability-docs#467.

@axw (Member, Author) commented Mar 25, 2021

Could you add a new entry to the table that includes an added badge: added::[7.13].

Thanks @bmorelli25, will do.

Why does the ingest pipeline get applied to all data streams? I'd expect it to only be in /data_stream/app_metrics/elasticsearch/ingest_pipeline/apm_opentelemetry_metrics.json

@felixbarny that would be ideal. With the old way of doing things, we only ever have a single pipeline. While we maintain both the old and new ways in tandem, we're just copying the single pipeline to all data streams in the integration package. In the future it would make sense to be more targeted.

@axw axw requested a review from bmorelli25 March 25, 2021 00:44
@axw (Member, Author) commented Mar 25, 2021

@bmorelli25 tagging you for review on the docs change. I also added an entry for another recently introduced ingest pipeline. Not sure how much of a pain it would be, but perhaps we should document the specific mapping when we update the OTel docs. Then in the ingest pipeline docs page we can link there.

@axw axw added the v7.13.0 label Mar 25, 2021
@bmorelli25 (Member) left a comment

Not sure how much of a pain it would be, but perhaps we should document the specific mapping when we update the OTel docs. Then in the ingest pipeline docs page we can link there.

👍 Updated the doc issue accordingly.

Review thread on docs/configuring-ingest.asciidoc: resolved.
@bmorelli25 (Member) left a comment

Docs look good! Thanks.

@felixbarny (Member) left a comment

we're just copying the single pipeline to all data streams in the integration package.

Is that a limitation in APM Server or in data streams?

In the future it would make sense to be more targeted.

Is there an issue to track that?

@axw (Member, Author) commented Mar 25, 2021

we're just copying the single pipeline to all data streams in the integration package.

Is that a limitation in APM Server or in data streams?

APM Server. We're doing this for our own sanity, having a single source of truth for the pipeline.

In the future it would make sense to be more targeted.

Is there an issue to track that?

There wasn't, but I've just now added an item to #2631.

mergify bot (Contributor) commented Mar 26, 2021

This pull request is now in conflict. Could you fix it @axw? 🙏
To fix up this pull request, you can check it out locally. See the documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b otel-jvm-metrics upstream/otel-jvm-metrics
git merge upstream/master
git push upstream otel-jvm-metrics

@axw (Member, Author) commented Mar 29, 2021

jenkins run the tests please

axw added 2 commits March 30, 2021 14:19
- Move `runtime.jvm.gc.collection` before `runtime.jvm.gc.time`
- Require `ctx.runtime?.jvm?.gc != null` when copying `labels.gc`
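The second bullet corresponds to a null-safe guard in the script processor's `if` condition. A sketch of the idea (the target field labels.name is an assumption for illustration; see the pipeline source for the exact copy):

```json
{
  "script": {
    "if": "ctx.runtime?.jvm?.gc != null && ctx.labels?.gc != null",
    "source": "ctx.labels.name = ctx.labels.gc;"
  }
}
```

The Painless `?.` operator short-circuits to null when any intermediate object is missing, so documents without JVM GC metrics skip the copy instead of throwing a null-pointer error.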
@axw axw requested a review from simitt March 31, 2021 03:48
mergify bot (Contributor) commented Mar 31, 2021

This pull request is now in conflict. Could you fix it @axw? 🙏
To fix up this pull request, you can check it out locally. See the documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b otel-jvm-metrics upstream/otel-jvm-metrics
git merge upstream/master
git push upstream otel-jvm-metrics

@axw (Member, Author) commented Mar 31, 2021

jenkins run the tests please

mergify bot (Contributor) commented Mar 31, 2021

This pull request is now in conflict. Could you fix it @axw? 🙏
To fix up this pull request, you can check it out locally. See the documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b otel-jvm-metrics upstream/otel-jvm-metrics
git merge upstream/master
git push upstream otel-jvm-metrics

@axw axw merged commit 264d151 into elastic:master Apr 1, 2021
@axw axw deleted the otel-jvm-metrics branch April 1, 2021 01:32
mergify bot pushed a commit that referenced this pull request Apr 1, 2021
* ingest: translate known OpenTelemetry JVM metrics

* Add changelog

* docs: document new ingest pipelines

* docs: move existing docs to correct position

* ingest: address review comments

- Move `runtime.jvm.gc.collection` before `runtime.jvm.gc.time`
- Require `ctx.runtime?.jvm?.gc != null` when copying `labels.gc`

* Regenerate integration package

(cherry picked from commit 264d151)

# Conflicts:
#	changelogs/head.asciidoc
axw added a commit that referenced this pull request Apr 1, 2021
* ingest: translate known OpenTelemetry JVM metrics (#4986)

* ingest: translate known OpenTelemetry JVM metrics

* Add changelog

* docs: document new ingest pipelines

* docs: move existing docs to correct position

* ingest: address review comments

- Move `runtime.jvm.gc.collection` before `runtime.jvm.gc.time`
- Require `ctx.runtime?.jvm?.gc != null` when copying `labels.gc`

* Regenerate integration package

(cherry picked from commit 264d151)

# Conflicts:
#	changelogs/head.asciidoc

* Delete head.asciidoc

Co-authored-by: Andrew Wilkins <[email protected]>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
mergify bot pushed a commit that referenced this pull request Apr 27, 2021
* ingest: translate known OpenTelemetry JVM metrics

* Add changelog

* docs: document new ingest pipelines

* docs: move existing docs to correct position

* ingest: address review comments

- Move `runtime.jvm.gc.collection` before `runtime.jvm.gc.time`
- Require `ctx.runtime?.jvm?.gc != null` when copying `labels.gc`

* Regenerate integration package

(cherry picked from commit 264d151)

# Conflicts:
#	changelogs/7.13.asciidoc
#	docs/configuring-ingest.asciidoc
#	systemtest/otlp_test.go
@simitt (Contributor) commented May 12, 2021

Tested with BC4:
Tested with BC4: instrumented a Java test app with the OTel instrumentation and waited until metrics documents existed. No metrics showed up in the APM UI. @axw, is there anything else I need to set/enable?

Update:

  • agent.name needs to be java (not opentelemetry/java) so that the JVM tab shows up
  • at a minimum, jvm.memory.heap.used needs to exist for JVM details to be shown.

@simitt simitt self-assigned this May 12, 2021
@axw (Member, Author) commented May 13, 2021

agent.name needs to be java (not opentelemetry/java) so that the JVM tab shows up

Ugh :(
I don't think we should be changing the agent name, so this part will probably need to happen in the UI.

minimum jvm.memory.heap.used needs to exist for JVM details to be shown.

Hmm. Is there just one metric document? We should be capturing jvm.memory.heap.used. Maybe what's happening is that we're splitting the metrics into multiple docs - would that explain this?

@felixbarny (Member) left a comment

What might also be missing is agent.ephemeral_id. IIRC, the UI needs that for downsampling/calculating derivatives for the GC time and count.

@simitt (Contributor) commented May 18, 2021

Created elastic/kibana#100253 as a follow-up for showing OpenTelemetry JVM metrics.

Hmm. Is there just one metric document? We should be capturing jvm.memory.heap.used. Maybe what's happening is that we're splitting the metrics into multiple docs - would that explain this?

Yes, a single document is created per jvm.memory.heap metric.
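For illustration only (field names, timestamps, and values are assumptions), such a per-metric split would mean sibling documents like these where the UI may expect a single combined document:

```json
{ "@timestamp": "2021-05-12T13:00:00.000Z", "jvm": { "memory": { "heap": { "used": 52428800 } } } }
{ "@timestamp": "2021-05-12T13:00:00.000Z", "jvm": { "memory": { "heap": { "committed": 104857600 } } } }
```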

@simitt (Contributor) commented May 18, 2021

Created a follow-up for the metrics docs: #5275.

@axw (Member, Author) commented May 24, 2021

What might also be missing is agent.ephemeral_id. IIRC, the UI needs that for downsampling/calculating derivatives for the GC time and count.

@felixbarny there's a TypeScript definition for ephemeral_id, but no other code in the UI that actually references the field in aggs or otherwise. Does that indicate a UI bug?

@felixbarny (Member) left a comment

Does that indicate a UI bug?

Looking at the aggs that Kibana does, I think it doesn't work as I'd expect

https://github.com/elastic/kibana/blob/1f02c48d3cf57dd921cfe64186a695f8375e3954/x-pack/plugins/apm/server/lib/metrics/by_agent/java/gc/fetch_and_transform_gc_metrics.ts#L70-L109

It does a derivative over the max value of gc.count across instances as opposed to summing the derivatives of each instance.

I guess it would be easier for the UI if the Java agent reported deltas for gc.time instead of a monotonically incrementing counter. However, it might be very tricky to adjust the OTel metrics to report deltas.
IIRC, there's a setting in OTel that allows reporting deltas instead of monotonically incrementing counters, but I'm not sure whether that setting also applies to the GC metrics that are captured via a longSumObserver.
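The aggregation shape suggested above, summing per-instance derivatives rather than differentiating a cross-instance max, could be sketched in query DSL roughly as follows (field names and the interval are assumptions, not the UI's actual query):

```json
{
  "size": 0,
  "aggs": {
    "per_instance": {
      "terms": { "field": "agent.ephemeral_id" },
      "aggs": {
        "over_time": {
          "date_histogram": { "field": "@timestamp", "fixed_interval": "30s" },
          "aggs": {
            "max_gc_count": { "max": { "field": "jvm.gc.count" } },
            "gc_count_delta": { "derivative": { "buckets_path": "max_gc_count" } }
          }
        }
      }
    }
  }
}
```

The per-instance deltas would then be summed across the `per_instance` buckets, so one instance's counter resetting (e.g. on restart) does not distort the combined series.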

@simitt (Contributor) commented May 26, 2021

@felixbarny do you mind moving this over to a new issue? It sounds like it'd be worth a follow-up discussion.

@felixbarny (Member) left a comment

After giving it another thought, it's probably not an issue as that chart is only ever shown in the context of a specific instance. Therefore, the ephemeral_id is not needed.

Successfully merging this pull request may close these issues:

  • Visualize JVM health indicators collected by the OpenTelemetry Java Agent