Add JFR streaming metrics gatherer #7886

Merged
merged 18 commits into from
Mar 29, 2023
@@ -0,0 +1,18 @@
plugins {
  id("otel.javaagent-instrumentation")
}

otelJava {
  minJavaVersionSupported.set(JavaVersion.VERSION_17)
}

dependencies {
  implementation(project(":instrumentation:runtime-telemetry-jfr:library"))
  compileOnly("io.opentelemetry:opentelemetry-sdk-extension-autoconfigure")
}

tasks {
  test {
    jvmArgs("-Dotel.instrumentation.runtime-telemetry-jfr.enabled=true")
  }
}
@@ -0,0 +1,42 @@
/*
 * Copyright The OpenTelemetry Authors
 * SPDX-License-Identifier: Apache-2.0
 */

package io.opentelemetry.instrumentation.javaagent.runtimetelemetryjfr;

import com.google.auto.service.AutoService;
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.instrumentation.runtimetelemetryjfr.JfrTelemetry;
import io.opentelemetry.javaagent.extension.AgentListener;
import io.opentelemetry.sdk.autoconfigure.AutoConfiguredOpenTelemetrySdk;
import io.opentelemetry.sdk.autoconfigure.spi.ConfigProperties;

/** An {@link AgentListener} that enables runtime metrics during agent startup. */
@AutoService(AgentListener.class)
public class RuntimeMetricsInstallerJfr implements AgentListener {

  @Override
  public void afterAgent(AutoConfiguredOpenTelemetrySdk autoConfiguredSdk) {
    ConfigProperties config = autoConfiguredSdk.getConfig();

    OpenTelemetry openTelemetry = GlobalOpenTelemetry.get();
    JfrTelemetry jfrTelemetry = null;
    /*
    By default don't use any JFR metrics. May change this once semantic conventions are updated.
    If enabled, default to only the metrics not already covered by runtime-telemetry-jmx
    */

    if (config.getBoolean("otel.instrumentation.runtime-telemetry-jfr.enable-all", false)) {
      jfrTelemetry = JfrTelemetry.builder(openTelemetry).enableAllFeatures().build();
    } else if (config.getBoolean("otel.instrumentation.runtime-telemetry-jfr.enabled", false)) {
      jfrTelemetry = JfrTelemetry.create(openTelemetry);
    }
    if (jfrTelemetry != null) {
      JfrTelemetry finalJfrTelemetry = jfrTelemetry;
      Thread cleanupTelemetry = new Thread(() -> finalJfrTelemetry.close());
      Runtime.getRuntime().addShutdownHook(cleanupTelemetry);
    }
  }
}
@@ -0,0 +1,42 @@
/*
 * Copyright The OpenTelemetry Authors
 * SPDX-License-Identifier: Apache-2.0
 */

package io.opentelemetry.instrumentation.javaagent.runtimetelemetryjfr;

import static io.opentelemetry.sdk.testing.assertj.OpenTelemetryAssertions.assertThat;
import static org.awaitility.Awaitility.await;

import io.opentelemetry.instrumentation.testing.AgentTestRunner;
import io.opentelemetry.sdk.metrics.data.MetricData;
import io.opentelemetry.sdk.testing.assertj.MetricAssert;
import java.util.Collection;
import java.util.function.Consumer;
import org.junit.jupiter.api.Test;

class JfrRuntimeMetricsTest {
  @SafeVarargs
  private static void waitAndAssertMetrics(Consumer<MetricAssert>... assertions) {
    await()
        .untilAsserted(
            () -> {
              Collection<MetricData> metrics = AgentTestRunner.instance().getExportedMetrics();
              assertThat(metrics).isNotEmpty();
              for (Consumer<MetricAssert> assertion : assertions) {
                assertThat(metrics).anySatisfy(metric -> assertion.accept(assertThat(metric)));
              }
            });
  }

  @Test
  void shouldHaveDefaultMetrics() {
    // This should generate some events
    System.gc();

    waitAndAssertMetrics(
        metric -> metric.hasName("process.runtime.jvm.cpu.longlock"),
        metric -> metric.hasName("process.runtime.jvm.cpu.limit"),
        metric -> metric.hasName("process.runtime.jvm.cpu.context_switch"));
  }
}
46 changes: 46 additions & 0 deletions instrumentation/runtime-telemetry-jfr/library/README.md
@@ -0,0 +1,46 @@

The main entry point is the `JfrTelemetry` class in the package `io.opentelemetry.instrumentation.runtimetelemetryjfr`:

```java
// Initialize JfrTelemetry
JfrTelemetry jfrTelemetry = JfrTelemetry.create(openTelemetry);

// Close JfrTelemetry to stop listening for JFR events
jfrTelemetry.close();
```

`JfrTelemetry` works by subscribing to certain JFR events, and using relevant bits of information
from the events to produce telemetry data like metrics. The code is divided into "handlers", which
listen for specific events and produce relevant telemetry. The handlers are organized into
features (i.e `JfrFeature`), which represent a category of telemetry and can be toggled on and
off. `JfrTelemetry` evaluates which features are enabled, and only listens for the events required
by the handlers associated with those features.
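
For readers unfamiliar with JFR event streaming, the following is a minimal sketch of subscribing to a single JFR event using the JDK's own `jdk.jfr.consumer.RecordingStream` API. It is not taken from `JfrTelemetry`'s internals; the class name `JfrStreamSketch`, the helper `gcStream`, and the choice of the `jdk.GarbageCollection` event are assumptions made for illustration only.

```java
import java.time.Duration;
import java.util.function.Consumer;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingStream;

public class JfrStreamSketch {
  // Illustrative choice of event; any JFR event name could be used here.
  static final String GC_EVENT = "jdk.GarbageCollection";

  // Creates a stream that enables one event type and forwards each occurrence to the handler.
  // Hypothetical helper, not part of the instrumentation library.
  static RecordingStream gcStream(Consumer<RecordedEvent> handler) {
    RecordingStream stream = new RecordingStream();
    stream.enable(GC_EVENT);
    stream.onEvent(GC_EVENT, handler);
    return stream;
  }

  public static void main(String[] args) throws InterruptedException {
    try (RecordingStream stream =
        gcStream(
            event ->
                System.out.println(
                    event.getEventType().getName() + " took " + event.getDuration()))) {
      stream.startAsync(); // process events on a background thread
      System.gc(); // usually triggers at least one GC event
      Thread.sleep(Duration.ofSeconds(3).toMillis()); // allow time for the periodic flush
    } // closing the stream stops listening for events
  }
}
```

A handler in this sense is just a callback bound to one event name, so only the events that have a registered callback need to be enabled on the stream.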

Enable or disable a feature as follows:

```java
JfrTelemetry jfrTelemetry = JfrTelemetry.builder(openTelemetry)
  .enableFeature(JfrFeature.BUFFER_METRICS)
  .disableFeature(JfrFeature.LOCK_METRICS)
  .build();
```
```
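
The toggling semantics can be pictured with a small standalone sketch: start from each feature's default and let explicit `enableFeature` / `disableFeature` calls override it. This is an assumption about the general shape of such a builder, not the actual `JfrTelemetryBuilder` code; `FeatureToggleSketch`, its nested `Feature` enum (two rows copied from the table of defaults), and `Builder` are hypothetical names.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Predicate;

public class FeatureToggleSketch {
  // Hypothetical stand-in for JfrFeature, using two of the documented defaults.
  enum Feature {
    BUFFER_METRICS(false),
    LOCK_METRICS(true);

    final boolean defaultEnabled;

    Feature(boolean defaultEnabled) {
      this.defaultEnabled = defaultEnabled;
    }
  }

  // Hypothetical builder: starts from the defaults and records explicit overrides.
  static final class Builder {
    private final Map<Feature, Boolean> enabled = new EnumMap<>(Feature.class);

    Builder() {
      for (Feature feature : Feature.values()) {
        enabled.put(feature, feature.defaultEnabled);
      }
    }

    Builder enableFeature(Feature feature) {
      enabled.put(feature, true);
      return this;
    }

    Builder disableFeature(Feature feature) {
      enabled.put(feature, false);
      return this;
    }

    // The resulting predicate is what a handler registry could use to filter handlers.
    Predicate<Feature> build() {
      Map<Feature, Boolean> snapshot = new EnumMap<>(enabled);
      return feature -> snapshot.get(feature);
    }
  }

  public static void main(String[] args) {
    Predicate<Feature> isEnabled =
        new Builder()
            .enableFeature(Feature.BUFFER_METRICS)
            .disableFeature(Feature.LOCK_METRICS)
            .build();
    System.out.println(isEnabled.test(Feature.BUFFER_METRICS)); // prints "true"
    System.out.println(isEnabled.test(Feature.LOCK_METRICS)); // prints "false"
  }
}
```

Returning a predicate over a snapshot keeps the built configuration independent of later changes to the builder.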

The following table describes the set of `JfrFeature` values available, whether each is enabled by
default, and the telemetry each produces:

<!-- DO NOT MANUALLY EDIT. Regenerate table following changes to instrumentation using ./gradlew generateDocs -->
<!-- generateDocsStart -->

| JfrFeature | Default Enabled | Metrics |
|---|---|---|
| BUFFER_METRICS | false | `process.runtime.jvm.buffer.count`, `process.runtime.jvm.buffer.limit`, `process.runtime.jvm.buffer.usage` |
| CLASS_LOAD_METRICS | false | `process.runtime.jvm.classes.current_loaded`, `process.runtime.jvm.classes.loaded`, `process.runtime.jvm.classes.unloaded` |
| CONTEXT_SWITCH_METRICS | true | `process.runtime.jvm.cpu.context_switch` |
| CPU_COUNT_METRICS | true | `process.runtime.jvm.cpu.limit` |
| CPU_UTILIZATION_METRICS | false | `process.runtime.jvm.cpu.utilization`, `process.runtime.jvm.system.cpu.utilization` |
| GC_DURATION_METRICS | false | `process.runtime.jvm.gc.duration` |
| LOCK_METRICS | true | `process.runtime.jvm.cpu.longlock` |
| MEMORY_ALLOCATION_METRICS | true | `process.runtime.jvm.memory.allocation` |
| MEMORY_POOL_METRICS | false | `process.runtime.jvm.memory.committed`, `process.runtime.jvm.memory.init`, `process.runtime.jvm.memory.limit`, `process.runtime.jvm.memory.usage`, `process.runtime.jvm.memory.usage_after_last_gc` |
| NETWORK_IO_METRICS | true | `process.runtime.jvm.network.io`, `process.runtime.jvm.network.time` |
| THREAD_METRICS | false | `process.runtime.jvm.threads.count` |
55 changes: 55 additions & 0 deletions instrumentation/runtime-telemetry-jfr/library/build.gradle.kts
@@ -0,0 +1,55 @@
plugins {
  id("otel.library-instrumentation")
}

otelJava {
  minJavaVersionSupported.set(JavaVersion.VERSION_17)

Member:
Curious how this works with the agent. Do we publish different versions of the agent, or do we package the JFR stuff up in the Java 8 agent and only allow it to be activated if we detect we're running on Java 17+?

Contributor Author:
I'm not 100% clear on this either. I tried building with 17 and running with 11, and things run normally, just without the JFR streaming.

Contributor Author:
Discussed in the March 2, 2023 SIG. Confirmed that nothing needs to be done to prevent JDK 17 features from breaking things.

}

dependencies {
  testImplementation("io.github.netmikey.logunit:logunit-jul:1.1.3")
}

tasks.create("generateDocs", JavaExec::class) {
  group = "build"
  description = "Generate table for README.md"
  classpath = sourceSets.test.get().runtimeClasspath
  mainClass.set("io.opentelemetry.instrumentation.runtimetelemetryjfr.GenerateDocs")
  systemProperties.set("jfr.readme.path", project.projectDir.toString() + "/README.md")
}

tasks {

  val testG1 by registering(Test::class) {
    filter {
      includeTestsMatching("*G1GcMemoryMetricTest*")
    }
    include("**/*G1GcMemoryMetricTest.*")
    jvmArgs("-XX:+UseG1GC")
  }

  val testPS by registering(Test::class) {
    filter {
      includeTestsMatching("*PsGcMemoryMetricTest*")
    }
    include("**/*PsGcMemoryMetricTest.*")
    jvmArgs("-XX:+UseParallelGC")
  }

  val testSerial by registering(Test::class) {
    filter {
      includeTestsMatching("*SerialGcMemoryMetricTest*")
    }
    include("**/*SerialGcMemoryMetricTest.*")
    jvmArgs("-XX:+UseSerialGC")
  }

  test {
    filter {
      excludeTestsMatching("*G1GcMemoryMetricTest")
      excludeTestsMatching("*SerialGcMemoryMetricTest")
      excludeTestsMatching("*PsGcMemoryMetricTest")
    }
    dependsOn(testG1)
    dependsOn(testPS)
    dependsOn(testSerial)
  }
}
@@ -0,0 +1,117 @@
/*
 * Copyright The OpenTelemetry Authors
 * SPDX-License-Identifier: Apache-2.0
 */

package io.opentelemetry.instrumentation.runtimetelemetryjfr;

import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.metrics.Meter;
import io.opentelemetry.api.metrics.MeterBuilder;
import io.opentelemetry.instrumentation.api.internal.EmbeddedInstrumentationProperties;
import io.opentelemetry.instrumentation.runtimetelemetryjfr.internal.RecordedEventHandler;
import io.opentelemetry.instrumentation.runtimetelemetryjfr.internal.ThreadGrouper;
import io.opentelemetry.instrumentation.runtimetelemetryjfr.internal.buffer.DirectBufferStatisticsHandler;
import io.opentelemetry.instrumentation.runtimetelemetryjfr.internal.classes.ClassesLoadedHandler;
import io.opentelemetry.instrumentation.runtimetelemetryjfr.internal.container.ContainerConfigurationHandler;
import io.opentelemetry.instrumentation.runtimetelemetryjfr.internal.cpu.ContextSwitchRateHandler;
import io.opentelemetry.instrumentation.runtimetelemetryjfr.internal.cpu.LongLockHandler;
import io.opentelemetry.instrumentation.runtimetelemetryjfr.internal.cpu.OverallCpuLoadHandler;
import io.opentelemetry.instrumentation.runtimetelemetryjfr.internal.garbagecollection.G1GarbageCollectionHandler;
import io.opentelemetry.instrumentation.runtimetelemetryjfr.internal.garbagecollection.OldGarbageCollectionHandler;
import io.opentelemetry.instrumentation.runtimetelemetryjfr.internal.garbagecollection.YoungGarbageCollectionHandler;
import io.opentelemetry.instrumentation.runtimetelemetryjfr.internal.memory.CodeCacheConfigurationHandler;
import io.opentelemetry.instrumentation.runtimetelemetryjfr.internal.memory.G1HeapSummaryHandler;
import io.opentelemetry.instrumentation.runtimetelemetryjfr.internal.memory.MetaspaceSummaryHandler;
import io.opentelemetry.instrumentation.runtimetelemetryjfr.internal.memory.ObjectAllocationInNewTlabHandler;
import io.opentelemetry.instrumentation.runtimetelemetryjfr.internal.memory.ObjectAllocationOutsideTlabHandler;
import io.opentelemetry.instrumentation.runtimetelemetryjfr.internal.memory.ParallelHeapSummaryHandler;
import io.opentelemetry.instrumentation.runtimetelemetryjfr.internal.network.NetworkReadHandler;
import io.opentelemetry.instrumentation.runtimetelemetryjfr.internal.network.NetworkWriteHandler;
import io.opentelemetry.instrumentation.runtimetelemetryjfr.internal.threads.ThreadCountHandler;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;
import javax.annotation.Nullable;

final class HandlerRegistry {
  private static final String SCOPE_NAME = "io.opentelemetry.instrumentation.runtimetelemetryjfr";

  @Nullable
  private static final String SCOPE_VERSION =
      EmbeddedInstrumentationProperties.findVersion(SCOPE_NAME);

  private HandlerRegistry() {}

  static List<RecordedEventHandler> getHandlers(
      OpenTelemetry openTelemetry, Predicate<JfrFeature> featurePredicate) {

    MeterBuilder meterBuilder = openTelemetry.meterBuilder(SCOPE_NAME);
    if (SCOPE_VERSION != null) {
      meterBuilder.setInstrumentationVersion(SCOPE_VERSION);
    }
    Meter meter = meterBuilder.build();

    List<RecordedEventHandler> handlers = new ArrayList<RecordedEventHandler>();
    for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
      String name = bean.getName();
      switch (name) {
        case "G1 Young Generation":
          handlers.add(new G1HeapSummaryHandler(meter));
          handlers.add(new G1GarbageCollectionHandler(meter));
          break;

        case "Copy":
          handlers.add(new YoungGarbageCollectionHandler(meter, name));
          break;

        case "PS Scavenge":
          handlers.add(new YoungGarbageCollectionHandler(meter, name));
          handlers.add(new ParallelHeapSummaryHandler(meter));
          break;

        case "G1 Old Generation":
        case "PS MarkSweep":
        case "MarkSweepCompact":
          handlers.add(new OldGarbageCollectionHandler(meter, name));
          break;

        default:
          // If none of the above GCs are detected, no action.
      }
    }

    ThreadGrouper grouper = new ThreadGrouper();
    List<RecordedEventHandler> basicHandlers =
        List.of(
            new ObjectAllocationInNewTlabHandler(meter, grouper),
            new ObjectAllocationOutsideTlabHandler(meter, grouper),
            new NetworkReadHandler(meter, grouper),
            new NetworkWriteHandler(meter, grouper),
            new ContextSwitchRateHandler(meter),
            new OverallCpuLoadHandler(meter),
            new ContainerConfigurationHandler(meter),
            new LongLockHandler(meter, grouper),
            new ThreadCountHandler(meter),
            new ClassesLoadedHandler(meter),
            new MetaspaceSummaryHandler(meter),
            new CodeCacheConfigurationHandler(meter),
            new DirectBufferStatisticsHandler(meter));
    handlers.addAll(basicHandlers);

    // Filter and close disabled handlers
    Iterator<RecordedEventHandler> iter = handlers.iterator();
    while (iter.hasNext()) {
      RecordedEventHandler handler = iter.next();
      if (!featurePredicate.test(handler.getFeature())) {
        handler.close();
        iter.remove();
      }
    }

    return handlers;
  }
}
@@ -0,0 +1,37 @@
/*
 * Copyright The OpenTelemetry Authors
 * SPDX-License-Identifier: Apache-2.0
 */

package io.opentelemetry.instrumentation.runtimetelemetryjfr;

/**
 * Enumeration of JFR features, which can be toggled on or off via {@link JfrTelemetryBuilder}.
 *
 * <p>Features are disabled by default if they are already available through {@code
 * io.opentelemetry.instrumentation:opentelemetry-runtime-metrics} JMX based instrumentation.
 */
public enum JfrFeature {
  BUFFER_METRICS(/* defaultEnabled= */ false),
  CLASS_LOAD_METRICS(/* defaultEnabled= */ false),
  CONTEXT_SWITCH_METRICS(/* defaultEnabled= */ true),
  CPU_COUNT_METRICS(/* defaultEnabled= */ true),
  CPU_UTILIZATION_METRICS(/* defaultEnabled= */ false),
  GC_DURATION_METRICS(/* defaultEnabled= */ false),
  LOCK_METRICS(/* defaultEnabled= */ true),
  MEMORY_ALLOCATION_METRICS(/* defaultEnabled= */ true),
  MEMORY_POOL_METRICS(/* defaultEnabled= */ false),
  NETWORK_IO_METRICS(/* defaultEnabled= */ true),
  THREAD_METRICS(/* defaultEnabled= */ false),
  ;

  private final boolean defaultEnabled;

  JfrFeature(boolean defaultEnabled) {
    this.defaultEnabled = defaultEnabled;
  }

  boolean isDefaultEnabled() {
    return defaultEnabled;
  }
}