Generating new version 1.23.0
GitHub Action Website Snapshot committed Oct 4, 2024
1 parent 9dc9ac8 commit 34c4f4f
Showing 238 changed files with 13,537 additions and 0 deletions.
1 change: 1 addition & 0 deletions versioned_docs/version-1.23.0/before-ol.svg
4 changes: 4 additions & 0 deletions versioned_docs/version-1.23.0/client/_category_.json
@@ -0,0 +1,4 @@
{
  "label": "Client Libraries",
  "position": 4
}
4 changes: 4 additions & 0 deletions versioned_docs/version-1.23.0/client/java/_category_.json
@@ -0,0 +1,4 @@
{
  "label": "Java",
  "position": 1
}
118 changes: 118 additions & 0 deletions versioned_docs/version-1.23.0/client/java/configuration.md
@@ -0,0 +1,118 @@
---
sidebar_position: 2
title: Configuration
---

We recommend configuring the client with an `openlineage.yml` file that contains all the
details of how to connect to your OpenLineage backend.

See [example configurations.](#transports)

You can make this file available to the client in three ways, listed in order of precedence:

1. Set an `OPENLINEAGE_CONFIG` environment variable to a file path: `OPENLINEAGE_CONFIG=path/to/openlineage.yml`.
2. Place an `openlineage.yml` in the user's current working directory.
3. Place an `openlineage.yml` under `.openlineage/` in the user's home directory (`~/.openlineage/openlineage.yml`).
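
The lookup order above can be sketched in plain Java. This is only an illustration of the documented precedence, not the client's actual implementation: the `ConfigLocator` class and its methods are hypothetical, and falling through to the next location when a file is missing is an assumption.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Stream;

/** Hypothetical helper that mirrors the documented lookup order. */
final class ConfigLocator {

    /** Returns the candidate locations, highest precedence first. */
    static Stream<Path> candidates(Map<String, String> env, Path workingDir, Path homeDir) {
        Stream.Builder<Path> paths = Stream.builder();
        String fromEnv = env.get("OPENLINEAGE_CONFIG");
        if (fromEnv != null && !fromEnv.isEmpty()) {
            paths.add(Paths.get(fromEnv)); // 1. explicit path from the environment
        }
        paths.add(workingDir.resolve("openlineage.yml")); // 2. current working directory
        paths.add(homeDir.resolve(".openlineage").resolve("openlineage.yml")); // 3. home directory
        return paths.build();
    }

    /** Picks the first candidate that exists as a regular file (assumed fallback behavior). */
    static Optional<Path> locate(Map<String, String> env, Path workingDir, Path homeDir) {
        return candidates(env, workingDir, homeDir).filter(Files::isRegularFile).findFirst();
    }

    public static void main(String[] args) {
        Optional<Path> config = locate(
            System.getenv(),
            Paths.get(System.getProperty("user.dir")),
            Paths.get(System.getProperty("user.home")));
        System.out.println(config.map(Path::toString).orElse("no openlineage.yml found"));
    }
}
```

Passing the environment and directories as parameters keeps the precedence logic easy to check without touching the real file system.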

## Environment Variables

The following environment variables are available:

| Name | Description | Since |
|----------------------|-----------------------------------------------------------------------------|-------|
| OPENLINEAGE_CONFIG | The path to the YAML configuration file. Example: `path/to/openlineage.yml` | |
| OPENLINEAGE_DISABLED | When `true`, OpenLineage will not emit events. | 0.9.0 |

You can also configure the client with dynamic environment variables.

import DynamicEnvVars from './partials/java_dynamic_env_vars.md';

<DynamicEnvVars/>

## Facets Configuration

In the YAML configuration file, you can also disable facets to filter them out of the OpenLineage event.

*YAML Configuration*

```yaml
transport:
  type: console
facets:
  spark_unknown:
    disabled: true
  spark:
    logicalPlan:
      disabled: true
```

### Deprecated syntax

The following syntax is deprecated and will soon be removed:

```yaml
transport:
  type: console
facets:
  disabled:
    - spark_unknown
    - spark.logicalPlan
```

The syntax was deprecated because some facets are disabled by default in some integrations. When a user listed
an extra facet under `disabled` without repeating those defaults, the defaults were unintentionally enabled.
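
The difference between the two syntaxes can be illustrated with a small Java sketch. It is not the client's configuration-merging code: the `FacetDisableDemo` class, the assumed default set, and the `my_extra_facet` name are all hypothetical and only model the behavior described above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of how each syntax combines with integration defaults. */
final class FacetDisableDemo {

    // Assumed defaults for illustration; real defaults depend on the integration.
    static final Set<String> DEFAULT_DISABLED = Collections.unmodifiableSet(
        new TreeSet<>(Arrays.asList("spark_unknown", "spark.logicalPlan")));

    /** Deprecated list syntax: the user's list replaces the defaults as a whole. */
    static Set<String> disabledWithListSyntax(List<String> userList) {
        return new TreeSet<>(userList);
    }

    /** Per-facet syntax: each flag touches one facet, other defaults stay in place. */
    static Set<String> disabledWithPerFacetSyntax(Map<String, Boolean> userFlags) {
        Set<String> result = new TreeSet<>(DEFAULT_DISABLED);
        userFlags.forEach((facet, disabled) -> {
            if (disabled) {
                result.add(facet);
            } else {
                result.remove(facet);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        // The list syntax silently drops the defaults.
        System.out.println(disabledWithListSyntax(Arrays.asList("my_extra_facet")));
        // The per-facet syntax keeps them.
        System.out.println(
            disabledWithPerFacetSyntax(Collections.singletonMap("my_extra_facet", true)));
    }
}
```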

## Transports

import Transports from './partials/java_transport.md';

<Transports/>

### Error Handling via Transport

```java
// Connect to http://localhost:5000
OpenLineageClient client = OpenLineageClient.builder()
  .transport(
    HttpTransport.builder()
      .uri("http://localhost:5000")
      .apiKey("f38d2189-c603-4b46-bdea-e573a3b5a7d5")
      .build())
  .registerErrorHandler(new EmitErrorHandler() {
    @Override
    public void handleError(Throwable throwable) {
      // Handle emit error here
    }
  }).build();
```

### Defining Your Own Transport

```java
OpenLineageClient client = OpenLineageClient.builder()
  .transport(
    new MyTransport() {
      @Override
      public void emit(OpenLineage.RunEvent runEvent) {
        // Add emit logic here
      }
    }).build();
```

## Circuit Breakers

import CircuitBreakers from './partials/java_circuit_breaker.md';

<CircuitBreakers/>

## Metrics

import Metrics from './partials/java_metrics.md';

<Metrics/>

## Dataset Namespace Resolver

import DatasetNamespaceResolver from './partials/java_namespace_resolver.md';

<DatasetNamespaceResolver/>
39 changes: 39 additions & 0 deletions versioned_docs/version-1.23.0/client/java/java.md
@@ -0,0 +1,39 @@
---
sidebar_position: 5
---

# Java

## Overview

The OpenLineage Java client is an SDK for the Java programming language that you can use to generate and emit OpenLineage events to OpenLineage backends.
The core data structures currently offered by the client are the `RunEvent`, `RunState`, `Run`, `Job`, `Dataset`,
and `Transport` classes, along with various `Facets` that can be attached to runs, jobs, and datasets.

The library provides several [transport classes](#transports) that carry lineage events to target endpoints (e.g. HTTP).

You can also use the Java client to create your own custom integrations.

## Installation

The Java client is provided as a library that can be imported into your Java project using either Maven or Gradle.

Maven:

```xml
<dependency>
    <groupId>io.openlineage</groupId>
    <artifactId>openlineage-java</artifactId>
    <version>${OPENLINEAGE_VERSION}</version>
</dependency>
```

or Gradle:

```groovy
implementation("io.openlineage:openlineage-java:${OPENLINEAGE_VERSION}")
```

For more information on the available versions of `openlineage-java`,
please refer to the [Maven repository](https://search.maven.org/artifact/io.openlineage/openlineage-java).

@@ -0,0 +1,107 @@
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

:::info
This feature is available in OpenLineage versions >= 1.9.0.
:::

To prevent over-instrumentation, the OpenLineage integration provides a circuit breaker mechanism
that stops OpenLineage from creating, serializing, and sending OpenLineage events.

### Simple Memory Circuit Breaker

A simple circuit breaker that relies only on the free memory within the JVM. The configuration should
contain a free memory threshold (as a percentage). The default value is `20%`. The circuit breaker
closes on the first call if free memory is low. The `circuitCheckIntervalInMillis` parameter
configures how often the circuit breaker is checked. It defaults to `1000ms` when not set in the config.
`timeoutInSeconds` is optional. If set, OpenLineage code execution is terminated when the timeout
is reached (added in version 1.13).

<Tabs groupId="integrations">
<TabItem value="yaml" label="Yaml Config">

```yaml
circuitBreaker:
  type: simpleMemory
  memoryThreshold: 20
  circuitCheckIntervalInMillis: 1000
  timeoutInSeconds: 90
```
</TabItem>
<TabItem value="spark" label="Spark Config">

| Parameter | Definition | Example |
|-----------|------------|---------|
| spark.openlineage.circuitBreaker.type | Circuit breaker type selected | simpleMemory |
| spark.openlineage.circuitBreaker.memoryThreshold | Memory threshold | 20 |
| spark.openlineage.circuitBreaker.circuitCheckIntervalInMillis | Frequency of checking circuit breaker | 1000 |
| spark.openlineage.circuitBreaker.timeoutInSeconds | Optional timeout for OpenLineage execution (since version 1.13) | 90 |

</TabItem>
<TabItem value="flink" label="Flink Config">

| Parameter | Definition | Example |
|-----------|------------|---------|
| openlineage.circuitBreaker.type | Circuit breaker type selected | simpleMemory |
| openlineage.circuitBreaker.memoryThreshold | Memory threshold | 20 |
| openlineage.circuitBreaker.circuitCheckIntervalInMillis | Frequency of checking circuit breaker | 1000 |
| openlineage.circuitBreaker.timeoutInSeconds | Optional timeout for OpenLineage execution (since version 1.13) | 90 |

</TabItem>
</Tabs>
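
The free-memory check described in this section can be sketched as follows. This is a minimal illustration, not the library's implementation: the `SimpleMemoryCheck` class is hypothetical, and counting heap that the JVM has not yet claimed from the operating system as free is an assumption.

```java
/** Hypothetical sketch of a free-memory threshold check. */
final class SimpleMemoryCheck {

    /**
     * Free memory as a percentage of the maximum heap. Heap that the JVM may
     * still claim (max minus total) is counted as available (an assumption).
     */
    static double freeMemoryPercent(long freeBytes, long totalBytes, long maxBytes) {
        long available = freeBytes + (maxBytes - totalBytes);
        return 100.0 * available / maxBytes;
    }

    /** True when OpenLineage work should be stopped ("closed" in this document's terms). */
    static boolean isClosed(double freePercent, int memoryThreshold) {
        return freePercent < memoryThreshold;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        double free = freeMemoryPercent(rt.freeMemory(), rt.totalMemory(), rt.maxMemory());
        // 20 matches the documented default for memoryThreshold.
        System.out.printf("free=%.1f%% closed=%b%n", free, isClosed(free, 20));
    }
}
```

A real circuit breaker would repeat this check every `circuitCheckIntervalInMillis` rather than once.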

### Java Runtime Circuit Breaker

A more complex version of the circuit breaker. The amount of free memory can be low as long as
the amount of time spent on garbage collection is acceptable. `JavaRuntimeCircuitBreaker` closes
when free memory drops below the threshold and the amount of time spent on garbage collection exceeds
the given threshold (`10%` by default). The circuit breaker is always open when checked for the first time,
because the GC time is measured since the previous circuit breaker call.
The `circuitCheckIntervalInMillis` parameter configures how often the circuit breaker is checked.
It defaults to `1000ms` when not set in the config.
`timeoutInSeconds` is optional. If set, OpenLineage code execution is terminated when the timeout
is reached (added in version 1.13).

<Tabs groupId="integrations">
<TabItem value="yaml" label="Yaml Config">

```yaml
circuitBreaker:
  type: javaRuntime
  memoryThreshold: 20
  gcCpuThreshold: 10
  circuitCheckIntervalInMillis: 1000
  timeoutInSeconds: 90
```
</TabItem>
<TabItem value="spark" label="Spark Config">

| Parameter | Definition | Example |
--------------------------------------|---------------------------------------|-------------
| spark.openlineage.circuitBreaker.type | Circuit breaker type selected | javaRuntime |
| spark.openlineage.circuitBreaker.memoryThreshold | Memory threshold | 20 |
| spark.openlineage.circuitBreaker.gcCpuThreshold | Garbage Collection CPU threshold | 10 |
| spark.openlineage.circuitBreaker.circuitCheckIntervalInMillis | Frequency of checking circuit breaker | 1000 |
| spark.openlineage.circuitBreaker.timeoutInSeconds | Optional timeout for OpenLineage execution (Since version 1.13)| 90 |


</TabItem>
<TabItem value="flink" label="Flink Config">

| Parameter | Definition | Example |
--------------------------------------|---------------------------------------|-------------
| openlineage.circuitBreaker.type | Circuit breaker type selected | javaRuntime |
| openlineage.circuitBreaker.memoryThreshold | Memory threshold | 20 |
| openlineage.circuitBreaker.gcCpuThreshold | Garbage Collection CPU threshold | 10 |
| openlineage.circuitBreaker.circuitCheckIntervalInMillis | Frequency of checking circuit breaker | 1000 |
| openlineage.circuitBreaker.timeoutInSeconds | Optional timeout for OpenLineage execution (since version 1.13) | 90 |


</TabItem>
</Tabs>
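
The combined memory and GC condition can be sketched with the standard `java.lang.management` API. The `GcShareTracker` class below is hypothetical and does not reproduce `JavaRuntimeCircuitBreaker`. In particular, approximating the GC share as accumulated GC time divided by elapsed wall-clock time is an assumption.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical tracker of the share of time spent in GC between two checks. */
final class GcShareTracker {
    private long lastGcMillis;
    private long lastCheckMillis;
    private boolean primed;

    /**
     * Returns the percentage of elapsed time spent in GC since the previous sample.
     * The first sample has no baseline, so it returns 0 and the breaker stays open.
     */
    double sample(long totalGcMillis, long nowMillis) {
        double percent = 0.0;
        if (primed && nowMillis > lastCheckMillis) {
            percent = 100.0 * (totalGcMillis - lastGcMillis) / (nowMillis - lastCheckMillis);
        }
        lastGcMillis = totalGcMillis;
        lastCheckMillis = nowMillis;
        primed = true;
        return percent;
    }

    /** Closes only when memory is low AND the GC share is above its threshold. */
    static boolean isClosed(double freeMemoryPercent, int memoryThreshold,
                            double gcPercent, int gcCpuThreshold) {
        return freeMemoryPercent < memoryThreshold && gcPercent > gcCpuThreshold;
    }

    /** Sums accumulated collection time over all collectors; -1 means "undefined". */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime());
        }
        return total;
    }

    public static void main(String[] args) {
        GcShareTracker tracker = new GcShareTracker();
        double share = tracker.sample(totalGcMillis(), System.currentTimeMillis());
        System.out.println("GC share on first check: " + share);
    }
}
```

Because the GC share is a delta between two samples, the first check has nothing to compare against, which matches the note above that the breaker is always open on the first check.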

### Custom Circuit Breaker

The list of available circuit breakers can be extended with a custom one, loaded via `ServiceLoader`,
by providing your own implementation of `io.openlineage.client.circuitBreaker.CircuitBreakerBuilder`.
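
The `ServiceLoader` discovery mechanism itself can be sketched as below. The `BreakerBuilder` interface is a stand-in with a hypothetical `type()` method; it is not the real `CircuitBreakerBuilder` contract, so check the actual interface before implementing it.

```java
import java.util.Optional;
import java.util.ServiceLoader;

/** Hypothetical illustration of looking up a provider via ServiceLoader. */
final class BreakerDiscovery {

    /** Stand-in for the real builder interface; the method is hypothetical. */
    interface BreakerBuilder {
        String type();
    }

    /** Returns the first builder whose type matches the configured name. */
    static Optional<BreakerBuilder> find(Iterable<BreakerBuilder> builders, String type) {
        for (BreakerBuilder builder : builders) {
            if (builder.type().equals(type)) {
                return Optional.of(builder);
            }
        }
        return Optional.empty();
    }

    /** Searches the providers registered on the classpath for this interface. */
    static Optional<BreakerBuilder> findRegistered(String type) {
        return find(ServiceLoader.load(BreakerBuilder.class), type);
    }

    public static void main(String[] args) {
        // Empty unless a provider is registered under META-INF/services.
        System.out.println(findRegistered("myBreaker").isPresent());
    }
}
```

`ServiceLoader` finds implementations through a provider-configuration file on the classpath, named after the fully qualified interface under `META-INF/services/` and listing one implementation class per line.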