Commit a5dc1ed: [docs] Reorganize configuration docs (#1316)

zachgk authored Nov 14, 2023
1 parent 4856b09
Showing 6 changed files with 492 additions and 464 deletions.
177 changes: 31 additions & 146 deletions serving/docs/configuration.md
@@ -1,162 +1,47 @@
-# DJLServing startup configuration
+# DJL Serving Configuration

-## Environment variables
+DJL Serving is a multi-layer system and has many different forms of configuration across those layers.

-User can set environment variables to change DJL Serving behavior, following is a list of
-variables that user can set for DJL Serving:
+## Global

-* JAVA_HOME
-* JAVA_OPTS
-* SERVING_OPTS
-* MODEL_SERVER_HOME
+At the beginning, there are [global configurations](configurations_global.md).
+These configurations are passed through startup arguments, the config file, and environment variables.

-**Note:** environment variable has higher priority that command line or config.properties.
-It will override other property values.
+As part of the startup, you are able to specify several different categories of options:

-**Note:** For tunable parameters for Large Language Models please refer to [this](configurations_large_model_inference_containers.md) guide.
+- Global Java settings with environment variables like `$JAVA_HOME` and `$JAVA_OPTS`.
+- Loading behavior with the `model_store` and what models to load on startup
+- Network settings such as the port and SSL
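
To make these categories concrete, here is a minimal `config.properties` sketch assembled from keys documented later on this page (`inference_address`, `management_address`, `model_store`, `load_models`); the values are placeholders, not recommendations:

```properties
# network settings: bind the inference and management APIs
inference_address=http://127.0.0.1:8080
management_address=http://127.0.0.1:8081

# loading behavior: scan this directory for models on startup
model_store=build/models

# models (or workflows) to load explicitly on startup
load_models=https://resources.djl.ai/test-models/mlp.tar.gz
```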

-## Command line parameters
+## Engine

-User can use the following parameters to start djl-serving, those parameters will override default behavior:
+DJL Serving is powered by [DeepJavaLibrary](http://djl.ai) and most of the functionality exists through the use of [DJL engines](http://docs.djl.ai/docs/engine.html).
+As part of this, many of the engines along with DJL itself can be configured through the use of environment variables and system properties.

-```
-djl-serving -h
+The [engine configuration](configurations.md) document lists these configurations.
+These include both the ones global to DJL as well as lists for each engine.
+There are configurations for paths, versions, performance, settings, and debugging.
+All engine configurations are shared between all models and workers using that engine.

-usage: djl-serving [OPTIONS]
- -f,--config-file <CONFIG-FILE>    Path to the configuration properties file.
- -h,--help                         Print this help.
- -m,--models <MODELS>              Models to be loaded at startup.
- -s,--model-store <MODELS-STORE>   Model store location where models can be loaded.
-```
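
Engine settings of this kind are passed as environment variables or JVM system properties rather than `config.properties` keys. As a sketch, using settings that appear elsewhere on this page:

```
# a JVM system property read by the PyTorch engine, passed via SERVING_OPTS
export SERVING_OPTS="-Dai.djl.pytorch.num_threads=2"

# an environment variable read by the TensorFlow engine
export TF_CPP_MIN_LOG_LEVEL=1
```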
+## Workflow

-Details about the models, model-store, and workflows can be found in the equivalent configuration properties.
+Next, you are able to add and configure a [Workflow](workflows.md).
+DJL Serving has a custom solution for handling workflows that is configured through a `workflow.json` or `workflow.yml` file.
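
As an illustration, a minimal `workflow.json` in the style described in the [workflow documentation](workflows.md); the names and model URL here are placeholders, so treat this as a sketch rather than a canonical example:

```
{
  "name": "example",
  "version": "0.1",
  "models": {
    "model": "https://resources.djl.ai/test-models/mlp.tar.gz"
  },
  "workflow": {
    "out": ["model", "in"]
  }
}
```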

-## config.properties file
+## Model

-DJL Serving use a `config.properties` file to store configurations.
+Next, it is possible to specify [model configuration](configurations_model.md).
+This is mostly done by using a `serving.properties` file, although there are environment variables that can be used as well.

-### Configure listening port
+These configurations are also optional.
+If no `serving.properties` is provided, some basic properties such as which engine to use will be inferred.
+The rest will fall back to the global defaults.
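
For instance, a small `serving.properties` sketch using keys that appear later on this page; the engine and values are illustrative only:

```properties
# engine to run the model on (inferred when omitted)
engine=PyTorch

# per-model worker pool
minWorkers=1
maxWorkers=4

# job queue size for this model, overriding the global setting
job_queue_size=10
```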

-DJL Serving only allows localhost access by default.
+## Application

-* inference_address: inference API binding address, default: http://127.0.0.1:8080
-* management_address: management API binding address, default: http://127.0.0.1:8081
+Alongside the configurations that determine how DJL Serving runs the model, there are also options that can be passed into the model itself.
+The primary way is through the [DJL Model](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/Model.html) properties or [DJL Criteria](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/repository/zoo/Criteria.html) arguments.
+These settings are ultimately dependent on the individual model.
+But, here are some documented applications that have additional configurations:

-Here are a couple of examples:

-```properties
-# bind inference API to all network interfaces with SSL enabled
-inference_address=https://0.0.0.0:8443

-# bind inference API to private network interfaces
-inference_address=https://172.16.1.10:8443
-```

-### Configure initial models and workflows

-**Model Store**

-The `model_store` config property can be used to define a directory where each file/folder in it is a model to be loaded.
-It will then attempt to load all of them by default.
-Here is an example:

-```properties
-model_store=build/models
-```

-**Load Models**

-The `load_models` config property can be used to define a list of models (or workflows) to be loaded.
-The list should be defined as a comma separated list of urls to load models from.

-Each model can be defined either as a URL directly or optionally with prepended endpoint data like `[EndpointData]=modelUrl`.
-The endpoint is a list of data items separated by commas.
-The possible variations are:

-- `[modelName]`
-- `[modelName:version]`
-- `[modelName:version:engine]`
-- `[modelName:version:engine:deviceNames]`

-The version can be an arbitrary string.
-The engines uses the standard DJL `Engine` names.

-Possible deviceNames strings include `*` for all devices and a `;` separated list of device names following the format defined in DJL `Device.fromName`.
-If no device is specified, it will use the DJL default device (usually GPU if available else CPU).

-```properties
-load_models=https://resources.djl.ai/test-models/mlp.tar.gz,[mlp:v1:MXNet:*]=https://resources.djl.ai/test-models/mlp.tar.gz
-```

-**Workflows**

-Use the `load_models` config property to define initial workflows that should be loaded on startup.

-```properties
-load_models=https://resources.djl.ai/test-models/basic-serving-workflow.json
-```

-View the [workflow documentation](workflows.md) to see more information about workflows and their configuration format.

-### Enable SSL

-For users who want to enable HTTPs, you can change `inference_address` or `management_addrss`
-protocol from http to https, for example: `inference_addrss=https://127.0.0.1`.
-This will make DJL Serving listen on localhost 443 port to accepting https request.

-User also must provide certificate and private keys to enable SSL. DJL Serving support two ways to configure SSL:

-1. Use keystore
-  * keystore: Keystore file location, if multiple private key entry in the keystore, first one will be picked.
-  * keystore_pass: keystore password, key password (if applicable) MUST be the same as keystore password.
-  * keystore_type: type of keystore, default: PKCS12

-2. Use private-key/certificate files
-  * private_key_file: private key file location, support both PKCS8 and OpenSSL private key.
-  * certificate_file: X509 certificate chain file location.

-#### Self-signed certificate example

-This is a quick example to enable SSL with self-signed certificate

-##### User java keytool to create keystore

-```bash
-keytool -genkey -keyalg RSA -alias djl -keystore keystore.p12 -storepass changeit -storetype PKCS12 -validity 3600 -keysize 2048 -dname "CN=www.MY_DOMSON.com, OU=Cloud Service, O=model server, L=Palo Alto, ST=California, C=US"
-```

-Config following property in config.properties:

-```properties
-inference_address=https://127.0.0.1:8443
-management_address=https://127.0.0.1:8444
-keystore=keystore.p12
-keystore_pass=changeit
-keystore_type=PKCS12
-```

-##### User OpenSSL to create private key and certificate

-```bash
-# generate a private key with the correct length
-openssl genrsa -out private-key.pem 2048

-# generate corresponding public key
-openssl rsa -in private-key.pem -pubout -out public-key.pem

-# create a self-signed certificate
-openssl req -new -x509 -key private-key.pem -out cert.pem -days 360

-# convert pem to pfx/p12 keystore
-openssl pkcs12 -export -inkey private-key.pem -in cert.pem -out keystore.p12
-```

-Config following property in config.properties:

-```properties
-inference_address=https://127.0.0.1:8443
-management_address=https://127.0.0.1:8444
-keystore=keystore.p12
-keystore_pass=changeit
-keystore_type=PKCS12
-```
+- [Large Language Model Configurations](configurations_large_model_inference_containers.md)
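
To make the pass-through concrete: in a `serving.properties` file, `option.*` keys are handed to the model or its handler as arguments, as in this sketch (the keys below are taken from examples elsewhere on this page and are model-dependent):

```properties
# passed through to the model/handler rather than the server
option.model_id=gpt2
option.data_type=fp32
option.max_new_tokens=50
```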
183 changes: 2 additions & 181 deletions serving/docs/configurations.md
@@ -1,8 +1,6 @@
-# All DJL configuration options
+# Engine Configuration

-DJL serving is highly configurable. This document tries to capture those configurations in a single document.

-**Note:** For tunable parameters for Large Language Models please refer to [this](configurations_large_model_inference_containers.md) guide.
+This covers the available configurations for DJL and engines.

## DJL settings

@@ -83,134 +81,6 @@ DJLServing build on top of Deep Java Library (DJL). Here is a list of settings f
| ai.djl.python.disable_alternative | system prop | Disable alternative engine |
| TENSOR_PARALLEL_DEGREE | env var | Set tensor parallel degree.<br>For mpi mode, the default is number of accelerators.<br>Use "max" for non-mpi mode to use all GPUs for tensor parallel. |
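
For example, tensor parallelism can be set either globally or per model (the degree here is illustrative):

```
# as an environment variable
export TENSOR_PARALLEL_DEGREE=4

# or per model in serving.properties:
# option.tensor_parallel_degree=4
```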

-DJLServing provides a few alias for Python engine to make it easy for common LLM configurations.

-- `engine=DeepSpeed`, equivalent to:

-```
-engine=Python
-option.mpi_mode=true
-option.entryPoint=djl_python.deepspeed
-```

-- `engine=FasterTransformer`, this is equivalent to:

-```
-engine=Python
-option.mpi_mode=true
-option.entryPoint=djl_python.fastertransformer
-```

-- `engine=MPI`, this is equivalent to:

-```
-engine=Python
-option.mpi_mode=true
-option.entryPoint=djl_python.huggingface
-```

-## Global Model Server settings

-Global settings are configured at model server level. Change to these settings usually requires
-restart model server to take effect.

-Most of the model server specific configuration can be configured in `conf/config.properties` file.
-You can find the configuration keys here:
-[ConfigManager.java](https://github.com/deepjavalibrary/djl-serving/blob/master/serving/src/main/java/ai/djl/serving/util/ConfigManager.java#L52-L79)

-Each configuration key can also be override by environment variable with `SERVING_` prefix, for example:

-```
-export SERVING_JOB_QUEUE_SIZE=1000 # This will override JOB_QUEUE_SIZE in the config
-```

-| Key               | Type    | Description |
-|-------------------|---------|-------------|
-| MODEL_SERVER_HOME | env var | DJLServing home directory, default: Installation directory (e.g. /usr/local/Cellar/djl-serving/0.19.0/) |
-| DEFAULT_JVM_OPTS  | env var | default: `-Dlog4j.configurationFile=${APP_HOME}/conf/log4j2.xml`<br>Override default JVM startup options and system properties. |
-| JAVA_OPTS         | env var | default: `-Xms1g -Xmx1g -XX:+ExitOnOutOfMemoryError`<br>Add extra JVM options. |
-| SERVING_OPTS      | env var | default: N/A<br>Add serving related JVM options.<br>Some of DJL configuration can only be configured by JVM system properties, user has to set DEFAULT_JVM_OPTS environment variable to configure them.<br>- `-Dai.djl.pytorch.num_interop_threads=2`, this will override interop threads for PyTorch<br>- `-Dai.djl.pytorch.num_threads=2`, this will override OMP_NUM_THREADS for PyTorch<br>- `-Dai.djl.logging.level=debug` change DJL logging level |

-## Model specific settings

-You set per model settings by adding a [serving.properties](modes.md#servingproperties) file in the root of your model directory (or .zip).
-Some of the options can be override by environment variable with `OPTION_` prefix, for example:

-```
-# to enable rolling batch with only environment variable:
-export OPTION_ROLLING_BATCH=auto
-```

-You can set number of workers for each model:
-https://github.com/deepjavalibrary/djl-serving/blob/master/serving/src/test/resources/identity/serving.properties#L4-L8

-For example, set minimum workers and maximum workers for your model:

-```
-minWorkers=32
-maxWorkers=64
-```

-Or you can configure minimum workers and maximum workers differently for GPU and CPU:

-```
-gpu.minWorkers=2
-gpu.maxWorkers=3
-cpu.minWorkers=2
-cpu.maxWorkers=4
-```

-job queue size, batch size, max batch delay, max worker idle time can be configured at
-per model level, this will override global settings:

-```
-job_queue_size=10
-batch_size=2
-max_batch_delay=1
-max_idle_time=120
-```

-You can configure which device to load the model on, default is *:

-```
-load_on_devices=gpu4;gpu5
-# or simply:
-load_on_devices=4;5
-```

-### Python (DeepSpeed)

-For Python (DeepSpeed) engine, DJL load multiple workers sequentially by default to avoid run
-out of memory. You can reduced model loading time by parallel loading workers if you know the
-peak memory won’t cause out of memory:

-```
-# Allows to load DeepSpeed workers in parallel
-option.parallel_loading=true
-# specify tensor parallel degree (number of partitions)
-option.tensor_parallel_degree=2
-# specify per model timeout
-option.model_loading_timeout=600
-option.predict_timeout=240
-# mark the model as failure after python process crashing 10 times
-retry_threshold=0
-# enable virtual environment
-option.enable_venv=true
-# use built-in DeepSpeed handler
-option.entryPoint=djl_python.deepspeed
-# passing extra options to model.py or built-in handler
-option.model_id=gpt2
-option.data_type=fp32
-option.max_new_tokens=50
-# defines custom environment variables
-env=LARGE_TENSOR=1
-# specify the path to the python executable
-option.pythonExecutable=/usr/bin/python3
-```

## Engine specific settings

DJL support 12 deep learning frameworks, each framework has their own settings. Please refer to
@@ -229,52 +99,3 @@ The follow table show some engine specific environment variables that is overrid
| TF_CPP_MIN_LOG_LEVEL | TensorFlow | default 1 |
| MXNET_ENGINE_TYPE | MXNet | this value must be `NaiveEngine` |
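
For example (using the defaults and requirements named in the table):

```
# reduce TensorFlow C++ logging noise (the default is 1)
export TF_CPP_MIN_LOG_LEVEL=1

# MXNet: this value must be NaiveEngine per the table above
export MXNET_ENGINE_TYPE=NaiveEngine
```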

-## Appendix

-### How to configure logging

-#### Option 1: enable debug log:

-```
-export SERVING_OPTS="-Dai.djl.logging.level=debug"
-```

-#### Option 2: use your log4j2.xml

-```
-export DEFAULT_JVM_OPTS="-Dlog4j.configurationFile=/MY_CONF/log4j2.xml"
-```

-DJLServing provides a few built-in `log4j2-XXX.xml` files in DJLServing containers.
-Use the following environment variable to print HTTP access log to console:

-```
-export DEFAULT_JVM_OPTS="-Dlog4j.configurationFile=/usr/local/djl-serving-0.23.0/conf/log4j2-access.xml"
-```

-Use the following environment variable to print access log, server metrics and model metrics to console:

-```
-export DEFAULT_JVM_OPTS="-Dlog4j.configurationFile=/usr/local/djl-serving-0.23.0/conf/log4j2-console.xml"
-```

-### How to download uncompressed model from S3

-To enable fast model downloading, you can store your model artifacts (weights) in a S3 bucket, and
-only keep the model code and metadata in the `model.tar.gz` (.zip) file. DJL can leverage
-[s5cmd](https://github.com/peak/s5cmd) to download uncompressed files from S3 with extremely fast
-speed.

-To enable `s5cmd` downloading, you can configure `serving.properties` as the following:

-```
-option.model_id=s3://YOUR_BUCKET/...
-```

-### How to resolve python package conflict between models

-If you want to deploy multiple python models, but their dependencies has conflict, you can enable
-[python virtual environments](https://docs.python.org/3/tutorial/venv.html) for your model:

-```
-option.enable_venv=true
-```
