Commit a5dc1ed: [docs] Reorganize configuration docs (#1316)

zachgk authored Nov 14, 2023
1 parent 4856b09
Showing 6 changed files with 492 additions and 464 deletions.
177 changes: 31 additions & 146 deletions serving/docs/configuration.md
@@ -1,162 +1,47 @@
-# DJLServing startup configuration
+# DJL Serving Configuration

-## Environment variables
+DJL Serving is a multi-layer system and has many different forms of configuration across those layers.

-User can set environment variables to change DJL Serving behavior, following is a list of
-variables that user can set for DJL Serving:
+## Global

-* JAVA_HOME
-* JAVA_OPTS
-* SERVING_OPTS
-* MODEL_SERVER_HOME
+At the beginning, there are [global configurations](configurations_global.md).
+These configurations are passed through startup arguments, the config file, and environment variables.

-**Note:** environment variable has higher priority that command line or config.properties.
-It will override other property values.
+As part of the startup, you are able to specify several different categories of options:

-**Note:** For tunable parameters for Large Language Models please refer to [this](configurations_large_model_inference_containers.md) guide.
+- Global Java settings with environment variables like `$JAVA_HOME` and `$JAVA_OPTS`.
+- Loading behavior with the `model_store` and what models to load on startup
+- Network settings such as the port and SSL
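
To make these categories concrete, here is a minimal `config.properties` sketch assembled from keys documented later on this page (`inference_address`, `management_address`, `model_store`, `load_models`); the values are placeholders, not recommendations:

```properties
# network settings: bind the inference and management APIs
inference_address=http://127.0.0.1:8080
management_address=http://127.0.0.1:8081

# loading behavior: scan this directory for models on startup
model_store=build/models

# models (or workflows) to load explicitly on startup
load_models=https://resources.djl.ai/test-models/mlp.tar.gz
```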

-## Command line parameters
+## Engine

-User can use the following parameters to start djl-serving, those parameters will override default behavior:
+DJL Serving is powered by [DeepJavaLibrary](http://djl.ai) and most of the functionality exists through the use of [DJL engines](http://docs.djl.ai/docs/engine.html).
+As part of this, many of the engines along with DJL itself can be configured through the use of environment variables and system properties.

-```
-djl-serving -h
+The [engine configuration](configurations.md) document lists these configurations.
+These include both the ones global to DJL as well as lists for each engine.
+There are configurations for paths, versions, performance, settings, and debugging.
+All engine configurations are shared between all models and workers using that engine.

-usage: djl-serving [OPTIONS]
- -f,--config-file <CONFIG-FILE>    Path to the configuration properties file.
- -h,--help                         Print this help.
- -m,--models <MODELS>              Models to be loaded at startup.
- -s,--model-store <MODELS-STORE>   Model store location where models can be loaded.
-```
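
Engine settings of this kind are passed as environment variables or JVM system properties rather than `config.properties` keys. As a sketch, using settings that appear elsewhere on this page:

```
# a JVM system property read by the PyTorch engine, passed via SERVING_OPTS
export SERVING_OPTS="-Dai.djl.pytorch.num_threads=2"

# an environment variable read by the TensorFlow engine
export TF_CPP_MIN_LOG_LEVEL=1
```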
+## Workflow

-Details about the models, model-store, and workflows can be found in the equivalent configuration properties.
+Next, you are able to add and configure a [Workflow](workflows.md).
+DJL Serving has a custom solution for handling workflows that is configured through a `workflow.json` or `workflow.yml` file.
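
As an illustration, a minimal `workflow.json` in the style described in the [workflow documentation](workflows.md); the names and model URL here are placeholders, so treat this as a sketch rather than a canonical example:

```
{
  "name": "example",
  "version": "0.1",
  "models": {
    "model": "https://resources.djl.ai/test-models/mlp.tar.gz"
  },
  "workflow": {
    "out": ["model", "in"]
  }
}
```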

-## config.properties file
+## Model

-DJL Serving use a `config.properties` file to store configurations.
+Next, it is possible to specify [model configuration](configurations_model.md).
+This is mostly done by using a `serving.properties` file, although there are environment variables that can be used as well.

-### Configure listening port
+These configurations are also optional.
+If no `serving.properties` is provided, some basic properties such as which engine to use will be inferred.
+The rest will fall back to the global defaults.
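
For instance, a small `serving.properties` sketch using keys that appear later on this page; the engine and values are illustrative only:

```properties
# engine to run the model on (inferred when omitted)
engine=PyTorch

# per-model worker pool
minWorkers=1
maxWorkers=4

# job queue size for this model, overriding the global setting
job_queue_size=10
```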

-DJL Serving only allows localhost access by default.
+## Application

-* inference_address: inference API binding address, default: http://127.0.0.1:8080
-* management_address: management API binding address, default: http://127.0.0.1:8081
+Alongside the configurations that determine how DJL Serving runs the model, there are also options that can be passed into the model itself.
+The primary way is through the [DJL Model](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/Model.html) properties or [DJL Criteria](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/repository/zoo/Criteria.html) arguments.
+These settings are ultimately dependent on the individual model.
+But, here are some documented applications that have additional configurations:

-Here are a couple of examples:

-```properties
-# bind inference API to all network interfaces with SSL enabled
-inference_address=https://0.0.0.0:8443

-# bind inference API to private network interfaces
-inference_address=https://172.16.1.10:8443
-```

-### Configure initial models and workflows

-**Model Store**

-The `model_store` config property can be used to define a directory where each file/folder in it is a model to be loaded.
-It will then attempt to load all of them by default.
-Here is an example:

-```properties
-model_store=build/models
-```

-**Load Models**

-The `load_models` config property can be used to define a list of models (or workflows) to be loaded.
-The list should be defined as a comma separated list of urls to load models from.

-Each model can be defined either as a URL directly or optionally with prepended endpoint data like `[EndpointData]=modelUrl`.
-The endpoint is a list of data items separated by commas.
-The possible variations are:

-- `[modelName]`
-- `[modelName:version]`
-- `[modelName:version:engine]`
-- `[modelName:version:engine:deviceNames]`

-The version can be an arbitrary string.
-The engines uses the standard DJL `Engine` names.

-Possible deviceNames strings include `*` for all devices and a `;` separated list of device names following the format defined in DJL `Device.fromName`.
-If no device is specified, it will use the DJL default device (usually GPU if available else CPU).

-```properties
-load_models=https://resources.djl.ai/test-models/mlp.tar.gz,[mlp:v1:MXNet:*]=https://resources.djl.ai/test-models/mlp.tar.gz
-```

-**Workflows**

-Use the `load_models` config property to define initial workflows that should be loaded on startup.

-```properties
-load_models=https://resources.djl.ai/test-models/basic-serving-workflow.json
-```

-View the [workflow documentation](workflows.md) to see more information about workflows and their configuration format.

-### Enable SSL

-For users who want to enable HTTPs, you can change `inference_address` or `management_addrss`
-protocol from http to https, for example: `inference_addrss=https://127.0.0.1`.
-This will make DJL Serving listen on localhost 443 port to accepting https request.

-User also must provide certificate and private keys to enable SSL. DJL Serving support two ways to configure SSL:

-1. Use keystore
-  * keystore: Keystore file location, if multiple private key entry in the keystore, first one will be picked.
-  * keystore_pass: keystore password, key password (if applicable) MUST be the same as keystore password.
-  * keystore_type: type of keystore, default: PKCS12

-2. Use private-key/certificate files
-  * private_key_file: private key file location, support both PKCS8 and OpenSSL private key.
-  * certificate_file: X509 certificate chain file location.

-#### Self-signed certificate example

-This is a quick example to enable SSL with self-signed certificate

-##### User java keytool to create keystore

-```bash
-keytool -genkey -keyalg RSA -alias djl -keystore keystore.p12 -storepass changeit -storetype PKCS12 -validity 3600 -keysize 2048 -dname "CN=www.MY_DOMSON.com, OU=Cloud Service, O=model server, L=Palo Alto, ST=California, C=US"
-```

-Config following property in config.properties:

-```properties
-inference_address=https://127.0.0.1:8443
-management_address=https://127.0.0.1:8444
-keystore=keystore.p12
-keystore_pass=changeit
-keystore_type=PKCS12
-```

-##### User OpenSSL to create private key and certificate

-```bash
-# generate a private key with the correct length
-openssl genrsa -out private-key.pem 2048

-# generate corresponding public key
-openssl rsa -in private-key.pem -pubout -out public-key.pem

-# create a self-signed certificate
-openssl req -new -x509 -key private-key.pem -out cert.pem -days 360

-# convert pem to pfx/p12 keystore
-openssl pkcs12 -export -inkey private-key.pem -in cert.pem -out keystore.p12
-```

-Config following property in config.properties:

-```properties
-inference_address=https://127.0.0.1:8443
-management_address=https://127.0.0.1:8444
-keystore=keystore.p12
-keystore_pass=changeit
-keystore_type=PKCS12
-```
+- [Large Language Model Configurations](configurations_large_model_inference_containers.md)
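
To make the pass-through concrete: in a `serving.properties` file, `option.*` keys are handed to the model or its handler as arguments, as in this sketch (the keys below are taken from examples elsewhere on this page and are model-dependent):

```properties
# passed through to the model/handler rather than the server
option.model_id=gpt2
option.data_type=fp32
option.max_new_tokens=50
```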
183 changes: 2 additions & 181 deletions serving/docs/configurations.md
@@ -1,8 +1,6 @@
-# All DJL configuration options
+# Engine Configuration

-DJL serving is highly configurable. This document tries to capture those configurations in a single document.

-**Note:** For tunable parameters for Large Language Models please refer to [this](configurations_large_model_inference_containers.md) guide.
+This covers the available configurations for DJL and engines.

## DJL settings

@@ -83,134 +81,6 @@ DJLServing build on top of Deep Java Library (DJL). Here is a list of settings f
| ai.djl.python.disable_alternative | system prop | Disable alternative engine |
| TENSOR_PARALLEL_DEGREE | env var | Set tensor parallel degree.<br>For mpi mode, the default is number of accelerators.<br>Use "max" for non-mpi mode to use all GPUs for tensor parallel. |
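
For example, tensor parallelism can be set either globally or per model (the degree here is illustrative):

```
# as an environment variable
export TENSOR_PARALLEL_DEGREE=4

# or per model in serving.properties:
# option.tensor_parallel_degree=4
```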

-DJLServing provides a few alias for Python engine to make it easy for common LLM configurations.

-- `engine=DeepSpeed`, equivalent to:

-```
-engine=Python
-option.mpi_mode=true
-option.entryPoint=djl_python.deepspeed
-```

-- `engine=FasterTransformer`, this is equivalent to:

-```
-engine=Python
-option.mpi_mode=true
-option.entryPoint=djl_python.fastertransformer
-```

-- `engine=MPI`, this is equivalent to:

-```
-engine=Python
-option.mpi_mode=true
-option.entryPoint=djl_python.huggingface
-```

-## Global Model Server settings

-Global settings are configured at model server level. Change to these settings usually requires
-restart model server to take effect.

-Most of the model server specific configuration can be configured in `conf/config.properties` file.
-You can find the configuration keys here:
-[ConfigManager.java](https://github.com/deepjavalibrary/djl-serving/blob/master/serving/src/main/java/ai/djl/serving/util/ConfigManager.java#L52-L79)

-Each configuration key can also be override by environment variable with `SERVING_` prefix, for example:

-```
-export SERVING_JOB_QUEUE_SIZE=1000 # This will override JOB_QUEUE_SIZE in the config
-```

-| Key               | Type    | Description |
-|-------------------|---------|-------------|
-| MODEL_SERVER_HOME | env var | DJLServing home directory, default: Installation directory (e.g. /usr/local/Cellar/djl-serving/0.19.0/) |
-| DEFAULT_JVM_OPTS  | env var | default: `-Dlog4j.configurationFile=${APP_HOME}/conf/log4j2.xml`<br>Override default JVM startup options and system properties. |
-| JAVA_OPTS         | env var | default: `-Xms1g -Xmx1g -XX:+ExitOnOutOfMemoryError`<br>Add extra JVM options. |
-| SERVING_OPTS      | env var | default: N/A<br>Add serving related JVM options.<br>Some of DJL configuration can only be configured by JVM system properties, user has to set DEFAULT_JVM_OPTS environment variable to configure them.<br>- `-Dai.djl.pytorch.num_interop_threads=2`, this will override interop threads for PyTorch<br>- `-Dai.djl.pytorch.num_threads=2`, this will override OMP_NUM_THREADS for PyTorch<br>- `-Dai.djl.logging.level=debug` change DJL logging level |

-## Model specific settings

-You set per model settings by adding a [serving.properties](modes.md#servingproperties) file in the root of your model directory (or .zip).
-Some of the options can be override by environment variable with `OPTION_` prefix, for example:

-```
-# to enable rolling batch with only environment variable:
-export OPTION_ROLLING_BATCH=auto
-```

-You can set number of workers for each model:
-https://github.com/deepjavalibrary/djl-serving/blob/master/serving/src/test/resources/identity/serving.properties#L4-L8

-For example, set minimum workers and maximum workers for your model:

-```
-minWorkers=32
-maxWorkers=64
-```

-Or you can configure minimum workers and maximum workers differently for GPU and CPU:

-```
-gpu.minWorkers=2
-gpu.maxWorkers=3
-cpu.minWorkers=2
-cpu.maxWorkers=4
-```

-job queue size, batch size, max batch delay, max worker idle time can be configured at
-per model level, this will override global settings:

-```
-job_queue_size=10
-batch_size=2
-max_batch_delay=1
-max_idle_time=120
-```

-You can configure which device to load the model on, default is *:

-```
-load_on_devices=gpu4;gpu5
-# or simply:
-load_on_devices=4;5
-```

-### Python (DeepSpeed)

-For Python (DeepSpeed) engine, DJL load multiple workers sequentially by default to avoid run
-out of memory. You can reduced model loading time by parallel loading workers if you know the
-peak memory won’t cause out of memory:

-```
-# Allows to load DeepSpeed workers in parallel
-option.parallel_loading=true
-# specify tensor parallel degree (number of partitions)
-option.tensor_parallel_degree=2
-# specify per model timeout
-option.model_loading_timeout=600
-option.predict_timeout=240
-# mark the model as failure after python process crashing 10 times
-retry_threshold=0
-# enable virtual environment
-option.enable_venv=true
-# use built-in DeepSpeed handler
-option.entryPoint=djl_python.deepspeed
-# passing extra options to model.py or built-in handler
-option.model_id=gpt2
-option.data_type=fp32
-option.max_new_tokens=50
-# defines custom environment variables
-env=LARGE_TENSOR=1
-# specify the path to the python executable
-option.pythonExecutable=/usr/bin/python3
-```

## Engine specific settings

DJL support 12 deep learning frameworks, each framework has their own settings. Please refer to
@@ -229,52 +99,3 @@ The follow table show some engine specific environment variables that is overrid
| TF_CPP_MIN_LOG_LEVEL | TensorFlow | default 1 |
| MXNET_ENGINE_TYPE | MXNet | this value must be `NaiveEngine` |
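
For example (using the defaults and requirements named in the table):

```
# reduce TensorFlow C++ logging noise (the default is 1)
export TF_CPP_MIN_LOG_LEVEL=1

# MXNet: this value must be NaiveEngine per the table above
export MXNET_ENGINE_TYPE=NaiveEngine
```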

-## Appendix

-### How to configure logging

-#### Option 1: enable debug log:

-```
-export SERVING_OPTS="-Dai.djl.logging.level=debug"
-```

-#### Option 2: use your log4j2.xml

-```
-export DEFAULT_JVM_OPTS="-Dlog4j.configurationFile=/MY_CONF/log4j2.xml"
-```

-DJLServing provides a few built-in `log4j2-XXX.xml` files in DJLServing containers.
-Use the following environment variable to print HTTP access log to console:

-```
-export DEFAULT_JVM_OPTS="-Dlog4j.configurationFile=/usr/local/djl-serving-0.23.0/conf/log4j2-access.xml"
-```

-Use the following environment variable to print access log, server metrics and model metrics to console:

-```
-export DEFAULT_JVM_OPTS="-Dlog4j.configurationFile=/usr/local/djl-serving-0.23.0/conf/log4j2-console.xml"
-```

-### How to download uncompressed model from S3

-To enable fast model downloading, you can store your model artifacts (weights) in a S3 bucket, and
-only keep the model code and metadata in the `model.tar.gz` (.zip) file. DJL can leverage
-[s5cmd](https://github.com/peak/s5cmd) to download uncompressed files from S3 with extremely fast
-speed.

-To enable `s5cmd` downloading, you can configure `serving.properties` as the following:

-```
-option.model_id=s3://YOUR_BUCKET/...
-```

-### How to resolve python package conflict between models

-If you want to deploy multiple python models, but their dependencies has conflict, you can enable
-[python virtual environments](https://docs.python.org/3/tutorial/venv.html) for your model:

-```
-option.enable_venv=true
-```
