# Update documentation to reflect CPU-only execution mode (#1924)
* Documents writing a stage that supports CPU execution mode
* Updates `docs/source/developer_guide/contributing.md`, cleaning up the build and troubleshooting sections.

Requires PRs #1851 & #1906 to be merged first

Closes #1737

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/nv-morpheus/Morpheus/blob/main/docs/source/developer_guide/contributing.md).
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

Authors:
  - David Gardner (https://github.com/dagardner-nv)
  - Yuchen Zhang (https://github.com/yczhang-nv)

Approvers:
  - Michael Demoret (https://github.com/mdemoret-nv)

URL: #1924
dagardner-nv authored Oct 18, 2024
1 parent 85d5ad4 commit 47841d6
Showing 31 changed files with 426 additions and 169 deletions.
1 change: 1 addition & 0 deletions conda/environments/examples_cuda-125_arch-x86_64.yaml
@@ -42,6 +42,7 @@ dependencies:
- pip
- pluggy=1.3
- pydantic
+ - pynvml=11.4
- pypdf=3.17.4
- pypdfium2=4.30
- python-confluent-kafka>=1.9.2,<1.10.0a0
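The new `pynvml` pin supports the ABP `nvsmi` example, which reads GPU telemetry. As a sketch (the channel is an assumption, not taken from this diff; the version matches the pin above), the dependency can also be added to an existing environment by hand:

```bash
# Add the pinned pynvml dependency to an existing 'morpheus' Conda environment.
# The conda-forge channel is an assumption.
conda install -n morpheus -c conda-forge pynvml=11.4
```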
7 changes: 7 additions & 0 deletions dependencies.yaml
@@ -150,6 +150,7 @@ files:
arch: [x86_64]
includes:
- cve-mitigation
+ - example-abp-nvsmi
- example-dfp-prod
- example-gnn
- example-llms
@@ -442,6 +443,12 @@ dependencies:
- dgl==2.0.0
- dglgo

+   example-abp-nvsmi:
+     common:
+       - output_types: [conda]
+         packages:
+           - pynvml=11.4

example-llms:
common:
- output_types: [conda]
10 changes: 1 addition & 9 deletions docs/README.md
@@ -17,18 +17,10 @@

# Building Documentation

- Additional packages required for building the documentation are defined in `./conda_docs.yml`.
-
- ## Install Additional Dependencies
- From the root of the Morpheus repo:
- ```bash
- conda env update --solver=libmamba -n morpheus --file conda/environments/dev_cuda-125_arch-x86_64.yaml --prune
- ```
-
## Build Morpheus and Documentation
```
CMAKE_CONFIGURE_EXTRA_ARGS="-DMORPHEUS_BUILD_DOCS=ON" ./scripts/compile.sh --target morpheus_docs
```
Outputs to `build/docs/html`

+ If the documentation build is unsuccessful, refer to the **Out of Date Build Cache** section in [Troubleshooting](./source/extra_info/troubleshooting.md) to troubleshoot.
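Once the build succeeds, the generated site can be previewed locally — a sketch, assuming Python 3.7+ is available on the host:

```bash
# Serve the rendered documentation from the build output directory noted above.
python -m http.server --directory build/docs/html 8080
```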
10 changes: 5 additions & 5 deletions docs/source/basics/building_a_pipeline.md
@@ -107,7 +107,7 @@ morpheus --log_level=DEBUG run pipeline-other \

Then the following error displays:
```
- RuntimeError: The to-file stage cannot handle input of <class 'morpheus._lib.messages.ControlMessage'>. Accepted input types: (<class 'morpheus.messages.message_meta.MessageMeta'>,)
+ RuntimeError: The to-file stage cannot handle input of <class 'morpheus.messages.control_message.ControlMessage'>. Accepted input types: (<class 'morpheus.messages.message_meta.MessageMeta'>,)
```

This indicates that the ``to-file`` stage cannot accept the input type of `morpheus.messages.ControlMessage`. This is because the ``to-file`` stage has no idea how to write that class to a file; it only knows how to write instances of `morpheus.messages.message_meta.MessageMeta`. To ensure you have a valid pipeline, examine the `Accepted input types: (<class 'morpheus.messages.message_meta.MessageMeta'>,)` portion of the message. This indicates you need a stage that converts from the output type of the `deserialize` stage, `ControlMessage`, to `MessageMeta`, which is exactly what the `serialize` stage does.
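For illustration — this is not part of the diff — a minimal valid pipeline of this shape places `serialize` between `deserialize` and `to-file`; the file paths here are placeholders:

```bash
# Hypothetical fix for the error above: serialize converts ControlMessage back
# into MessageMeta, which the to-file sink accepts.
morpheus --log_level=DEBUG run pipeline-other \
    from-file --filename=examples/data/pcap_dump.jsonlines \
    deserialize \
    serialize \
    to-file --filename=.tmp/quickstart_output.json --overwrite
```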
@@ -207,7 +207,7 @@ This example shows an NLP Pipeline which uses several stages available in Morphe
#### Launching Triton
Run the following to launch Triton and load the `sid-minibert` model:
```bash
- docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 nvcr.io/nvidia/morpheus/morpheus-tritonserver-models:24.10 --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model sid-minibert-onnx
+ docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 nvcr.io/nvidia/morpheus/morpheus-tritonserver-models:24.10 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model sid-minibert-onnx
```

#### Launching Kafka
@@ -216,15 +216,15 @@ Follow steps 1-8 in [Quick Launch Kafka Cluster](../developer_guide/contributing
![../img/nlp_kitchen_sink.png](../img/nlp_kitchen_sink.png)

```bash
- morpheus --log_level=INFO run --num_threads=8 --pipeline_batch_size=1024 --model_max_batch_size=32 \
+ morpheus --log_level=INFO run --pipeline_batch_size=1024 --model_max_batch_size=32 \
pipeline-nlp --viz_file=.tmp/nlp_kitchen_sink.png \
from-file --filename examples/data/pcap_dump.jsonlines \
deserialize \
preprocess \
- inf-triton --model_name=sid-minibert-onnx --server_url=localhost:8001 \
+ inf-triton --model_name=sid-minibert-onnx --server_url=localhost:8000 \
monitor --description "Inference Rate" --smoothing=0.001 --unit "inf" \
add-class \
- filter --threshold=0.8 \
+ filter --filter_source=TENSOR --threshold=0.8 \
serialize --include 'timestamp' --exclude '^_ts_' \
to-kafka --bootstrap_servers localhost:9092 --output_topic "inference_output" \
monitor --description "ToKafka Rate" --smoothing=0.001 --unit "msg"
15 changes: 11 additions & 4 deletions docs/source/basics/overview.rst
@@ -39,16 +39,22 @@ run:
$ morpheus run --help
Usage: morpheus run [OPTIONS] COMMAND [ARGS]...
Run subcommand, used for running a pipeline
Options:
- --num_threads INTEGER RANGE Number of internal pipeline threads to use [default: 12; x>=1]
+ --num_threads INTEGER RANGE Number of internal pipeline threads to use [default: 64; x>=1]
--pipeline_batch_size INTEGER RANGE
Internal batch size for the pipeline. Can be much larger than the model batch size. Also used for Kafka consumers [default: 256; x>=1]
--model_max_batch_size INTEGER RANGE
Max batch size to use for the model [default: 8; x>=1]
--edge_buffer_size INTEGER RANGE
-   The size of buffered channels to use between nodes in a pipeline. Larger values reduce backpressure at the cost of memory. Smaller values will push
-   messages through the pipeline quicker. Must be greater than 1 and a power of 2 (i.e. 2, 4, 8, 16, etc.) [default: 128; x>=2]
- --use_cpp BOOLEAN Whether or not to use C++ node and message types or to prefer python. Only use as a last resort if bugs are encountered [default: True]
+   The size of buffered channels to use between nodes in a pipeline. Larger values reduce backpressure at the cost of memory. Smaller
+   values will push messages through the pipeline quicker. Must be greater than 1 and a power of 2 (i.e. 2, 4, 8, 16, etc.) [default:
+   128; x>=2]
+ --use_cpp BOOLEAN [Deprecated] Whether or not to use C++ node and message types or to prefer python. Only use as a last resort if bugs are encountered.
+   Cannot be used with --use_cpu_only [default: True]
+ --use_cpu_only Whether or not to run in CPU only mode, setting this to True will disable C++ mode. Cannot be used with --use_cpp
+ --manual_seed INTEGER RANGE Manually seed the random number generators used by Morpheus, useful for testing. [x>=1]
--help Show this message and exit.
Commands:
@@ -57,6 +63,7 @@ run:
pipeline-nlp Run the inference pipeline with a NLP model
pipeline-other Run a custom inference pipeline without a specific model type
Currently, Morpheus pipeline can be operated in four different modes.

* ``pipeline-ae``
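To see how the new flag composes with the options above, here is a sketch (not taken from this diff; the input file and output path are placeholders) of a CPU-only run, which replaces the deprecated `--use_cpp=False`:

```bash
# Hypothetical CPU-only run; --manual_seed is shown only to make test runs reproducible.
morpheus --log_level=INFO run --use_cpu_only --manual_seed=42 \
    --pipeline_batch_size=1024 \
    pipeline-other \
    from-file --filename=examples/data/email.jsonlines \
    deserialize \
    serialize \
    to-file --filename=.tmp/cpu_only_output.jsonlines --overwrite
```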
14 changes: 0 additions & 14 deletions docs/source/cloud_deployment_guide.md
@@ -434,11 +434,9 @@ Inference and training based on a user ID (`user123`). The model is trained once
```bash
helm install --set ngc.apiKey="$API_KEY" \
--set sdk.args="morpheus --log_level=DEBUG run \
- --num_threads=2 \
--edge_buffer_size=4 \
--pipeline_batch_size=1024 \
--model_max_batch_size=1024 \
- --use_cpp=False \
pipeline-ae \
--columns_file=data/columns_ae_cloudtrail.txt \
--userid_filter=user123 \
@@ -480,11 +478,9 @@ Pipeline example to read data from a file, run inference using a `phishing-bert-
```bash
helm install --set ngc.apiKey="$API_KEY" \
--set sdk.args="morpheus --log_level=DEBUG run \
- --num_threads=2 \
--edge_buffer_size=4 \
--pipeline_batch_size=1024 \
--model_max_batch_size=32 \
- --use_cpp=True \
pipeline-nlp \
--model_seq_length=128 \
--labels_file=data/labels_phishing.txt \
@@ -510,11 +506,9 @@ Pipeline example to read messages from an input Kafka topic, run inference using
```bash
helm install --set ngc.apiKey="$API_KEY" \
--set sdk.args="morpheus --log_level=DEBUG run \
- --num_threads=2 \
--edge_buffer_size=4 \
--pipeline_batch_size=1024 \
--model_max_batch_size=32 \
- --use_cpp=True \
pipeline-nlp \
--model_seq_length=128 \
--labels_file=data/labels_phishing.txt \
Expand Down Expand Up @@ -557,9 +551,7 @@ Pipeline example to read data from a file, run inference using a `sid-minibert-o
```bash
helm install --set ngc.apiKey="$API_KEY" \
--set sdk.args="morpheus --log_level=DEBUG run \
- --num_threads=3 \
--edge_buffer_size=4 \
- --use_cpp=True \
--pipeline_batch_size=1024 \
--model_max_batch_size=32 \
pipeline-nlp \
@@ -586,9 +578,7 @@ Pipeline example to read messages from an input Kafka topic, run inference using
```bash
helm install --set ngc.apiKey="$API_KEY" \
--set sdk.args="morpheus --log_level=DEBUG run \
- --num_threads=3 \
--edge_buffer_size=4 \
- --use_cpp=True \
--pipeline_batch_size=1024 \
--model_max_batch_size=32 \
pipeline-nlp \
Expand Down Expand Up @@ -631,11 +621,9 @@ Pipeline example to read data from a file, run inference using an `abp-nvsmi-xgb
```bash
helm install --set ngc.apiKey="$API_KEY" \
--set sdk.args="morpheus --log_level=DEBUG run \
- --num_threads=3 \
--edge_buffer_size=4 \
--pipeline_batch_size=1024 \
--model_max_batch_size=64 \
- --use_cpp=True \
pipeline-fil --columns_file=data/columns_fil.txt \
from-file --filename=./examples/data/nvsmi.jsonlines \
monitor --description 'FromFile Rate' --smoothing=0.001 \
@@ -657,10 +645,8 @@ Pipeline example to read messages from an input Kafka topic, run inference using
```bash
helm install --set ngc.apiKey="$API_KEY" \
--set sdk.args="morpheus --log_level=DEBUG run \
- --num_threads=3 \
--pipeline_batch_size=1024 \
--model_max_batch_size=64 \
- --use_cpp=True \
pipeline-fil --columns_file=data/columns_fil.txt \
from-kafka --input_topic <YOUR_INPUT_KAFKA_TOPIC> --bootstrap_servers broker:9092 \
monitor --description 'FromKafka Rate' --smoothing=0.001 \
91 changes: 63 additions & 28 deletions docs/source/developer_guide/contributing.md
@@ -153,48 +153,42 @@ This workflow utilizes a Docker container to set up most dependencies ensuring a
If a Conda environment on the host machine is preferred over Docker, it is relatively easy to install the necessary dependencies (In reality, the Docker workflow creates a Conda environment inside the container).
Note: These instructions assume the user is using `mamba` instead of `conda` since its improved solver speed is very helpful when working with a large number of dependencies. If you are not familiar with `mamba` you can install it with `conda install -n base -c conda-forge mamba` (Make sure to only install into the base environment). `mamba` is a drop in replacement for `conda` and all Conda commands are compatible between the two.
#### Prerequisites
- Volta architecture GPU or better
- [CUDA 12.1](https://developer.nvidia.com/cuda-12-1-0-download-archive)
- - `conda` and `mamba`
-   - If `conda` and `mamba` are not installed, we recommend using the MiniForge install guide which is located [here](https://github.com/conda-forge/miniforge). This will install both `conda` and `mamba` and set the channel default to use `conda-forge`.
+ - `conda`
+   - If `conda` is not installed, we recommend using the [MiniForge install guide](https://github.com/conda-forge/miniforge). This will install `conda` and set the channel default to use `conda-forge`.
1. Set up environment variables and clone the repo:
```bash
export MORPHEUS_ROOT=$(pwd)/morpheus
git clone https:/nv-morpheus/Morpheus.git $MORPHEUS_ROOT
cd $MORPHEUS_ROOT
```
- 2. Ensure all submodules are checked out:
-    ```bash
-    git submodule update --init --recursive
-    ```
+ 1. Ensure all submodules are checked out:
+    ```bash
+    git submodule update --init --recursive
+    ```
1. Create the Morpheus Conda environment
```bash
conda env create --solver=libmamba -n morpheus --file conda/environments/dev_cuda-125_arch-x86_64.yaml
conda activate morpheus
```
This creates a new environment named `morpheus`, and activates that environment.
- 1. Build Morpheus
-    ```bash
-    ./scripts/compile.sh
-    ```
-    This script will run both CMake Configure with default options and CMake build.
- 1. Install Morpheus
-    ```bash
-    pip install -e ${MORPHEUS_ROOT}/python/morpheus
-    pip install -e ${MORPHEUS_ROOT}/python/morpheus_llm
-    pip install -e ${MORPHEUS_ROOT}/python/morpheus_dfp
-    ```
-    Once Morpheus has been built, it can be installed into the current virtual environment.
- 1. Test the build (Note: some tests will be skipped)\
+    > **Note**: The `dev_cuda-125_arch-x86_64.yaml` Conda environment file specifies all of the dependencies required to build and run Morpheus. However, many of the examples and optional packages such as `morpheus_llm` require additional dependencies. Alternately, the following command can be used to create the Conda environment:
+    ```bash
+    conda env create --solver=libmamba -n morpheus --file conda/environments/all_cuda-125_arch-x86_64.yaml
+    conda activate morpheus
+    ```
+ 1. Build Morpheus
+    ```bash
+    ./scripts/compile.sh
+    ```
+    This script will build and install Morpheus into the Conda environment.
+ 1. Test the build (Note: some tests will be skipped)
Some of the tests will rely on external data sets.
```bash
MORPHEUS_ROOT=${PWD}
@@ -213,15 +207,26 @@ git submodule update --init --recursive
npm install -g [email protected]
```
Run all tests:
```bash
pytest --run_slow
```
- 1. Optional: Install cuML
-    - Many users may wish to install cuML. Due to the complex dependency structure and versioning requirements, we need to specify exact versions of each package. The command to accomplish this is:
+    - Run end-to-end (aka slow) tests:
+      ```bash
+      pytest --run_slow
+      ```
+ 1. Optional: Run Kafka and Milvus tests
+    - Download Kafka:
```bash
- mamba install -c rapidsai -c nvidia -c conda-forge cuml=23.06
+ python ./ci/scripts/download_kafka.py
```
+    - Run all tests (this will skip over tests that require optional dependencies which are not installed):
+      ```bash
+      pytest --run_slow --run_kafka --run_milvus
+      ```
+    - Run all tests including those that require optional dependencies:
+      ```bash
+      pytest --fail_missing --run_slow --run_kafka --run_milvus
+      ```
1. Run Morpheus
```bash
morpheus run pipeline-nlp ...
@@ -372,6 +377,36 @@ Due to the large number of dependencies, it's common to run into build issues. T
- Message indicating `git apply ...` failed
- Many of the dependencies require small patches to make them work. These patches must be applied once and only once. If this error displays, try deleting the offending package from the `build/_deps/<offending_package>` directory or from `.cache/cpm/<offending_package>`.
- If all else fails, delete the entire `build/` directory and `.cache/` directory.
+ - Older build artifacts when performing an in-place build.
+   - When built with `MORPHEUS_PYTHON_INPLACE_BUILD=ON` compiled libraries will be deployed in-place in the source tree, and older build artifacts exist in the source tree. Remove these with:
+     ```bash
+     find ./python -name "*.so" -delete
+     find ./examples -name "*.so" -delete
+     ```
+ - Issues building documentation
+   - Intermediate documentation build artifacts can cause errors for Sphinx. To remove these, run:
+     ```bash
+     rm -rf build/docs/ docs/source/_modules docs/source/_lib
+     ```
+ - CI Issues
+   - To run CI locally, the `ci/scripts/run_ci_local.sh` script can be used. For example to run a local CI build:
+     ```bash
+     ci/scripts/run_ci_local.sh build
+     ```
+   - Build artifacts resulting from a local CI run can be found in the `.tmp/local_ci_tmp/` directory.
+   - To troubleshoot a particular CI stage it can be helpful to run:
+     ```bash
+     ci/scripts/run_ci_local.sh bash
+     ```
+
+     This will open a bash shell inside the CI container with all of the environment variables typically set during a CI run. From here you can run the commands that would typically be run by one of the CI scripts in `ci/scripts/github`.
+
+     To run a CI stage requiring a GPU (ex: `test`), set the `USE_GPU` environment variable to `1`:
+     ```bash
+     USE_GPU=1 ci/scripts/run_ci_local.sh bash
+     ```
+
+ Refer to the [troubleshooting guide](../extra_info/troubleshooting.md) for more information on common issues and how to resolve them.

## Licensing
Morpheus is licensed under the Apache v2.0 license. All new source files including CMake and other build scripts should contain the Apache v2.0 license header. Any edits to existing source code should update the date range of the copyright to the current year. The format for the license header is:
@@ -401,7 +436,7 @@ Third-party code included in the source tree (that is not pulled in as an extern
Ex:
```
/**
-  * SPDX-FileCopyrightText: Copyright (c) 2018-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+  * SPDX-FileCopyrightText: Copyright (c) <year>, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: Apache-2.0
*
* Licensed under the Apache License, Version 2.0 (the "License");
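Taken together, the new CI troubleshooting entries amount to a short local workflow; a sketch using only the stages and paths named in the section above:

```bash
# Run the build stage inside the CI container, then inspect its artifacts.
ci/scripts/run_ci_local.sh build
ls .tmp/local_ci_tmp/

# Open an interactive CI shell; enable the GPU for stages such as 'test'.
USE_GPU=1 ci/scripts/run_ci_local.sh bash
```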
@@ -539,7 +539,6 @@ To run the DFP pipelines with the example datasets within the container, run the
```bash
python dfp_integrated_training_batch_pipeline.py \
--log_level DEBUG \
- --use_cpp=true \
--source duo \
--start_time "2022-08-01" \
--duration "60d" \
@@ -551,7 +550,6 @@ To run the DFP pipelines with the example datasets within the container, run the
```bash
python dfp_integrated_training_batch_pipeline.py \
--log_level DEBUG \
- --use_cpp=true \
--source duo \
--start_time "2022-08-30" \
--input_file "./control_messages/duo_payload_inference.json"
@@ -561,7 +559,6 @@ To run the DFP pipelines with the example datasets within the container, run the
```bash
python dfp_integrated_training_batch_pipeline.py \
--log_level DEBUG \
- --use_cpp=true \
--source duo \
--start_time "2022-08-01" \
--duration "60d" \
@@ -573,7 +570,6 @@ To run the DFP pipelines with the example datasets within the container, run the
```bash
python dfp_integrated_training_batch_pipeline.py \
--log_level DEBUG \
- --use_cpp=true \
--source azure \
--start_time "2022-08-01" \
--duration "60d" \
@@ -585,7 +581,6 @@ To run the DFP pipelines with the example datasets within the container, run the
```bash
python dfp_integrated_training_batch_pipeline.py \
--log_level DEBUG \
- --use_cpp=true \
--source azure \
--start_time "2022-08-30" \
--input_file "./control_messages/azure_payload_inference.json"
@@ -595,7 +590,6 @@ To run the DFP pipelines with the example datasets within the container, run the
```bash
python dfp_integrated_training_batch_pipeline.py \
--log_level DEBUG \
- --use_cpp=true \
--source azure \
--start_time "2022-08-01" \
--duration "60d" \