Add support for a CPU-only Mode #1851

Merged

Conversation

dagardner-nv
Contributor

@dagardner-nv dagardner-nv commented Aug 19, 2024

Description

  • Adds a new enum morpheus.config.ExecutionMode with members GPU & CPU along with a new morpheus.config.Config.execution_mode attribute.
  • For backwards compatibility, Config.execution_mode defaults to GPU.
  • Add new supported_execution_modes to StageBase which returns ExecutionMode.GPU by default. This ensures that building a pipeline with a stage not matching the execution mode will raise a reasonable error to the user.
  • Add CpuOnlyMixin and GpuAndCpuMixin mixins to automate overriding supported_execution_modes. These also make it easier for users to determine at a glance which execution modes a given stage supports.
  • Since C++ Stage/Message impls can only support cuDF DataFrames and RMM tensors, this PR re-purposes the existing Python stage/message impl mode to serve as CPU-only mode.
  • CPU-only mode will center around pandas DataFrames and NumPy arrays for tensors, since the current Python code which expects cuDF/CuPy is already 99% compatible with pandas/NumPy.
  • Avoid importing cudf, or any other GPU-based package that would fail on import, at the top level of a module. This is important for stages, messages and modules which are automatically imported by the morpheus CLI tool.
  • Add new utility methods to morpheus.utils.type_utils (ex: get_df_pkg, is_cudf_type) to help avoid importing cudf directly
  • Add a new Config.freeze method which makes a config object immutable. It is called the first time a config object is used to construct a pipeline or stage object. This prevents config parameters from being changed in the middle of pipeline construction.
  • CudfHelper::load is no longer called automatically on import, instead it is called manually on pipeline build when execution mode is GPU.
  • Add Python implementation of ControlMessage
  • To simulate a system without a GPU when testing CPU-only mode, docker/run_container_dev.sh will launch the container using the runc runtime if the CPU_ONLY environment variable is defined.
  • Remove automatic test parameterization of C++/Python mode, since supporting CPU-only mode will become the exception not the rule. Add a new gpu_and_cpu_mode test marker to explicitly indicate a test intended to be parameterized over execution modes.
  • Fix copy constructor for ControlMessage
  • AppShieldSourceStage now emits ControlMessages, AppShieldMessageMeta is now deprecated
  • AutoencoderSourceStage and thus AzureSourceStage, CloudTrailSourceStage, and DuoSourceStage now emit ControlMessage, UserMessageMeta is now deprecated.
  • DFP production pipeline updated to remove DFPMessageMeta, pipeline now executes in C++ mode.
  • Consolidate common logic in docker/run_container_dev.sh & docker/run_container_release.sh into docker/run_container.sh
  • Remove inconsistent behavior in the Python impl of TensorMemory.set_tensor ([BUG]: Python impl of TensorMemory.set_tensor reshapes incoming tensors #1955)
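
The interaction between `Config.execution_mode`, `Config.freeze`, `supported_execution_modes` and the two mixins can be illustrated with a minimal, self-contained sketch. The classes below are hypothetical stand-ins that reuse the names from this description; they are not the Morpheus implementations, and `check_stage`, `GpuStage` and `PortableStage` are invented purely for illustration.

```python
from enum import Enum


class ExecutionMode(Enum):
    GPU = "GPU"
    CPU = "CPU"


class Config:
    """Hypothetical stand-in for morpheus.config.Config."""

    def __init__(self):
        # Set the guard flag directly so __setattr__ can consult it below.
        object.__setattr__(self, "_frozen", False)
        self.execution_mode = ExecutionMode.GPU  # GPU is the default

    def freeze(self):
        object.__setattr__(self, "_frozen", True)

    def __setattr__(self, name, value):
        if self._frozen:
            raise AttributeError(f"Config is frozen; cannot set {name!r}")
        object.__setattr__(self, name, value)


class StageBase:
    """Hypothetical stand-in for the stage base class."""

    def __init__(self, config):
        config.freeze()  # first use of the config makes it immutable
        self._config = config

    def supported_execution_modes(self):
        return (ExecutionMode.GPU,)  # GPU-only unless overridden


class GpuAndCpuMixin:
    def supported_execution_modes(self):
        return (ExecutionMode.GPU, ExecutionMode.CPU)


class CpuOnlyMixin:
    def supported_execution_modes(self):
        return (ExecutionMode.CPU,)


class GpuStage(StageBase):
    """Illustrative stage that keeps the GPU-only default."""


class PortableStage(GpuAndCpuMixin, StageBase):
    """Illustrative stage; the mixin is listed first so its override wins."""


def check_stage(config, stage):
    """Illustrative build-time check (assumed, not the actual Morpheus logic)."""
    if config.execution_mode not in stage.supported_execution_modes():
        raise RuntimeError(
            f"{type(stage).__name__} does not support "
            f"{config.execution_mode.name} execution mode"
        )


config = Config()
config.execution_mode = ExecutionMode.CPU
check_stage(config, PortableStage(config))  # passes: CPU is supported
```

In this sketch, constructing `GpuStage(config)` with the same CPU-mode config and passing it to `check_stage` raises a `RuntimeError`, and any assignment to `config.execution_mode` after the first stage is constructed raises an `AttributeError`.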

Closes #1646
Closes #1846
Closes #1852
Closes #1955
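
As a rough illustration of the "avoid importing cudf at the top level" point in the description, the sketch below defers the import until a DataFrame package is actually requested. The function names mirror `get_df_pkg` and `is_cudf_type` from the description, but the signatures and bodies here are assumptions and not the code in `morpheus.utils.type_utils`; `df_pkg_name` is a hypothetical helper.

```python
import importlib
from enum import Enum


class ExecutionMode(Enum):
    GPU = "GPU"
    CPU = "CPU"


def df_pkg_name(mode):
    """Hypothetical helper: name of the DataFrame package for a mode."""
    return "cudf" if mode is ExecutionMode.GPU else "pandas"


def get_df_pkg(mode):
    """Import the DataFrame package lazily, only when it is requested."""
    return importlib.import_module(df_pkg_name(mode))


def is_cudf_type(obj):
    """Check an object's origin by module name, without importing cudf."""
    return type(obj).__module__.split(".")[0] == "cudf"
```

With this arrangement, a module that defines a stage can be imported on a machine without a GPU; `get_df_pkg(ExecutionMode.CPU)` only needs pandas to be installed, and `cudf` is imported solely when GPU mode is requested.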

By Submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

@dagardner-nv dagardner-nv added the breaking (Breaking change), feature request (New feature or request), DO NOT MERGE (PR should not be merged; see PR for details), and skip-ci (Optionally Skip CI for this PR) labels Aug 19, 2024
@dagardner-nv dagardner-nv self-assigned this Aug 19, 2024
@dagardner-nv dagardner-nv requested review from a team as code owners August 19, 2024 19:37

copy-pr-bot bot commented Aug 19, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@dagardner-nv dagardner-nv marked this pull request as draft August 19, 2024 19:37
@dagardner-nv
Contributor Author

/ok to test

@dagardner-nv
Contributor Author

/ok to test

@mdemoret-nv
Contributor

/merge

@rapids-bot rapids-bot bot merged commit e13e345 into nv-morpheus:branch-24.10 Oct 18, 2024
12 checks passed
rapids-bot bot pushed a commit that referenced this pull request Oct 18, 2024
* Works around the issue where CPU-only mode requires using the Python impl of `MessageMeta` with a pandas DF, whereas the `LLMEngineStage` is implemented in C++ and is only compatible with the C++ impl of `MessageMeta` with a cudf DF.
* Stores the Python impl of `MessageMeta` within the `ControlMessage` metadata which is able to store a Python object as-is.
* Updates the Simple Agents & Completion pipelines to optionally execute in CPU-only mode when the `--use_cpu_only` flag is given

Requires PR #1851 to be merged first

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https:/nv-morpheus/Morpheus/blob/main/docs/source/developer_guide/contributing.md).
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

Authors:
  - David Gardner (https:/dagardner-nv)
  - Yuchen Zhang (https:/yczhang-nv)

Approvers:
  - Michael Demoret (https:/mdemoret-nv)

URL: #1906
rapids-bot bot pushed a commit that referenced this pull request Oct 18, 2024
* Documents writing a stage that supports CPU execution mode
* Updates `docs/source/developer_guide/contributing.md`, cleaning up the build and troubleshooting sections.

Requires PRs #1851 & #1906 to be merged first

Closes [#1737](#1737)

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https:/nv-morpheus/Morpheus/blob/main/docs/source/developer_guide/contributing.md).
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

Authors:
  - David Gardner (https:/dagardner-nv)
  - Yuchen Zhang (https:/yczhang-nv)

Approvers:
  - Michael Demoret (https:/mdemoret-nv)

URL: #1924